.03 Database Connections

Unknown macro: {scrollbar}

Database Connections

A database connection describes the method by which Kettle connects to a database. You can create connections specific to a Job or Transformation or store them in the Kettle repository for re-use in multiple transformations or jobs.

Note: Please see also Special database issues and experiences

The following topics are covered in this section:

Creating a New Database Connection


This section describes how to create a new database connection and includes a detailed description of each connection property available in the Connection information dialog box.

To create a new connection right click the Database Connections in the tree and select New or New Connection Wizard. You can also double click Database Connections, or press F3.

The Connection information dialog box appears. The topics that follow describe the configuration options available on each tab of the Connection information dialog box.

General

The general tab is where you set up the basic information about your connection such as the connection name, type, access method, server name and log on credentials. The table below provides a detailed description of the options available under the General tab:

Feature

Description

Connection Name

Uniquely identifies a connection across transformations and jobs

Connection Type

Type of database you are connecting to (for example, MySQL, Oracle, and so on)

Method of access

This will be either Native (JDBC), ODBC, or OCI. Available access types depend on the type of database you are connecting to

Server host name

Defines the host name of the server on which the database resides. You can also specify the host by IP-address

Database name

Identifies the database name you want to connect to. In case of ODBC, specify the DSN name here

Port number

Sets the TCP/IP port number

Username

Optionally specifies the user name to connect to the database

Password

Optionally specifies the password to connect to the database

Pooling

The pooling tab allows you to configure your connection to use connection pooling and define options related to connection pooling like the initial pool size, maximum pool size and connection pool parameters. The table below provides a more detailed description of the options available on the Pooling tab:

Feature

Description

Use a connection pool

Enables connection pooling

The initial pool size

Sets the initial size of the connection pool.

The maximum pool size.

Sets the maximum number of connections in the connection pool.

Parameter Table

Allows you to define additional custom pool parameters.

MySQL

By default, MySQL returns complete query results in one block to the client, (Kettle in this case), so "result streaming" is enabled. One drawback associated result streaming is that it allows only one single query to be opened at any given time. You can disable this option in the MySQL tab of the database connection dialog box if necessary.

Another issue you may encounter is that the default timeout in the MySQL JDBC driver is set to 0 (no timeout). In certain instances, this may not allow Kettle to detect a server crash or sudden network failure if it occurs in the middle of a query or open database connection. This in turn leads to the infinite stalling of a transformation or job. To solve problem, set the "connectTimeout" and "socketTimeout" parameters for MySQL in the Options tab. The value to be specified is in milliseconds: for a 2 minute timeout you would specify value 120000 ( 2 x 60 x 1000 ).

You can also review other options on the linked MySQL help page by clicking on the 'Show help text on option usage' button found on the Options tab.

Note: please see also MySQL

Oracle

This tab allows you to specify the default data and index tablespaces which Kettle uses when generating SQL for Oracle tables and indexes.

This version of Pentaho Data Integration ships with the Oracle JDBC driver version 10.2.0. It is the most stable and recent driver we could find; however, if you have issues with Oracle connectivity or other problems, you may consider replacing the 10.2. JDBC driver to match your database server. Replace files "ojdbc14.jar" and "orai18n.jar" in the directory libext/JDBC of your distribution with the files found in the $ORACLE_HOME/jdbc directory on your server.

If you are using OCI and an Oracle Net8 client, the JDBC driver version used in Kettle needs to match your Oracle client version.  PDI 2.5.0 shipped with version 10.1, 3.0.0 ships with version 10.2. You can either install that version of the Oracle client or change the JDBC driver in PDI if versions don't match up.

Note: please see also Oracle

Informix

For Informix, you must specify the Informix Server name in the Informix tab in order for a connection to be usable.

SQL Server

This tab allows you configure the following properties specific to Microsoft SQL Server:

Feature

Description

SQL Server instance name

Sets the instance name property for the SQL Server connection.

Use .. to separate schema and table

Enable when using dot notation to separate schema and table.

Other properties can be configured by adding connection parameters on the options tab of the Connection information dialog box. For example, you can enable single sign-on login by defining the domain option on the Options tab as shown below:

From the jTDS FAQ on http://jtds.sourceforge.net/faq.html:

Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.

If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged Windows user's credentials"

Note: please see also MS SQL Server

SAP R/3

This tab allows you configure the following properties specific to SAP R/3:

Feature

Description

Language

Specifies the language to be used when connecting to SAP.

System Number

Specifies the system number of the SAP system to which you want to connect.

SAP Client

Specifies the three digit client number for the connection.

Note: For this to work, you need a plug-in, please see List of Available Pentaho Data Integration Plug-Ins

Generic

This tab is where you specify the URL and Driver class for Generic Database connections. You can also dynamically set these properties using Kettle variables. Using Kettle variables provides you with the ability to access data from multiple database types using the same transformations and jobs.

Note: Make sure to use clean ANSI SQL that works on all used database types in the latter case.

Options

This tab allows you to set database-specific options for the connection by adding parameters to the generated URL.

The use of variables is also possible within the Paramater / Value options.

Follow the instructions below to add a parameter:

  1. Select the next available row in the parameter table
  2. Choose your database type and enter a valid parameter name and its corresponding value
  3. For more database-specific configuration help, click the 'Show help text on option usage' button and a new browser tab appears in Spoon with additional information about the configuring the JDBC connection for the currently selected database type:

SQL

This tab allows you to enter a number of SQL commands immediately after connecting to the database. This is sometimes needed for reasons such as licensing, configuration, logging, tracing, and so on.

Cluster

This tab allows you to enable clustering for the database connection and create connections to the data partitions. To enable clustering for the connection, enable the 'Use Clustering?' option.

To create a new data partition, enter a partition ID and the hostname, port, database, username and password for connecting to the partition.

Advanced

This tab allows you configure the following properties for the connection:

Feature

Description

Quote all identifiers in database

Specifies the language to be used when connecting to SAP

Force all identifiers to lower case

Specifies the system number of the SAP system to which you want to connect

Force all identifiers to upper case

Specifies the three digit client number for the connection

Testing a Connection

The Test button in the Connection information dialog box allows you to test the current connection. An confirmation message displays if Spoon is able to establish a connection with the target database.

Explore

The Database Explorer allows you to browse the target database interactively, preview data, generate DDL and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found on the Connection information dialog box or right-click on the connection in the Main tree and select 'Explore'. See the Database Explorer for more information.

Feature List

The feature list exposes the JDBC URL, class, and various database settings for the connection such as the list of reserved words.

Editing a Connection


To edit an existing connection, double-click on the connection name in the main tree or right-click on the connection name and select 'Edit connection.'

Duplicating a Connection


To duplicate an existing connection, right-click on the connection name and select Duplicate.

Copying to a Clipboard


This option allows you to copy the XML defining the step to the clipboard. You can then paste this step into another transformation.

Deleting a Connection


To delete an existing database connection, right-click on the connection name in the main tree and select Delete.

Executing SQL Commands on a Connection


To execute SQL command against an existing connection, right-click on the connection name and select SQL Editor. See the SQL Editor for more information.

Clearing the Database Cache Option


To speed up connections Spoon uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the Main tree and select the 'Clear DB Cache...' option. This command is commonly used when databases tables have been changed, created or deleted.

Quoting


Pentaho has implemented a database-specific quoting system that allows you to use any name or character acceptable to the supported databases' naming conventions.

Pentaho Data Integration contains a list of reserved words for most of the supported databases. To ensure that quoting behaves correctly, Pentaho has implemented a strict separation between the schema (user/owner) of a table and the tablename itself. Doing otherwise, makes it impossible to quote tables or fields with one or more periods in them correctly. Placing periods in table and field names is common practice in some ERP systems(for example, fields such as "V.A.T.")

To avoid quoting-related errors, Pentaho has added a new rule in version 2.5.0 that stops the Pentaho Data Integration from performing quoting activity when there is a start or end quote in the tablename or schema. This allows you to specify the quoting mechanism yourself.

Note: Contact Pentaho if you find other ways to improve Pentaho's quoting algorithms.

Database Usage Grid


The table below contains information that may help you configure your particular database.

Note: The list of supported databases grows continuesly in each release. Please see the list of Connection types in the database dialog.

Database

Access Method

Server Name or IP Address

Database Name

Port # (default)

User Name and Password

Oracle

Native

Required

Oracle database SID

Required (1521)

Required

 

ODBC

 

ODBC DSN name

 

Required

 

OCI

 

Database TNS name

 

Required

MySQL

Native

Required

MySQL database name

Optional (3306)

Optional

 

ODBC

 

ODBC DSN name

 

Optional

AS/400

Native

Required

AS/400 Library name

Optional

Required

 

ODBC

 

ODBC DSN name

 

Required

MS Access

ODBC

 

ODBC DSN name

 

Optional

MS SQL Server

Native

Required

Database name

Required (1433)

Required

 

ODBC

 

ODBC DSN name

 

Required

IBM DB2

Native

Required

Database name

Required (50000)

Required

 

ODBC

 

ODBC DSN name

 

Required

PostgreSQL

Native

Required

Database name

Required (5432)

Required

 

ODBC

 

ODBC DSN name

 

Required

Intersystems Caché

Native

Required

Database name

Required (1972)

Required

 

ODBC

 

ODBC DSN name

 

Required

Sybase

Native

Required

Database name

Required(5001)

Required

 

ODBC

 

ODBC DSN name

 

Required

Gupta SQL Base

Native

Required

Database Name

Required (2155)

Required

 

ODBC

 

ODBC DSN name

 

Required

Dbase III,IV or 5.0

ODBC

 

ODBC DSN name

 

Optional

Firebird SQL

Native

Required

Database name

Required (3050)

Required

 

ODBC

 

ODBC DSN name

 

Required

Hypersonic

Native

Required

Database name

Required (9001)

Required

MaxDB (SAP DB)

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

Ingres

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

Borland Interbase

Native

Required

Database name

Required (3050)

Required

 

ODBC

 

ODBC DSN name

 

Required

ExtenDB

Native

Required

Database name

Required (6453)

Required

 

ODBC

 

ODBC DSN name

 

Required

Teradata

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

Oracle RDB

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

H2

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

Netezza

Native

Required

Database name

Required (5480)

Required

 

ODBC

 

ODBC DSN name

 

Required

IBM Universe

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

SQLite

Native

Required

Database name

 

Required

 

ODBC

 

ODBC DSN name

 

Required

Apache Derby

Native

optional

Database name

Optional (1527)

Optional

 

ODBC

 

ODBC DSN name

 

Optional

Generic (star)

Native

Required

Database name

Required (Any)

Required

 

ODBC

 

ODBC DSN name

 

Optional

(star) The generic database connection also needs to specify the URL and Driver class in the Generic tab. Pentaho also allows these fields to be specified using a variable so you can access data from multiple database types using the same transformations and jobs. Make sure to use clean ANSI SQL that works on all used database types in that case.

Configuring JNDI Connections


If you are developing transformations and jobs that will be deployed on an application server such as the Pentaho platform running on JBoss, you can configure your database connections using JNDI.

Because you don't want to have an application server running all the time during development or testing of the transformations, Pentaho has supplied a way of configuring a JNDI connection for "local" Kettle use.

To configure, edit the properties file called "simple-jndi/jdbc.properties" For example, to connect to the databases used in Pentaho Demo platform download and use this information in the properties file:

SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
SampleData/user=pentaho_user
SampleData/password=password
Quartz/type=javax.sql.DataSource
Quartz/driver=org.hsqldb.jdbcDriver
Quartz/url=jdbc:hsqldb:hsql://localhost/quartz
Quartz/user=pentaho_user
Quartz/password=password
Hibernate/type=javax.sql.DataSource
Hibernate/driver=org.hsqldb.jdbcDriver
Hibernate/url=jdbc:hsqldb:hsql://localhost/hibernate
Hibernate/user=hibuser
Hibernate/password=password
Shark/type=javax.sql.DataSource
Shark/driver=org.hsqldb.jdbcDriver
Shark/url=jdbc:hsqldb:hsql://localhost/shark
Shark/user=sa
Shark/password=

Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content of your application server data sources.

Unsupported Databases


Contact Pentaho if you want to access a database type that is not yet supported. A few database types are not supported in this release due to the lack of a sample database and/or software.

It is generally possible to read from unsupported databases by using the Generic database driver through an ODBC or JDBC connection.