MySQL

Performance considerations

INSERT speed can be boosted by a few simple JDBC connection settings (published in the blog entry "Increase MySQL output to 80K rows/second in Pentaho Data Integration" by Julien Hofstede):

useServerPrepStmts=false
rewriteBatchedStatements=true
useCompression=true

These options can be entered in PDI on the connection: double-click the connection, go to the Options tab and set these values.
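
The same options can also be appended to a plain JDBC URL outside of PDI. The following is a minimal sketch (host, schema, credentials and the table t are placeholders); with rewriteBatchedStatements=true the driver rewrites the batched INSERT into multi-row statements:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical host, schema and credentials; the three options mirror the PDI settings above.
        String url = "jdbc:mysql://localhost:3306/test"
                + "?useServerPrepStmts=false"
                + "&rewriteBatchedStatements=true"
                + "&useCompression=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO t (id, name) VALUES (?, ?)")) {
            for (int i = 0; i < 10000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row-" + i);
                ps.addBatch();   // collect rows on the client
            }
            ps.executeBatch();   // sent as multi-row INSERTs thanks to rewriteBatchedStatements
        }
    }
}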

Other options exist to improve read performance; see the MySQL manual for further details.

Communications link failure due to underlying exception

For long-running queries on MySQL, you might get this exception:

Communications link failure due to underlying exception:

** BEGIN NESTED EXCEPTION **

java.io.EOFException

STACKTRACE:

java.io.EOFException
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:1934)
etc.

The MySQL server parameter "net_write_timeout" defaults to 60 seconds.

We found that increasing this to a larger value makes the exception go away. That way we do not have to turn off the result set cursor, and we do not need to read the entire result set into memory by turning off result set streaming (cursor emulation).

This parameter is set in my.ini. We set it to 1800, but a smaller value would probably suffice.
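
If you cannot change my.ini, a session-level alternative is to raise the value right after connecting. A minimal sketch (connection details are placeholders; 1800 mirrors the value mentioned above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class NetWriteTimeoutExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = con.createStatement()) {
            // Raise the timeout for this session only (value in seconds).
            st.execute("SET SESSION net_write_timeout = 1800");
            // Verify the current value.
            try (ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'net_write_timeout'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }
}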

Please also see the chapter with the nice title "MySQL server has gone away" over here:
http://dev.mysql.com/doc/refman/5.1/en/gone-away.html

Another reported solution: set the tcpKeepAlive option on the MySQL connection (available since MySQL Connector/J 5.0.7; see the section on JDBC driver versions below).

JDBC driver versions

We have found several times that version 5.x of the MySQL JDBC driver does not play nicely with Kettle.

vanayka reported the same in this thread: http://forums.pentaho.org/showthread.php?p=184717

Using an alias

The behaviour of getColumnLabel and getColumnName changed in JDBC driver versions 4.0 and later.

This affects, for example, SQL queries that use an alias for a field name, like:

SELECT column AS aliasName FROM table

With JDBC driver versions 4.0 and later you may not get the aliasName back.

PDI-2905 addresses this issue; see also http://bugs.mysql.com/bug.php?id=35610:
"In Connector/J 5.1 the behaviour was changed, so that instead of allowing only index
number or column label, only index number or column name are accepted."

This bug was fixed by MySQL and led to changed JDBC behaviour in this regard.
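
As a small sketch of the difference (connection details, some_table and some_column are made up), getColumnLabel returns the alias from the query while getColumnName returns the underlying column name, so code that relies on the alias should use getColumnLabel:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class AliasMetaDataExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection, table and column names.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT some_column AS aliasName FROM some_table")) {
            ResultSetMetaData md = rs.getMetaData();
            System.out.println("label: " + md.getColumnLabel(1)); // aliasName
            System.out.println("name:  " + md.getColumnName(1));  // some_column (with Connector/J 5.1 and later)
        }
    }
}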

Character encoding issues

Make sure that you set the correct JDBC options, for example in the case of a Unicode target database:

useUnicode = true
characterEncoding = utf8
characterSetResults = utf8

Even if you do not have a Unicode target database, it makes sense to set the encodings explicitly whenever source and target encodings differ.
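
A minimal sketch of the same options on a plain JDBC URL (host, schema and credentials are placeholders); the SHOW VARIABLES query lets you verify which character sets the session actually uses:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class UnicodeConnectionExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the parameters mirror the options above.
        String url = "jdbc:mysql://localhost:3306/test"
                + "?useUnicode=true"
                + "&characterEncoding=utf8"
                + "&characterSetResults=utf8";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'character_set%'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " = " + rs.getString(2));
            }
        }
    }
}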

Exception: Lock wait timeout exceeded; try restarting transaction

This is a MySQL issue; we found the following solution:

Increase the innodb_lock_wait_timeout value in my.ini (an entry of the form "innodb_lock_wait_timeout=<seconds>"; the default is 50 seconds). To find out your current lock wait timeout, run "show variables like 'innodb_lock_wait%';".
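
On MySQL 5.5 and later the variable can also be raised per session instead of editing my.ini. A minimal sketch (connection details and the 120-second value are just illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class LockWaitTimeoutExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details.
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = con.createStatement()) {
            // Raise the lock wait timeout for this session only (value in seconds).
            st.execute("SET SESSION innodb_lock_wait_timeout = 120");
            // Check the current value.
            try (ResultSet rs = st.executeQuery("SHOW VARIABLES LIKE 'innodb_lock_wait%'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " = " + rs.getString(2));
                }
            }
        }
    }
}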

If you are running in a MySQL cluster:

The innodb_lock_wait_timeout variable has no bearing on NDB tables.

This timeout is administered within the NDBD nodes. You need to
update the config.ini file with a higher value for
TransactionDeadlockDetectionTimeout.

http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-ndbd-definition.html#mysql-cluster-param-ndbd-definition-transactiondeadlockdetectiontimeout

Issues with MySQL tinyints

We got a report of a tinyint issue in PDI-8861. Unfortunately, we did not get confirmation whether this was solved, but we keep it here for the record:

We proposed setting the transformedBitIsBoolean or tinyInt1isBit connection options.
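
As a hedged sketch of what these options change (connection details, example_table and its TINYINT(1) column flag are made up): with tinyInt1isBit=false the driver reports the column as an integer type instead of a bit/boolean:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class TinyIntExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection and table with a TINYINT(1) column named "flag".
        String url = "jdbc:mysql://localhost:3306/test?tinyInt1isBit=false";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT flag FROM example_table LIMIT 1")) {
            ResultSetMetaData md = rs.getMetaData();
            // Without the option a TINYINT(1) is reported as BIT; with it, as TINYINT.
            System.out.println(md.getColumnTypeName(1));
        }
    }
}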

Getting the metadata of a table with a large number of rows can be slow

As reported in PDI-9457, using "Get SQL select statement" in the Table input step and answering yes to "Do you want to include the field names in the SQL?" slows down on tables with a large number of rows.

This can also happen in other scenarios when the metadata is retrieved for the first time.

After researching this issue, we found that this is a MySQL bug, reported here: http://bugs.mysql.com/bug.php?id=64621
Proposed workaround: set the connection property useServerPrepStmts = true
(see http://dev.mysql.com/doc/refman/5.5/en/connector-j-reference-configuration-properties.html).
Further information from the MySQL case: "As for drawbacks of using this option, there are none. As a matter of fact, this is the preferred way. We discouraged it's use long ago while the code in the server was still flaky. Now that it's stable you should definitely use SS PS for your work."
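
A minimal sketch of the workaround (connection details and big_table are placeholders): with useServerPrepStmts=true the field metadata can be read from a prepared statement without executing the query:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSetMetaData;

public class TableMetaDataExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection and table; useServerPrepStmts=true is the proposed workaround.
        String url = "jdbc:mysql://localhost:3306/test?useServerPrepStmts=true";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement("SELECT * FROM big_table")) {
            // Reads only the result set metadata; the query itself is not executed.
            ResultSetMetaData md = ps.getMetaData();
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.println(md.getColumnName(i));
            }
        }
    }
}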

This appears to be resolved in JDBC driver version 5.1.19.

Additional info: this is only an issue for the first query; subsequent metadata queries will be pulled from the DB cache.