Kettle 4 and the art of internationalization

Introduction

Internationalization (i18n for short) in Kettle is handled by a graphical user interface called Translator (org.pentaho.di.ui.i18n.editor.Translator2 to be precise).  Launch scripts are provided for Windows (Translator.bat) and for Linux/OSX/Unix systems (translator.sh).

This application will allow a translator to edit the translations for the keys that are used in the Kettle source code.

To run Translator you need access to the source code and an installed Java 1.5 development kit.  You also need ant to compile Kettle.  Once you have run ant and built the Kettle libraries, you can start Translator from the root of the Kettle source code folder.

Translator

The Translator GUI gives you access to all the available locales.  It colours those packages whose messages files are lacking translations compared to en_US.  You can then simply enter the translations and update the corresponding messages files.  For example, updating the fr_FR (French - France) translations in package org.pentaho.di.core.plugins would lead to an update in file src-core/org/pentaho/di/core/plugins/messages/messages_fr_FR.properties.  The Translator application takes care of proper UTF-8 encoding of the properties files as well as figuring out the proper place to store the file.
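
The naming convention in the example above can be sketched in a few lines of Java.  This is only an illustration and not Translator's actual code: the class and method names are hypothetical, and the source folder is passed in as a parameter because the real application works it out from its own configuration.

```java
// Hypothetical helper, not part of Kettle: it only illustrates the naming convention.
final class MessagesPathSketch {

    // Builds <sourceFolder>/<package as path>/messages/messages_<locale>.properties
    static String messagesFile(String sourceFolder, String packageName, String locale) {
        String packagePath = packageName.replace('.', '/');
        return sourceFolder + "/" + packagePath + "/messages/messages_" + locale + ".properties";
    }

    public static void main(String[] args) {
        // prints src-core/org/pentaho/di/core/plugins/messages/messages_fr_FR.properties
        System.out.println(messagesFile("src-core", "org.pentaho.di.core.plugins", "fr_FR"));
    }
}
```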

By only offering used i18n keys for translation, Translator saves a lot of time: on long-lived, active projects like Kettle the code changes constantly, which leaves a lot of "dead" or unused keys behind.  Not having to translate these keys saves a lot of effort, since each key has to be translated for (currently) 11 locales.

How does it work?

Translator can detect keys that are not yet translated.  It does this by scanning both the Java and XUL source code as well as the messages bundles.  It then correlates both and determines missing keys.
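
The correlation step amounts to a set difference: keys used in the source code minus keys present in a messages bundle.  The following is a minimal sketch of that idea, not Translator's implementation.  The class name, method name and all keys are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Translator's actual code.
final class MissingKeysSketch {

    // Returns the keys used in the source code that have no entry in the bundle.
    static Set<String> missingKeys(Set<String> usedKeys, Properties bundle) {
        Set<String> missing = new TreeSet<>(usedKeys);
        missing.removeAll(bundle.stringPropertyNames());
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Keys found by scanning the source code (made-up examples).
        Set<String> used = new TreeSet<>(Arrays.asList("Spoon.Menu.File", "Spoon.Menu.Edit"));

        // Contents of a messages bundle for some locale (made-up example).
        Properties bundle = new Properties();
        bundle.load(new StringReader("Spoon.Menu.File=Fichier\nSpoon.Old.Key=Ancien\n"));

        // prints [Spoon.Menu.Edit]
        System.out.println(missingKeys(used, bundle));
    }
}
```

Note that the unused key Spoon.Old.Key in the bundle never shows up in the result.  Starting from the keys that are actually used is what keeps "dead" keys out of the translator's way.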

The Java code is scanned for occurrences of the scan-phrases specified in the configuration file translator.xml.  For Kettle 4, the phrase that is scanned for is:

BaseMessages.getString(PKG,

Right after this phrase, the scanner expects to find the i18n key.
At the top of the Java file, the scanner expects to find the following type of line:

private static Class<?> PKG = Spoon.class;

With this, the Translator code scanner can figure out to which package this class belongs and consequently to which package the i18n keys belong.
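
To make the two expectations concrete, here is a rough sketch of a line-based scan for the PKG line and for the keys that follow the scan-phrase.  It is an assumption about how such a scanner could look, not the code Translator uses.  The class name, the regular expressions and the sample keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a line-based key scanner, not Translator's actual code.
final class LineScannerSketch {

    static final String SCAN_PHRASE = "BaseMessages.getString(PKG,";

    // Matches e.g. "private static Class<?> PKG = Spoon.class;" and captures "Spoon".
    static final Pattern PKG_LINE =
        Pattern.compile("Class<\\?>\\s+PKG\\s*=\\s*([\\w.]+)\\.class\\s*;");

    // Matches a quoted key that directly follows the scan-phrase on the same line.
    static final Pattern KEY_AFTER_PHRASE =
        Pattern.compile(Pattern.quote(SCAN_PHRASE) + "\\s*\"([^\"]+)\"");

    // Returns the class named on the first PKG line, or null if there is none.
    static String findPkgClass(List<String> lines) {
        for (String line : lines) {
            Matcher m = PKG_LINE.matcher(line);
            if (m.find()) {
                return m.group(1);
            }
        }
        return null;
    }

    // Collects every key found right after the scan-phrase, one line at a time.
    static List<String> findKeys(List<String> lines) {
        List<String> keys = new ArrayList<>();
        for (String line : lines) {
            Matcher m = KEY_AFTER_PHRASE.matcher(line);
            while (m.find()) {
                keys.add(m.group(1));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up source lines for illustration.
        List<String> source = Arrays.asList(
            "private static Class<?> PKG = Spoon.class;",
            "String title = BaseMessages.getString(PKG, \"Spoon.Title\");",
            "String exit = BaseMessages.getString(PKG, \"Spoon.Exit\");");

        System.out.println(findPkgClass(source)); // prints Spoon
        System.out.println(findKeys(source));     // prints [Spoon.Title, Spoon.Exit]
    }
}
```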

Translator supports multiple source folders, system i18n keys and configurable locales, and can be taught to avoid certain system files where scanning would lead to errors.  All of this, including the configuration options to scan for i18n keys in XUL files, can be specified in translator.xml.

When does it NOT work?

The Translator scanning algorithm is currently line-based.  As such, if a single BaseMessages.getString() call is spread over multiple lines, the scanning will lead to incorrect results.  This is usually caused by careless code formatting with an 80-character Java line length limit (in your IDE).  Every time you see a statement like the following, you know in advance that Translator will have a problem with it:

BaseMessages.getString(PKG,
    "Some.Key")

In these cases Translator prints a line like the following to the console during the scanning phase at startup:

Suspect key found: [              BaseMessages.getString(PKG, ] in file [file:///home/matt/svn/kettle/trunk/src-ui/org/pentaho/di/ui/spoon/Spoon.java]

The solution then is simply to put the code back "in order", that is, to place the whole BaseMessages.getString() call on a single line.  Please note that this is NOT a Java problem, only a Translator problem.
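
The sketch below shows why a line-based check trips over a wrapped call and why joining the lines fixes it.  It assumes a simplified rule (the scan-phrase is present but no quoted key follows on the same line) that may differ from what Translator really checks.  The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Translator's actual code.
final class SuspectKeySketch {

    static final String SCAN_PHRASE = "BaseMessages.getString(PKG,";

    // Assumed rule: a line is suspect when it contains the scan-phrase
    // but the text after its first occurrence does not start with a quoted key.
    static boolean isSuspect(String line) {
        int at = line.indexOf(SCAN_PHRASE);
        if (at < 0) {
            return false;
        }
        String rest = line.substring(at + SCAN_PHRASE.length()).trim();
        return !rest.startsWith("\"");
    }

    // Returns all suspect lines, looking at each line in isolation.
    static List<String> suspectLines(List<String> lines) {
        List<String> suspects = new ArrayList<>();
        for (String line : lines) {
            if (isSuspect(line)) {
                suspects.add(line);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // The call wrapped over two lines, as a code formatter might leave it.
        List<String> wrapped = Arrays.asList(
            "String s = BaseMessages.getString(PKG,",
            "    \"Some.Key\");");

        // The same call put back on a single line.
        List<String> fixed = Arrays.asList(
            "String s = BaseMessages.getString(PKG, \"Some.Key\");");

        System.out.println(suspectLines(wrapped).size()); // prints 1
        System.out.println(suspectLines(fixed).size());   // prints 0
    }
}
```

Both versions compile to exactly the same Java.  Only the scanner, which never looks at the next line, sees a difference.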

Translator will display a fatal error if there are classes that refer to a PKG class for which no messages folder or no messages file for the reference locale (en_US) is present.  This is an unrecoverable i18n error and should be fixed immediately, either by creating the messages bundle or (as is usually the case after refactoring) by pointing to the correct reference PKG class.