Converting your PDI v3 plugins to v4

Introduction

One of the main changes in version 4 of Pentaho Data Integration is the introduction of alternate repository types.  Because of this major change, the PDI API changed slightly where plugins are concerned.  This document describes how you can quickly modify your plugins to work on PDI v4.  Modifying a step or job entry plugin shouldn't take more than 5 minutes.

Migration tips

New libraries

Because Pentaho Data Integration keeps evolving and an API change was required, you need to use the PDI v4 libraries to compile your plugins.  You can use the libraries (kettle-*.jar) found in the lib folder of the binary distributions of PDI 4.x, or you can compile PDI yourself.

You can look into the manifest found in the jar files to verify the version.  For example, do this:

unzip -c kettle-core.jar META-INF/MANIFEST.MF

This will display the manifest of the jar file, including the version and build date.

TO DO:

  • Make sure you are using PDI v4 libraries to compile your plug-ins

Introducing ObjectId

Because we can now have different repository types beyond a relational database, the identifier of an object like a transformation can now be something other than a simple long integer.  It can be a filename, a UUID, and so on.  Because of this, we created the ObjectId interface, which is basically a wrapper around a getId() method.
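
To make this concrete, here is a minimal sketch of the idea: the interface exposes the identifier only as a String, and each repository type supplies its own implementation.  The FileObjectId class below is hypothetical, purely for illustration:

public interface ObjectId {
    String getId();
}

// Hypothetical implementation for a file-based repository, where the
// identifier is simply a file path.
public class FileObjectId implements ObjectId {
    private final String filename;

    public FileObjectId(String filename) {
        this.filename = filename;
    }

    public String getId() {
        return filename;
    }
}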

Job entry plug-ins

The JobEntryInterface changed in these methods:   

public ObjectId getID();
public void setID(ObjectId id);
public void loadRep(Repository rep, ObjectId id_jobentry, List<DatabaseMeta> databases, List<SlaveServer> slaveServers) throws KettleException;
public void saveRep(Repository rep, ObjectId id_job) throws KettleException;

As you can see, all long references simply got replaced with the ObjectId interface.
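
In practice, migrating a job entry plugin is mostly a search-and-replace in these two methods.  A sketch of what the result looks like, assuming a hypothetical "target_file" String attribute:

public void loadRep(Repository rep, ObjectId id_jobentry, List<DatabaseMeta> databases, List<SlaveServer> slaveServers) throws KettleException {
    // id_jobentry is now an ObjectId instead of a long; the attribute
    // methods on Repository accept it directly.
    targetFile = rep.getJobEntryAttributeString(id_jobentry, "target_file");
}

public void saveRep(Repository rep, ObjectId id_job) throws KettleException {
    rep.saveJobEntryAttribute(id_job, getID(), "target_file", targetFile);
}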

TO DO:

  • Change the long references to ObjectId


Step plug-ins

The StepMetaInterface changed in these methods:   

public void saveRep(Repository rep, ObjectId id_transformation, ObjectId id_step) throws KettleException;
public void readRep(Repository rep, ObjectId id_step, List<DatabaseMeta> databases, Map<String, Counter> counters) throws KettleException;

As you can see, all long references simply got replaced with the ObjectId interface.
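
The same search-and-replace applies here.  A sketch under the same assumptions, with a hypothetical "field_name" String attribute:

public void readRep(Repository rep, ObjectId id_step, List<DatabaseMeta> databases, Map<String, Counter> counters) throws KettleException {
    fieldName = rep.getStepAttributeString(id_step, "field_name");
}

public void saveRep(Repository rep, ObjectId id_transformation, ObjectId id_step) throws KettleException {
    rep.saveStepAttribute(id_transformation, id_step, "field_name", fieldName);
}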

TO DO:

  • Change the long references to ObjectId

Repository method changes

We tried to keep the methods in the Repository interface the same as the old Repository class.

However, there have been a few small changes that influence step and job entry plugins.

loadDatabaseMetaFrom[Step|JobEntry]Attribute()


From:

public DatabaseMeta loadDatabaseMetaFromStepAttribute(
    ObjectId id_step,
    String code
) throws KettleException;

and

public DatabaseMeta loadDatabaseMetaFromJobEntryAttribute(
    ObjectId id_jobentry,
    String code
) throws KettleException;

To:

public DatabaseMeta loadDatabaseMetaFromStepAttribute(
    ObjectId id_step,
    String code,
    List<DatabaseMeta> databases
) throws KettleException;

and

public DatabaseMeta loadDatabaseMetaFromJobEntryAttribute(
    ObjectId id_jobentry,
    String code,
    List<DatabaseMeta> databases
) throws KettleException;

It was in fact a bug not to pass along the list of database references and to load a new object from the repository instead.  Especially when you rename a database, this had negative consequences.
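
In your step's readRep() this means passing along the databases list you receive; a sketch, assuming the connection is stored under the commonly used "id_connection" attribute code:

public void readRep(Repository rep, ObjectId id_step, List<DatabaseMeta> databases, Map<String, Counter> counters) throws KettleException {
    // The connection is resolved from the passed-in list, so a renamed
    // database is still found instead of being re-loaded from the repository.
    databaseMeta = rep.loadDatabaseMetaFromStepAttribute(id_step, "id_connection", databases);
}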

i18n changes

With respect to internationalization, you can replace all instances of

Messages.getString("SomeClass.SomeKey", parameters)

with 

BaseMessages.getString(PKG, "SomeClass.SomeKey", parameters);

At the very top of the class you can then define PKG like this:

private static Class<?> PKG = SomeClass.class; // for i18n purposes, needed by Translator2!!

This removes the need, present in v2 and v3, to have hundreds of almost identical Messages classes.
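
Putting it together, a class then looks like this (MyStepDialog and its message key are made up for the example):

import org.pentaho.di.i18n.BaseMessages;

public class MyStepDialog {
    private static Class<?> PKG = MyStepDialog.class; // for i18n purposes, needed by Translator2!!

    public String getDialogTitle() {
        // was: Messages.getString("MyStepDialog.Shell.Title")
        return BaseMessages.getString(PKG, "MyStepDialog.Shell.Title");
    }
}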

Logging changes

Since v4 Kettle uses a central log store and a logging registry with which you can interact.

Here are some of the advantages:

  • Log separation between objects (transformations, jobs, …) on the same server
  • Central logging store infrastructure with central memory management
  • Logging data lineage so we know where each log row comes from
  • Incremental log updates from the central log store
  • Error line identification to allow for color coding
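
To write to the central log store from your own code, you can create a log channel; a minimal sketch (inside a step or job entry you would normally use the log channel the base class already provides):

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.logging.LogChannel;
import org.pentaho.di.core.logging.LogChannelInterface;

public class LoggingDemo {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init(); // initializes the Kettle environment, including logging

        // Each LogChannel registers itself with the logging registry, so
        // its rows in the central log store can be traced back to it.
        LogChannelInterface log = new LogChannel("LoggingDemo");
        log.logBasic("Starting work");
        log.logError("Something went wrong");
    }
}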

Have a look at PDI Logging in the SDK and the API for more information.