PDI Partition Method Plugin Development

Creating a partition method plugin is as easy as creating 2 classes and then putting them in a jar file.  Here are the 2 classes you need:

  1. A class that implements the Partitioner interface.  You can extend BasePartitioner for your convenience.
  2. A class that allows the user to configure the partitioner options in a dialog.

Partitioner

The Partitioner interface is fairly simple.  Beyond a few administrative methods (see the example below), only one method is really important: getPartition().  It receives a row of data as input and must return an integer x where x>=0 and x<nrPartitions.  nrPartitions is initialized by the init() method.

public int getPartition(RowMetaInterface rowMeta, Object[] row) throws KettleException {
	// Make sure nrPartitions is initialized
	init(rowMeta);

	// Look up the index of the partitioning field if it is not known yet
	if (partitionColumnIndex < 0) {
		partitionColumnIndex = rowMeta.indexOfValue(fieldName);
		if (partitionColumnIndex < 0) {
			throw new KettleStepException(BaseMessages.getString(PKG, "HourPartitioner.Exception.PartitioningFieldNotFound", fieldName, rowMeta.toString()));
		}
	}

	ValueMetaInterface valueMeta = rowMeta.getValueMeta(partitionColumnIndex);
	Object valueData = row[partitionColumnIndex];

	// The partitioning field has to contain a filename, so it must be a String
	if (!valueMeta.isString()) {
		throw new KettleException(BaseMessages.getString(PKG, "HourPartitioner.Exception.NotAFilename", valueMeta.getName()));
	}

	// The hour is the two characters just before the last four (for example ".txt")
	String filename = valueMeta.getString(valueData);
	String hourString = filename.substring(filename.length()-6, filename.length()-4);
	int value = Integer.parseInt(hourString);

	// Map the hour to a partition number using the remainder of the division
	int targetLocation = (int) (value % nrPartitions);

	return targetLocation;
}
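
The range requirement on the return value (x>=0 and x<nrPartitions) is easy to violate when the partitioning key can be negative, because the Java % operator keeps the sign of the dividend.  The following is a minimal standalone sketch of one way to stay in range.  It is not part of the Kettle API: the class name PartitionRangeSketch and the method toPartition are hypothetical, and Math.floorMod comes from the standard JDK.

```java
final class PartitionRangeSketch {

    // Hypothetical helper (not a Kettle method): maps any int key
    // to a partition number in the range [0, nrPartitions).
    static int toPartition(int key, int nrPartitions) {
        if (nrPartitions <= 0) {
            throw new IllegalArgumentException("nrPartitions must be positive: " + nrPartitions);
        }
        // floorMod always returns a non-negative result for a positive divisor
        return Math.floorMod(key, nrPartitions);
    }

    public static void main(String[] args) {
        System.out.println(toPartition(23, 4)); // prints 3
        System.out.println(-5 % 4);             // prints -1, which is not a valid partition
        System.out.println(toPartition(-5, 4)); // prints 3
    }
}
```

For a key such as an hour between 0 and 23 the plain % operator gives the same result, so the extra care only matters for keys that may be negative, such as hash codes.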

For more information, see the Hour partitioner sample in the "An example" section below.

The dialog class

The dialog class only needs to declare two methods:

For more information on how to program a dialog, see elsewhere in the PDI SDK pages or on the Internet.  Look for Eclipse platform SWT code snippets.

Deployment

You should annotate your partitioner class with the @PartitionerPlugin annotation to signal to the Kettle plugin registry that this plugin needs to be loaded at startup.

Then compile the two classes and put them in a jar file.  Place that jar file in the plugins/steps folder or one of its sub-folders (steps is not a typo!).

If the plugin needs additional libraries on its class path, you can place them in a lib sub-folder next to the plugin jar file.
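
As an illustration, a deployed plugin might be laid out as follows.  The sub-folder and jar names are hypothetical; only plugins/steps and the lib sub-folder come from the description above.

```
plugins/
  steps/
    hour-partitioner/              (hypothetical sub-folder)
      hour-partitioner.jar         (hypothetical name: the two compiled classes)
      lib/
        some-dependency.jar        (hypothetical additional library)
```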

An example

See the source code of the Hour partitioner plugin:

The Partitioner class: HourPartitioner.java

The Dialog class: HourPartitionerDialog.java

The hour partitioner partitions on a field that holds the name of a file.  The name of the file contains the hour in which the data was captured, and that hour is used to partition on.  For example, given Weblogs-20100329-23.txt, the partitioner takes the 23 from the filename, turns it into an integer and calculates the remainder of the division by the number of partitions in the partitioning schema.
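
The arithmetic can be tried out in plain Java, independent of the Kettle classes.  The following is a sketch under assumptions, not the source of the plugin: the class name HourPartitionSketch and the method partitionFor are hypothetical, the hour is assumed to be the two characters just before a four-character extension such as ".txt", and the input checks are additions for illustration.

```java
final class HourPartitionSketch {

    // Hypothetical helper: extracts the hour from a filename such as
    // "Weblogs-20100329-23.txt" and maps it to a partition number.
    static int partitionFor(String filename, int nrPartitions) {
        if (nrPartitions <= 0) {
            throw new IllegalArgumentException("nrPartitions must be positive: " + nrPartitions);
        }
        if (filename == null || filename.length() < 6) {
            throw new IllegalArgumentException("Filename too short to contain an hour: " + filename);
        }

        // The two characters just before the last four (the extension)
        String hourString = filename.substring(filename.length() - 6, filename.length() - 4);

        int hour;
        try {
            hour = Integer.parseInt(hourString);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("No hour found in filename: " + filename, e);
        }
        if (hour < 0 || hour > 23) {
            throw new IllegalArgumentException("Hour out of range in filename: " + filename);
        }

        // Remainder of the division by the number of partitions
        return hour % nrPartitions;
    }

    public static void main(String[] args) {
        // Hour 23 with 4 partitions: 23 = 5 * 4 + 3
        System.out.println(partitionFor("Weblogs-20100329-23.txt", 4)); // prints 3
    }
}
```

With 24 or more partitions every hour maps to its own partition; with fewer, several hours share one.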