PDI Partition Method Plugin Development

Creating a partition method plugin is as easy as creating 2 classes and then putting them in a jar file.  Here are the 2 classes you need:

  1. A class that implements the Partitioner interface.  You can extend BasePartitioner for your convenience.
  2. A class that allows the user to configure the partitioner options in a dialog.

Partitioner

The Partitioner interface is fairly simple.  Beyond a few administrative methods (see the example below), only one method really matters: getPartition().  It receives a row of data as input and must return an integer x where x>=0 and x<nrPartitions.  nrPartitions is initialized by the init() method.

public int getPartition(RowMetaInterface rowMeta, Object[] row) throws KettleException {
	// Make sure nrPartitions is initialized before it is used below.
	init(rowMeta);

	// Look up the index of the partitioning field once and remember it.
	if (partitionColumnIndex < 0) {
		partitionColumnIndex = rowMeta.indexOfValue(fieldName);
		if (partitionColumnIndex < 0) {
			throw new KettleStepException(BaseMessages.getString(PKG, "HourPartitioner.Exception.PartitioningFieldNotFound", fieldName, rowMeta.toString()));
		}
	}

	ValueMetaInterface valueMeta = rowMeta.getValueMeta(partitionColumnIndex);
	Object valueData = row[partitionColumnIndex];

	// The partitioning field has to hold a file name, so it must be a String.
	if (!valueMeta.isString()) {
		throw new KettleException(BaseMessages.getString(PKG, "HourPartitioner.Exception.NotAFilename", valueMeta.getName()));
	}

	// The hour is the 2 characters just before the 4-character extension,
	// for example the "23" in Weblogs-20100329-23.txt.
	String filename = valueMeta.getString(valueData);
	String hourString = filename.substring(filename.length()-6, filename.length()-4);
	int value = Integer.parseInt(hourString);

	// Map the hour onto one of the available partitions.
	int targetLocation = (int) (value % nrPartitions);

	return targetLocation;
}
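
The contract described above requires 0 <= x < nrPartitions.  Java's % operator keeps the sign of the left operand, so a negative input value would give a negative result and break that contract.  The following is a minimal sketch, not part of the Hour partitioner sample, of one way to keep the result in range with Math.floorMod.  The class name PartitionRangeSketch and the method toPartition are hypothetical and exist only for this illustration.

```java
// Illustration only: this class is not part of Kettle or of the sample plugin.
final class PartitionRangeSketch {

    // Maps any int onto the range 0 .. nrPartitions-1.
    static int toPartition(int value, int nrPartitions) {
        if (nrPartitions <= 0) {
            throw new IllegalArgumentException("nrPartitions must be positive");
        }
        // floorMod never returns a negative number for a positive divisor,
        // whereas value % nrPartitions is negative when value is negative.
        return Math.floorMod(value, nrPartitions);
    }
}
```

For non-negative values, such as the hours 0 to 23 used in the method above, Math.floorMod and % give the same result, so this only matters when the computed value can be negative.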

For the complete class, see the example at the end of this page.

The dialog class

The dialog class only needs to declare two methods:

  • open(): called to open the dialog shell and show the dialog to the user.
  • setRepository(): called by Kettle to pass the repository to the dialog, so that additional repository objects (database connections, partitioning schemas and so on) can be referenced from within the dialog.

For more information on how to program a dialog, see elsewhere in the PDI SDK pages or on the Internet.  Look for Eclipse platform SWT code snippets.

Deployment

You should annotate your partitioner class with the @PartitionerPlugin annotation to signal to the Kettle plugin registry that this plugin needs to be loaded at startup.
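
To give an idea of what an annotated class looks like, here is a rough sketch.  So that it compiles on its own, it declares a local stand-in for the annotation; in a real plugin you would import the @PartitionerPlugin annotation that ships with Kettle instead.  The attribute names id, name and description are an assumption about that annotation and should be checked against your Kettle version.  The attribute values and the class name HourPartitionerSketch are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-in for illustration only. A real plugin uses the annotation
// provided by Kettle; the attributes below are assumed to mirror it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PartitionerPlugin {
    String id();
    String name();
    String description();
}

// Hypothetical values; a real plugin class would also extend
// BasePartitioner and implement getPartition().
@PartitionerPlugin(
    id = "HourPartitioner",
    name = "Hour partitioner",
    description = "Partitions rows on the hour found in a file name")
class HourPartitionerSketch {
}
```

Because the stand-in is retained at runtime, the values can be read back by reflection, which is the general mechanism that makes annotation-based plugin discovery possible.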

Then compile the 2 classes and put them in a jar file.  Place that jar file in the plugins/steps folder (steps is not a typo!) or in one of its sub-folders.

If you need additional libraries to be included in the class path of the plugin you can place them in a lib sub-folder next to the plugin jar file.

An example

See the source code of the Hour partitioner plugin:

The Partitioner class: HourPartitioner.java

The Dialog class: HourPartitionerDialog.java

The hour partitioner partitions on a field that contains the name of a file.  The file name contains the hour in which the data was captured, and that hour is what we want to partition on.  For example, given Weblogs-20100329-23.txt, the partitioner takes the 23 from the file name, turns it into an integer and calculates the remainder of the division by the number of partitions in the partitioning schema.
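
The calculation can be tried outside of Kettle with a small standalone sketch.  It follows the same substring and remainder logic as the getPartition() method shown earlier, but the class name HourPartitionSketch, the method hourPartition and the length check are additions made for this illustration and are not part of the plugin.

```java
// Illustration only: standalone version of the hour calculation,
// without any dependency on the Kettle libraries.
final class HourPartitionSketch {

    // Returns the partition number for a file name that ends in
    // "<2-digit hour>" followed by a 4-character extension such as ".txt".
    static int hourPartition(String filename, int nrPartitions) {
        // Added guard (not in the plugin code): the name must be long
        // enough to hold the hour and the extension.
        if (filename == null || filename.length() < 6) {
            throw new IllegalArgumentException("File name too short: " + filename);
        }
        // Take the 2 characters in front of the extension.
        String hourString = filename.substring(filename.length() - 6, filename.length() - 4);
        int hour = Integer.parseInt(hourString);
        // Remainder of the division by the number of partitions.
        return hour % nrPartitions;
    }
}
```

With 4 partitions, hourPartition("Weblogs-20100329-23.txt", 4) extracts the hour 23 and returns 3, because 23 = 5 x 4 + 3.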