Creating a partition method plugin is as easy as creating 2 classes and then putting them in a jar file. The 2 classes you need are a partitioner class, which implements the Partitioner interface, and a dialog class.
The Partitioner interface is fairly simple. Beyond a few administrative methods (see the example below), only one method really matters: getPartition(). This method is simple too: it receives a row of data as input and must return an integer x where x>=0 and x<nrPartitions. nrPartitions is initialized by the init() method.
public int getPartition(RowMetaInterface rowMeta, Object[] row) throws KettleException {
    init(rowMeta);

    if (partitionColumnIndex < 0) {
        partitionColumnIndex = rowMeta.indexOfValue(fieldName);
        if (partitionColumnIndex < 0) {
            throw new KettleStepException(BaseMessages.getString(PKG,
                "HourPartitioner.Exception.PartitioningFieldNotFound", fieldName, rowMeta.toString()));
        }
    }

    ValueMetaInterface valueMeta = rowMeta.getValueMeta(partitionColumnIndex);
    Object valueData = row[partitionColumnIndex];

    if (!valueMeta.isString()) {
        throw new KettleException(BaseMessages.getString(PKG,
            "HourPartitioner.Exception.NotAFilename", valueMeta.getName()));
    }

    String filename = valueMeta.getString(valueData);
    String hourString = filename.substring(filename.length()-6, filename.length()-4);
    int value = Integer.parseInt(hourString);

    int targetLocation = (int) (value % nrPartitions);

    return targetLocation;
}
For more information, see the sample below.
The dialog class only needs to declare two methods:
For more information on how to program a dialog, see elsewhere in the PDI SDK pages or on the Internet. Look for Eclipse platform SWT code snippets.
You should annotate your partitioner class with the @PartitionerPlugin annotation to signal to the Kettle plugin registry that this plugin needs to be loaded at startup.
Then compile the 2 classes and put them in a jar file. Place that jar file in the plugins/steps folder or one of its sub-folders (steps is not a typo!).
If you need additional libraries to be included in the class path of the plugin you can place them in a lib sub-folder next to the plugin jar file.
See the source code of the Hour partitioner plugin:
The Partitioner class: HourPartitioner.java
The Dialog class: HourPartitionerDialog.java
The hour partitioner partitions on a field that holds a file name. The file name contains the hour at which the data was captured, and that hour is what we want to partition on. For example, given Weblogs-20100329-23.txt, the partitioner takes the 23 from the filename, turns it into an integer, and calculates the remainder of the division by the number of partitions in the partitioning schema.
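
The filename arithmetic can be tried out in isolation, without any Kettle classes on the class path. The following is a minimal sketch, not part of the plugin: the class name HourPartitionSketch and the methods hourOf() and partitionFor() are hypothetical names chosen for illustration, and the sketch assumes the filename ends in a two-digit hour followed by a 4-character extension such as .txt.

```java
// Standalone illustration of the hour-based partitioning arithmetic.
// All names here are hypothetical; none of them belong to the Kettle API.
final class HourPartitionSketch {

    // Returns the two characters that sit just before the 4-character
    // extension (for example ".txt"), parsed as a decimal integer.
    static int hourOf(String filename) {
        if (filename == null || filename.length() < 6) {
            throw new IllegalArgumentException("Filename too short to contain an hour: " + filename);
        }
        String hourString = filename.substring(filename.length() - 6, filename.length() - 4);
        return Integer.parseInt(hourString);
    }

    // Maps the hour onto a partition number x with 0 <= x < nrPartitions,
    // assuming the extracted hour is not negative.
    static int partitionFor(String filename, int nrPartitions) {
        if (nrPartitions <= 0) {
            throw new IllegalArgumentException("nrPartitions must be positive: " + nrPartitions);
        }
        return hourOf(filename) % nrPartitions;
    }

    public static void main(String[] args) {
        // Hour 23 with 4 partitions: 23 % 4 = 3
        System.out.println(partitionFor("Weblogs-20100329-23.txt", 4));
        // Hour 08 with 4 partitions: 8 % 4 = 0
        System.out.println(partitionFor("Weblogs-20100329-08.txt", 4));
    }
}
```

With 4 partitions, the 24 hours of a day are spread evenly, 6 hours per partition. A filename that does not follow the assumed layout makes Integer.parseInt() throw a NumberFormatException, so a real partitioner may want to wrap that in a KettleException with a clearer message.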