Step Metadata Injection API

Introduction

Since version 4.1 of PDI there is a step called ETL Metadata Injection that allows an ETL developer to pass ETL metadata to a step. Traditionally this metadata is entered only through a step dialog, but a new API was introduced that makes a step's metadata programmable.

API

As with many other step-plugin-related features, it all starts in StepMetaInterface, where a new method was introduced:

  /**
   * @return Optional interface that allows an external program to inject step metadata in a standardized fashion.
   * This method will return null if the interface is not available for this step.
   */
  public StepMetaInjectionInterface getStepMetaInjectionInterface();

StepMetaInjectionInterface itself is fairly simple, with only two methods.
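To make the shape of this contract concrete, here is a minimal, self-contained sketch that does not depend on the Kettle libraries. StepMeta, StepMetaInjection and MetaEntry are hypothetical, simplified stand-ins for StepMetaInterface, StepMetaInjectionInterface and StepInjectionMetaEntry; the checked KettleException is omitted, and the "FIELD_NAME" key and the numeric type codes are local to the sketch. It shows a caller treating a null return value as "this step does not support metadata injection":

```java
import java.util.ArrayList;
import java.util.List;

class OptionalInjectionDemo {
    public static void main(String[] args) {
        System.out.println("UppercaseStepMeta: " + supportsInjection(new UppercaseStepMeta()));
        System.out.println("LegacyStepMeta: " + supportsInjection(new LegacyStepMeta()));
    }

    // A null injection interface means the step cannot be configured this way.
    static boolean supportsInjection(StepMeta meta) {
        return meta.getStepMetaInjectionInterface() != null;
    }
}

// Hypothetical, simplified stand-in for StepInjectionMetaEntry.
class MetaEntry {
    static final int TYPE_NONE = 0;   // container/list entries, as described in the text
    static final int TYPE_STRING = 2; // type code local to this sketch

    final String key;
    final Object value;
    final int valueType;
    final List<MetaEntry> details = new ArrayList<>();

    MetaEntry(String key, Object value, int valueType) {
        this.key = key;
        this.value = value;
        this.valueType = valueType;
    }
}

// Stand-in for StepMetaInjectionInterface: exactly two methods.
interface StepMetaInjection {
    List<MetaEntry> getStepInjectionMetadataEntries();

    void injectStepMetadataEntries(List<MetaEntry> metadata);
}

// Stand-in for the relevant part of StepMetaInterface.
interface StepMeta {
    StepMetaInjection getStepMetaInjectionInterface();
}

// A step without injection support simply returns null.
class LegacyStepMeta implements StepMeta {
    public StepMetaInjection getStepMetaInjectionInterface() {
        return null;
    }
}

// A hypothetical step with a single programmable String attribute.
class UppercaseStepMeta implements StepMeta {
    String fieldName;

    public StepMetaInjection getStepMetaInjectionInterface() {
        return new StepMetaInjection() {
            public List<MetaEntry> getStepInjectionMetadataEntries() {
                List<MetaEntry> entries = new ArrayList<>();
                // Described only: the value is left null.
                entries.add(new MetaEntry("FIELD_NAME", null, MetaEntry.TYPE_STRING));
                return entries;
            }

            public void injectStepMetadataEntries(List<MetaEntry> metadata) {
                for (MetaEntry entry : metadata) {
                    if ("FIELD_NAME".equals(entry.key)) {
                        fieldName = (String) entry.value; // writes the enclosing meta object
                    }
                }
            }
        };
    }
}
```

Returning the injection object from the meta class keeps the injection logic out of StepMetaInterface itself, while still letting it write directly into the owning metadata object.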

Retrieving available metadata injection entries

The getStepInjectionMetadataEntries() method returns a tree containing not only the top-level attributes of the step but optionally also complete nested lists of entries (for example, the rows of a field grid):

  /**
   * @return A list of step injection metadata entries.
   *   In case the data type of the entry is NONE (0) you will get at least one entry in the details section.
   */
  public List<StepInjectionMetaEntry> getStepInjectionMetadataEntries() throws KettleException;

In other words, this method lists which attributes of the step are "programmable" through the ETL Metadata Injection step (or through other uses of the API).
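As a rough illustration of how a caller could discover the programmable attributes, the following self-contained sketch walks a description tree and collects a path for every leaf entry, descending into entries of type None (0). MetaEntry is a hypothetical stand-in for StepInjectionMetaEntry, and the keys "DIRECTORY", "FIELDS", "FIELD" and "NAME" are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

class ListProgrammableAttributes {
    public static void main(String[] args) {
        // Prints one line per attribute that can be injected.
        for (String path : programmablePaths(describeExampleStep())) {
            System.out.println(path);
        }
    }

    // Collects a slash-separated path for every leaf (non-None) entry in the tree.
    static List<String> programmablePaths(List<MetaEntry> entries) {
        List<String> paths = new ArrayList<>();
        collect(entries, "", paths);
        return paths;
    }

    private static void collect(List<MetaEntry> entries, String prefix, List<String> paths) {
        for (MetaEntry entry : entries) {
            String path = prefix.isEmpty() ? entry.key : prefix + "/" + entry.key;
            if (entry.valueType == MetaEntry.TYPE_NONE) {
                collect(entry.details, path, paths); // container: descend into its details
            } else {
                paths.add(path); // leaf: a directly programmable attribute
            }
        }
    }

    // Hypothetical description: one scalar attribute plus one list of fields.
    static List<MetaEntry> describeExampleStep() {
        MetaEntry field = new MetaEntry("FIELD", MetaEntry.TYPE_NONE);
        field.details.add(new MetaEntry("NAME", MetaEntry.TYPE_STRING));
        MetaEntry fields = new MetaEntry("FIELDS", MetaEntry.TYPE_NONE);
        fields.details.add(field);

        List<MetaEntry> top = new ArrayList<>();
        top.add(new MetaEntry("DIRECTORY", MetaEntry.TYPE_STRING));
        top.add(fields);
        return top;
    }
}

// Hypothetical, simplified stand-in for StepInjectionMetaEntry.
class MetaEntry {
    static final int TYPE_NONE = 0;   // list/container entries, as described in the text
    static final int TYPE_STRING = 2; // type code local to this sketch

    final String key;
    final int valueType;
    final List<MetaEntry> details = new ArrayList<>();

    MetaEntry(String key, int valueType) {
        this.key = key;
        this.valueType = valueType;
    }
}
```

Running it prints DIRECTORY followed by FIELDS/FIELD/NAME.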

Each entry has a ValueMetaInterface data type (String, Number, Date, ...), with the exception of lists, whose top-level entry is of type None (0). Take the "Sort Rows" step as an example: it accepts a series of fields to sort on, each with a sort direction and other options. This metadata grid is described simply by two levels of container entries in the list returned by getStepInjectionMetadataEntries():

  • FIELDS (there is a list of fields to sort on that you can specify, type None)
    • FIELD (this is one field definition, type None)
      • the name of the field to sort on (String)
      • Sort ascending? (Y/N String)
      • Ignore case? (Y/N String)

This tells the outside world that you can pass not just one metadata item but a series of rows, each containing three metadata items.
The values in the StepInjectionMetaEntry elements are not filled in: the goal of the getStepInjectionMetadataEntries() method is to describe the metadata injection capabilities, NOT to retrieve the metadata in a structured form. That problem may be tackled in a similar way at a later stage.
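The Sort Rows description above could be built roughly as follows. This is a self-contained sketch, not the actual Sort Rows code: MetaEntry is a hypothetical stand-in for StepInjectionMetaEntry, and the keys NAME, SORT_ASCENDING and IGNORE_CASE are illustrative names. Note that every value stays null, because the tree only describes what can be injected:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortRowsDescription {
    public static void main(String[] args) {
        print(describe(), "");
    }

    // Builds the description tree only: no values are filled in.
    static List<MetaEntry> describe() {
        MetaEntry field = new MetaEntry("FIELD", MetaEntry.TYPE_NONE, "One field definition");
        field.details.add(new MetaEntry("NAME", MetaEntry.TYPE_STRING, "The name of the field to sort on"));
        field.details.add(new MetaEntry("SORT_ASCENDING", MetaEntry.TYPE_STRING, "Sort ascending? (Y/N)"));
        field.details.add(new MetaEntry("IGNORE_CASE", MetaEntry.TYPE_STRING, "Ignore case? (Y/N)"));

        MetaEntry fields = new MetaEntry("FIELDS", MetaEntry.TYPE_NONE, "The fields to sort on");
        fields.details.add(field);
        return new ArrayList<>(Arrays.asList(fields));
    }

    // Indented dump of the tree, one entry per line.
    static void print(List<MetaEntry> entries, String indent) {
        for (MetaEntry entry : entries) {
            System.out.println(indent + entry.key + " (type " + entry.valueType + "): " + entry.description);
            print(entry.details, indent + "  ");
        }
    }
}

// Hypothetical, simplified stand-in for StepInjectionMetaEntry.
class MetaEntry {
    static final int TYPE_NONE = 0;   // list/container entries, as described in the text
    static final int TYPE_STRING = 2; // type code local to this sketch

    final String key;
    final int valueType;
    final String description;
    Object value; // left null in a description
    final List<MetaEntry> details = new ArrayList<>();

    MetaEntry(String key, int valueType, String description) {
        this.key = key;
        this.valueType = valueType;
        this.description = description;
    }
}
```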

Injecting metadata entries

The injectStepMetadataEntries() method accepts a tree of StepInjectionMetaEntry elements, as described in the getStepInjectionMetadataEntries() section:

  /**
   * Inject a list of step injection metadata entries into the owned step metadata object.
   *
   * @param metadata The metadata to inject.
   *
   * @throws KettleException
   */
  public void injectStepMetadataEntries(List<StepInjectionMetaEntry> metadata) throws KettleException;

The main difference, of course, is that instead of a single FIELD element inside the top-level FIELDS element (see the example above), you pass a series of them. Each StepInjectionMetaEntry element must also contain the actual metadata values.
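From the caller's side, building such a filled-in tree for Sort Rows might look like the sketch below, with one FIELD entry per row of input. It is self-contained and hypothetical: MetaEntry stands in for StepInjectionMetaEntry, the key names are illustrative, and the resulting list is what would be handed to injectStepMetadataEntries():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BuildSortInjection {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"customer_id", "Y", "N"},
                new String[] {"last_name", "N", "Y"});
        MetaEntry fields = buildFields(rows).get(0);
        System.out.println(fields.key + " holds " + fields.details.size() + " FIELD entries");
    }

    // Each row is {field name, sort ascending Y/N, ignore case Y/N}.
    static List<MetaEntry> buildFields(List<String[]> rows) {
        MetaEntry fields = new MetaEntry("FIELDS", null, MetaEntry.TYPE_NONE);
        for (String[] row : rows) {
            // One FIELD container per row, holding the actual values.
            MetaEntry field = new MetaEntry("FIELD", null, MetaEntry.TYPE_NONE);
            field.details.add(new MetaEntry("NAME", row[0], MetaEntry.TYPE_STRING));
            field.details.add(new MetaEntry("SORT_ASCENDING", row[1], MetaEntry.TYPE_STRING));
            field.details.add(new MetaEntry("IGNORE_CASE", row[2], MetaEntry.TYPE_STRING));
            fields.details.add(field);
        }
        List<MetaEntry> metadata = new ArrayList<>();
        metadata.add(fields);
        return metadata; // would be passed to injectStepMetadataEntries()
    }
}

// Hypothetical, simplified stand-in for StepInjectionMetaEntry.
class MetaEntry {
    static final int TYPE_NONE = 0;   // list/container entries, as described in the text
    static final int TYPE_STRING = 2; // type code local to this sketch

    final String key;
    final Object value;
    final int valueType;
    final List<MetaEntry> details = new ArrayList<>();

    MetaEntry(String key, Object value, int valueType) {
        this.key = key;
        this.value = value;
        this.valueType = valueType;
    }
}
```

Running it prints "FIELDS holds 2 FIELD entries".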

Example

The injection interface of the Sort Rows step shows that it is fairly easy to make a step support metadata injection:

StepMetaInterface

  public StepMetaInjectionInterface getStepMetaInjectionInterface() {
    // Hand out an injection helper bound to this step metadata object
    return new SortRowsMetaInjection(this);
  }

SortRowsMetaInjection

You can browse the source code over here.
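For readers without the source at hand, the following is a hypothetical, self-contained sketch of what the injection half of a class in the spirit of SortRowsMetaInjection could look like. It is not the actual PDI code: SortRowsLikeMeta, its parallel arrays, the key names and the defaults for missing attributes are all assumptions, MetaEntry stands in for StepInjectionMetaEntry, and only injectStepMetadataEntries() is shown:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortInjectionDemo {
    public static void main(String[] args) {
        SortRowsLikeMeta meta = new SortRowsLikeMeta();
        MetaEntry field = new MetaEntry("FIELD", null, MetaEntry.TYPE_NONE);
        field.details.add(new MetaEntry("NAME", "last_name", MetaEntry.TYPE_STRING));
        field.details.add(new MetaEntry("SORT_ASCENDING", "N", MetaEntry.TYPE_STRING));
        field.details.add(new MetaEntry("IGNORE_CASE", "Y", MetaEntry.TYPE_STRING));
        MetaEntry fields = new MetaEntry("FIELDS", null, MetaEntry.TYPE_NONE);
        fields.details.add(field);

        meta.getStepMetaInjectionInterface().injectStepMetadataEntries(Arrays.asList(fields));
        System.out.println(Arrays.toString(meta.fieldName) + " ascending=" + Arrays.toString(meta.ascending));
    }
}

// Hypothetical meta class holding the sort grid as parallel arrays.
class SortRowsLikeMeta {
    String[] fieldName = new String[0];
    boolean[] ascending = new boolean[0];
    boolean[] ignoreCase = new boolean[0];

    // Same pattern as above: bind the injector to this metadata object.
    SortInjection getStepMetaInjectionInterface() {
        return new SortInjection(this);
    }
}

// Hypothetical injector that turns a FIELDS/FIELD tree into the meta's arrays.
class SortInjection {
    private final SortRowsLikeMeta meta;

    SortInjection(SortRowsLikeMeta meta) {
        this.meta = meta;
    }

    void injectStepMetadataEntries(List<MetaEntry> metadata) {
        List<String> names = new ArrayList<>();
        List<Boolean> asc = new ArrayList<>();
        List<Boolean> ignore = new ArrayList<>();
        for (MetaEntry top : metadata) {
            if (!"FIELDS".equals(top.key)) continue; // only the field grid is handled here
            for (MetaEntry field : top.details) {
                String name = null;
                boolean a = true;   // assumed default when SORT_ASCENDING is missing
                boolean ic = false; // assumed default when IGNORE_CASE is missing
                for (MetaEntry attr : field.details) {
                    String v = (String) attr.value;
                    switch (attr.key) {
                        case "NAME": name = v; break;
                        case "SORT_ASCENDING": a = "Y".equalsIgnoreCase(v); break;
                        case "IGNORE_CASE": ic = "Y".equalsIgnoreCase(v); break;
                        default: break; // unknown keys are ignored in this sketch
                    }
                }
                names.add(name);
                asc.add(a);
                ignore.add(ic);
            }
        }
        // Replace the grid with the injected rows.
        int n = names.size();
        meta.fieldName = names.toArray(new String[0]);
        meta.ascending = new boolean[n];
        meta.ignoreCase = new boolean[n];
        for (int i = 0; i < n; i++) {
            meta.ascending[i] = asc.get(i);
            meta.ignoreCase[i] = ignore.get(i);
        }
    }
}

// Hypothetical, simplified stand-in for StepInjectionMetaEntry.
class MetaEntry {
    static final int TYPE_NONE = 0;
    static final int TYPE_STRING = 2; // type code local to this sketch

    final String key;
    final Object value;
    final int valueType;
    final List<MetaEntry> details = new ArrayList<>();

    MetaEntry(String key, Object value, int valueType) {
        this.key = key;
        this.value = value;
        this.valueType = valueType;
    }
}
```

Running it prints "[last_name] ascending=[false]".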