Evaluating Parameters in Kettle Jobs

August 24, 2006
Submitted by Matt Castors, Chief of Data Integration, Pentaho

Dear Kettle friends,
This weekly tip describes a solution to the following problem:

Each day you have a text file that arrives.
You want to load it into a database, but only if the file contains more than 5 records.
If we don't have at least 5 rows we know that something is wrong: The transformation to load the file also truncates the database table and if the file contains too few records, it just means that the wrong data is being loaded for the users to see.

Here is how we do this:

As you can see, the logic we implement is exactly the logic we wanted, nothing special is going on.

This is an image of the transformation that calculates the parameter:

The "Group by" step simply counts the number of rows. Note that if you want to do this from a database source, you can simply launch a SQL statement like this to get the answer right away: SELECT count(*)
FROM SomeTable

Now that we have a result row with a single value, how do we obtain this value in the job? Well, we can use the "Evaluation" job entry.
The code is as follows:
var nrRows = rows[0].getInteger("nr", -1)

nrRows > 100

As you can see we provide you with an array containing all the rows that are in the Result object.

So basically, the evaluation is two simple steps, the rest of the job is added to be somewhat complete (wink)

You can find the job, transformation and test file here. To use this example, unzip it in a directory and point the environment variable "deploy-dir" to this directory.
The easiest way to set the variable is to add the following line in your Kettle environment variables files (file: $HOME/.kettle/kettle.properties or C:\Documents and Settings\\.kettle\kettle.properties)deploy-dir = /some/directory or deploy-dir = C:\some\directory\

Until next time,
Matt

P.S. Someone posted a similar method of doing this by using environment variables. Here is a sample job using local variables and the evaluation job entry.