What's new in PDI version 3.1


Introduction


While version 3.1 was in development, we also shipped five other releases: 2.5.2, 3.0.1, 3.0.2, 3.0.3 and 3.0.4.  All the same, we managed to get quite a bit of work done.

The first theme for this release was "Ease of use".  It's a theme shared with the rest of the Pentaho platform and tool set.  Traditionally, Kettle isn't the worst player in that department, but you can always do better.

The second theme of this release was the complete rework of the documentation set.  To keep things manageable by larger groups of people we moved everything we could to the central Pentaho wiki.
Documenting is a difficult task that can never be considered complete, but the wiki will help us keep up with the incredible pace of development that we again achieved in Kettle.

Ease of use

Execution results

To do away with the tab-clutter that came about in the previous release we decided to put the results of executions in a split pane below the graphical view:



 

Performance graph

To make it easier to see which steps are performing well and which are not, we gather performance statistics at a configurable interval and can show them on a graph:



 
We also allow you to store the raw data behind the graph in a database table so that you can create your own statistics.

FAQ attack

We're constantly on the lookout for ways to reduce the size of our FAQ, not increase it.  We do this by informing users of the consequences of certain decisions or by answering frequently asked questions directly in the Spoon GUI.

Some of these FAQ attack measures are subtle, like the fact that you can now execute a stored procedure without feeding any input into the step (it simply executes once).

Others are less subtle, like the tool-tip we show after you drag the second step onto the canvas:


 

New database dialog

The old database dialog was sometimes a bit confusing.  It became one of the most complete database connection configuration tools, but usability and clarity suffered because of this.
At the same time we needed a shared database dialog that could be used by different tools in the Pentaho stack.  Because of this, we opted to create the dialog in XUL, the Mozilla-backed standard.
An SWT layer was created, and the new dialog is now much easier on the eyes and much easier to use:

 
As you can see, only those options that are relevant to the selected database and access type are shown.

Zoom

If you are dealing with large transformations or jobs, it can be useful to zoom in and out to keep an overview:

 

Snap to grid


Some people love it, some people hate it, but here it is, the long-awaited "snap-to-grid" functionality:

Welcome page / Getting started

We created a "Getting Started" page and linked it on the welcome page.  We also linked a number of extra blogs.

Changes in steps

 

INPUT

OUTPUT

LOOKUP

TRANSFORM

  • Add a checksum
    • Calculate a checksum over one or more fields
  • Calculator
    • Various new calculation types
  • Clone row
    • Create one or more copies of the passing rows
  • Data validator
    • Extensive tool to validate your data
  • Delay row
    • Delay for a certain period before passing each row
  • JavaScript
    • Support for ECMA v4
    • Additional new functions for file handling and much more
  • Group By
    • Support for cumulative sum and average, stddev, concatenation with specific separator
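
To make the new Group By aggregates concrete, here is a minimal Python sketch of a cumulative sum and cumulative average per group. It illustrates the concept only and is not Kettle's implementation; the function name, the field names and the assumption that rows arrive sorted on the grouping key are ours.

```python
from itertools import groupby
from operator import itemgetter


def cumulative_by_group(rows, key, field):
    """Yield each row with a running sum and running average of `field`.

    The running totals restart whenever the value of `key` changes.
    Assumption: `rows` is already sorted on `key`.
    """
    for _, group in groupby(rows, key=itemgetter(key)):
        running_sum = 0
        for count, row in enumerate(group, start=1):
            running_sum += row[field]
            yield {**row, "cum_sum": running_sum, "cum_avg": running_sum / count}


# Hypothetical sample rows, for illustration only.
rows = [
    {"region": "EU", "sales": 10},
    {"region": "EU", "sales": 30},
    {"region": "US", "sales": 5},
]
result = list(cumulative_by_group(rows, "region", "sales"))
```

Every input row is kept and simply gains two extra fields, which is what distinguishes a cumulative aggregate from a regular one that collapses each group into a single row.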

SCRIPTING

  • Regex Evaluation
    • Validate strings using regular expressions
    • Grab capture groups and turn them into fields
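
As an illustration of what "turning capture groups into fields" means, here is a small Python sketch. The pattern, the field names and the `is_match` flag are hypothetical and only mimic the idea; they are not the step's actual configuration or output.

```python
import re

# Hypothetical pattern: each named capture group becomes a new field.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def evaluate(row, field, pattern=DATE_PATTERN):
    """Return a copy of `row` with a match flag and one field per capture group."""
    match = pattern.fullmatch(row[field])
    out = dict(row, is_match=match is not None)
    if match:
        # groupdict() maps each named group to the text it captured.
        out.update(match.groupdict())
    return out


matched = evaluate({"d": "2008-09-18"}, "d")
rejected = evaluate({"d": "not a date"}, "d")
```

Rows that do not match keep their original fields and only receive the flag, so validation and field extraction can be handled in one pass.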

Joins

  • XML Join
    • The XML Join step lets you add XML tags from one stream into a leading XML structure from a second stream.
    • Allows you to create complex XML strings
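
The idea behind the XML Join can be sketched in a few lines of Python using the standard library. This is a conceptual illustration under our own assumptions (one target document, a list of XML fragments, and a simple path naming the element to append to), not a description of how the step is implemented.

```python
import xml.etree.ElementTree as ET


def xml_join(target_xml, fragments, parent_path):
    """Append each XML fragment under the element found at `parent_path`.

    `target_xml` plays the role of the leading XML structure; `fragments`
    are the XML tags arriving from the other stream.
    """
    root = ET.fromstring(target_xml)
    parent = root if parent_path == "." else root.find(parent_path)
    if parent is None:
        raise ValueError(f"no element matches {parent_path!r}")
    for fragment in fragments:
        parent.append(ET.fromstring(fragment))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data, for illustration only.
joined = xml_join(
    "<orders><items/></orders>",
    ['<item id="1"/>', '<item id="2"/>'],
    "items",
)
```

Chaining several such joins, each adding one level of nesting, is one way to build up a complex XML string piece by piece.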

Bulk Loading

Experimental

  • Get sub folder names
  • Mail
  • Mail validator
  • MonetDB bulk loader
  • Greenplum bulk loader
  • PostgreSQL bulk loader

Job entry changes


The first thing you'll notice is that the job entries are now also split into different categories.

Many job entries have been added in this release, and a number of existing ones were changed as well.

File management

Conditions

Scripting

  • Shell
    • You can now specify the script to execute in the dialog

File transfer

  • SSH2 Get
  • SSH2 Put

Repository

  • Check if connected to repository
  • Export repository to XML file

Databases


Besides the new database dialog (see above) we also added support for a few new database types.  We now have support for 34 database types and a generic database connection for the others.

Here are the new ones:

  • MonetDB: the Dutch open source column database
  • KingbaseES: the popular Chinese RDBMS (PostgreSQL based)
  • Vertica: the upcoming high performance column database
  • HP NeoView: HP's answer to operational BI

Internationalization


In the i18n department, all teams made great strides, but we would especially like to thank the Korean (Kim YoungWoo) and Japanese (Hiroyuki Kawaguch) translators for an excellent job.

Here is an overview of the translation status:

Language  % Complete  Keys done (shown in the language)  Keys missing (shown in English)
en_US     100%        9442                               0
it_IT     100%        9442                               0
fr_FR     100%        9442                               0
es_AR     64%         6069                               3373
ko_KR     61%         5740                               3702
ja_JP     57%         5341                               4101
zh_CN     53%         5021                               4421
de_DE     48%         4539                               4903
es_ES     41%         3853                               5589
nl_NL     15%         1432                               8010
pt_BR     13%         1237                               8205
pt_PT     13%         1236                               8206
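
The percentages in this overview follow directly from the two key counts. A minimal Python sketch of that arithmetic, assuming the figure is rounded to the nearest whole percent (the function name is ours):

```python
def percent_complete(keys_done, keys_missing):
    """Return the share of translated keys as a whole percentage."""
    total = keys_done + keys_missing
    if total == 0:
        raise ValueError("no keys to translate")
    return round(100 * keys_done / total)


# Figures taken from the es_AR and ko_KR rows of the overview.
es_ar = percent_complete(6069, 3373)
ko_kr = percent_complete(5740, 3702)
```

Every row adds up to the same total of 9442 keys, so the percentage alone is enough to compare languages.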

Many kudos also go to the Italian (the great Nico Ben) and French (super Samatar Hassan) translators for staying at 100%. Given the fast pace of development, this is no small feat!

Community and codebase

A word of thanks

As in any good open source project, our community was the driving force behind this excellent release.  Pentaho obviously spent a large amount of time on this release, but it wouldn't have been the same without the valuable help of all our developers, testers, bug reporters, partners, customers, documenters, translators, forum members, etc.  It would take too long to thank everyone individually, but it's all of you who keep Kettle going!

Even though all contributions are valued a lot, I would like to give special thanks to Samatar Hassan, Daniel Einspanjer (at Mozilla) and Ingo Klose (at SHS-Viveon) for their contributions to this release.

On the Pentaho team I would like to applaud Jem for porting that pesky Spoon users guide over to the Wiki.  Many thanks to the whole team for all the help!

Codebase

Even though we try our best to refactor and simplify the codebase all the time, there is no denying that the codebase keeps growing.
Right before every release we run the following command:

find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'

This is what that gave us over the last releases:

Version  Lines of code  Increase
2.1.4    160,000
2.2.2    177,450         17,450
2.3.0    213,489         36,039
2.4.0    256,030         42,541
2.5.0    292,241         36,211
3.0.0    348,575         56,334
3.1.0    456,772        108,197

As you can see, there is no sign of any slowdown in the development of the Kettle codebase.  Looking at the roadmap this is bound to stay like that for the foreseeable future.
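
For readers on a system without find and awk, the line count command used for these figures can be approximated in Python. This is a sketch under our own assumptions, not the script used for the numbers above; it counts newline characters in every *.java file under a directory, which is what wc -l reports.

```python
from pathlib import Path


def count_java_lines(root="."):
    """Sum the newline counts of all *.java files below `root`."""
    total = 0
    for path in Path(root).rglob("*.java"):
        if path.is_file():
            # Counting b"\n" mirrors `wc -l`, which counts newline characters.
            total += path.read_bytes().count(b"\n")
    return total
```

Calling count_java_lines(".") from the top of a source tree returns one number, just as the final awk stage of the original command does.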

 


Matt Casters - Okegem/Belgium - September 18th 2008