What's new in PDI 4.0

Introduction


PDI 4.0 is a nicely balanced release: a rare mix of many new features, engine stability, and 100% backward compatibility for your existing jobs and transformations.

Once again, many thanks go to our large community of Kettle enthusiasts for all the help they provided to make this release another success.

General changes

Visual changes

  • Mouse-over
  • More intuitive menus
  • New welcome screen
  • Hop creation
  • Improved error handling configuration
  • New perspectives support for Agile BI visualisations, modelling, scheduling, etc.

Running jobs in Spoon

  • Drill down into running job entries
  • Visual indicators of running and completed job entries with success and failure mini-icons
  • Mousing over a completion mini-icon shows the details of the execution results
  • Log capturing of completed job entries

Running transformations in Spoon

  • Drill down into running transformation job entries and mappings
  • Row input/output sniff testing: see what rows are passing
  • Remote input/output sniff testing on a Carte server

New logging architecture

  • Reduced memory consumption
  • Incremental log updates
  • Global log buffer size limit for long running jobs/transformations
  • Interval logging
  • Auto clean-up of old log records
  • Log record time-outs
  • Log record lineage
  • Log record colour coding in Spoon (blue and red for error lines)
  • Step logging
  • Job entry logging
  • Execution lineage logging
  • Renaming individual columns
  • Global configuration options for all log tables

New plug-in architecture

  • Unified plug-in architecture
  • Easier deployment and packaging
  • Step, job entry, partitioner, database type, Spoon perspective, life-cycle, ...: all pluggable

New repository plug-in architecture

  • Allowing for 3rd party repositories like the Pentaho Unified Enterprise Repository
  • Removed dependencies on the relational database repository (which is still supported)
  • Added support for repositories capable of team-development (file locking)
  • Added support for repositories capable of fine-grained security
  • Added support for repositories capable of storing and retrieving revision history

Step changes

New steps

  • SAP Input: reads data from an SAP R/3 application server (needs sapjco.jar, which is not included in PDI)
  • Data Grid: allows you to enter static rows of data for reference or testing purposes
  • OLAP Input: reads data from an OLAP server using olap4j over XML/A: Mondrian, Palo, SSAS, SAP BW
  • Salesforce Delete, Insert, Update, Upsert
  • Add fields changing sequence: a group sequence, that is, a sequence that is reset whenever the values in a set of fields change
  • User Defined Java Class: create your own plugin on the fly in a step (coming out of incubation)
  • Send information using Syslog: sends a message to a Syslog server (see http://en.wikipedia.org/wiki/Syslog)
  • Java Filter: filters rows based on a user-defined Java expression
  • Memory Group By: for smaller groups, the intermediate statistical results are kept in memory, which gives faster results
  • LucidDB streaming bulk loader
  • Teradata FastLoad bulk loader
  • Experimental steps added: Get table names, Email messages input, ...
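
To illustrate the reset behaviour described for the "Add fields changing sequence" step above, here is a minimal sketch in shell and awk. It is not the step's implementation: the function name group_sequence, the comma-separated layout, and the choice to watch only the first field are assumptions made for the example, whereas the step itself watches a configurable set of fields.

```shell
# Hypothetical helper: appends a counter to each comma-separated row.
# The counter restarts at 1 whenever the first field differs from the
# first field of the previous row.
group_sequence() {
  awk -F',' -v OFS=',' '
    {
      key = $1 ""                           # force a string comparison of the key
      if (NR == 1 || key != prev) seq = 0   # key changed: restart the counter
      prev = key
      seq++
      print $0, seq
    }
  '
}

# Usage: the counter restarts when the key goes from A to B, and again from B to A.
printf '%s\n' 'A,x' 'A,y' 'B,z' 'A,w' | group_sequence
# prints:
#   A,x,1
#   A,y,2
#   B,z,1
#   A,w,1
```

The sketch compares each row only with the one before it, so a key that reappears later starts again at 1. It keeps no per-key totals.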

Updated steps

  • TODO

Job entry changes

New job entries

  • Send information using Syslog
  • Check DB connections

Updated job entries

  • TODO

Databases

  • New plugin architecture
  • ...
  • TODO

Repository

  • New repository plug-in architecture
  • New Pentaho Unified Enterprise Repository type
  • New File repository type
  • New repository explorer
  • ...

Internationalization


TODO:

Community and codebase

Codebase

Even though we continually try to refactor and simplify the codebase, there is no denying that it keeps growing.
Right before every release we run the following command to count the lines of Java code:

find . -name "*.java" -exec wc -l {} \; | awk '{ sum+=$1 } END { print sum }'

This is what it gave us over the last releases:

Version   Lines of code   Increase   % inc.
2.1.4           160,000
2.2.2           177,450     17,450    10.9%
2.3.0           213,489     36,039    20.3%
2.4.0           256,030     42,541    19.9%
2.5.0           292,241     36,211    14.1%
3.0.0           348,575     56,334    19.3%
3.1.0           456,772    108,197    31.0%
3.2.0           529,277     72,505    15.8%
4.0.0           607,180     77,903    14.7%
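
The increase and percentage columns follow directly from the line counts. As a minimal sketch (the function name loc_growth and the "version count" input layout are assumptions for this example, not part of the release tooling), they could be derived like this:

```shell
# Hypothetical helper: reads "version line_count" pairs on stdin and prints,
# for every release after the first, the increase and the percentage growth.
loc_growth() {
  awk '
    NR > 1 {
      inc = $2 - prev                      # absolute increase over the previous release
      printf "%s %d %.1f%%\n", $1, inc, 100 * inc / prev
    }
    { prev = $2 }                          # remember this count for the next row
  '
}

# Usage with two counts taken from the table (thousands separators removed).
printf '%s\n' '3.2.0 529277' '4.0.0 607180' | loc_growth
# prints: 4.0.0 77903 14.7%
```

The percentage is rounded to one decimal place, so a value may differ from the table in the last digit.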


Libraries

The complete library portfolio of Pentaho Data Integration consists of these libraries:

Filename            Description                                                             Dependency
kettle-core.jar     A small set of core classes and utilities for the Kettle environments   none
kettle-db.jar       Contains database related classes                                       kettle-core
kettle-engine.jar   The transformation and job runtime engines                              kettle-core, kettle-db
kettle-ui-swt.jar   The UI classes, Spoon, dialogs, etc.                                    kettle-core, kettle-db, kettle-engine



Matt Casters - Okegem/Belgium - March 29th 2010