Integrating DQ with Pentaho Data Integration

Overview

This page will provide you with information and links to resources to help get you started integrating your Data Quality platform with Pentaho Data Integration.  Our goal is to provide you with the best possible platform for leveraging your data quality services in the greater context of ETL and Data Integration workflows.  We do not have any specific recommendations as to whether you keep your integration work open source or proprietary, but we will gladly support you in building community, feedback and contributions should you choose to open source your integration work.

Getting Started

If you are brand new to Pentaho Data Integration, we suggest you begin by downloading PDI and working through the Getting Started Guide, which is linked from the welcome page shown when you launch the PDI designer (Spoon). The guide takes approximately 60-90 minutes to complete and introduces you to designing and running PDI Transformations and Jobs. You can download PDI from one of the following locations:

Enterprise Edition (30 day free evaluation) - http://www.pentaho.com/download/
Community Edition - http://sourceforge.net/projects/pentaho/files/Data%20Integration/

Pentaho Data Integration consists of two distinct document types - Jobs and Transformations:

  • Jobs are workflow-like documents used to coordinate and orchestrate tasks. For example, you might create a job that waits for a file to arrive in a directory, then kicks off a transformation to process the file and load it into a database, and finally generates an email indicating whether the job completed successfully.
  • Transformations are 'data flows' defining an ETL process - reading data from one or more locations, manipulating that data and loading it into a target.

Designing jobs and transformations is a visual exercise where designers build the flows from a design palette of building blocks: 'Job Entries' in the case of jobs, and 'Steps' in the case of transformations. Job Entries and Steps are built upon a pluggable architecture which allows you to create new steps that wrap your services into a Job or Transformation flow.
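To make the idea of a step that wraps a service more concrete, here is a minimal plain-Java sketch. It does not use the PDI plugin API: the names DataQualityService, RowStep, CleanseFieldStep and StepSketch are hypothetical and exist only for this illustration, and rows are modelled as simple maps rather than PDI's own row structures. It only shows the general shape, assuming a service that cleanses one string value at a time: a step receives a row, calls the service on one field, and passes the row on to the next step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a tiny linear "data flow" that pushes rows through steps.
// None of these types come from PDI; they are assumptions made for this sketch.
final class StepSketch {
    // Runs every row through each step in order and collects the results.
    static List<Map<String, Object>> run(List<Map<String, Object>> rows, List<RowStep> steps) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> current = row;
            for (RowStep step : steps) {
                current = step.processRow(current);
            }
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for a real data quality call: trim and upper-case the value.
        DataQualityService trimAndUpper = raw -> raw.trim().toUpperCase();

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 1);
        row.put("city", "  orlando ");

        List<Map<String, Object>> out =
                run(List.of(row), List.of(new CleanseFieldStep(trimAndUpper, "city")));
        System.out.println(out.get(0));
    }
}

// Hypothetical stand-in for a vendor data quality service.
interface DataQualityService {
    String cleanse(String rawValue);
}

// Hypothetical stand-in for a transformation step: one row in, one row out.
interface RowStep {
    Map<String, Object> processRow(Map<String, Object> row);
}

// A step that wraps the service and applies it to a single named field.
final class CleanseFieldStep implements RowStep {
    private final DataQualityService service;
    private final String fieldName;

    CleanseFieldStep(DataQualityService service, String fieldName) {
        this.service = service;
        this.fieldName = fieldName;
    }

    @Override
    public Map<String, Object> processRow(Map<String, Object> row) {
        // Copy the row so the caller's input is left untouched.
        Map<String, Object> out = new LinkedHashMap<>(row);
        Object value = out.get(fieldName);
        if (value instanceof String) {
            out.put(fieldName, service.cleanse((String) value));
        }
        return out;
    }
}
```

A real PDI step has more responsibilities than this (configuration, metadata, a settings dialog), so treat the sketch as a mental model and use an existing plugin as the reference for the actual contract.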

Writing Your Own PDI Plugin

All PDI plugins are written in Java. In most cases, you will begin by looking at an existing plugin that provides functionality similar to the plugin you plan to develop, and use it as a basis or guideline for implementing your own. Here are some useful links for Java Developers:

If you get stuck at any point or would like to reach out to our technical team for guidance, feel free to contact one of the following:

  • Jake Cornelius, VP of Product Management, jtcornelius@pentaho.com
  • Will Gorman, VP of Engineering, wgorman@pentaho.com
  • Matt Casters, Chief Data Integration Officer, mcasters@pentaho.com

Highlighting Your Solutions

We are very interested in highlighting your integration work with Pentaho. Here are just a few potential options to build visibility to your solutions:

  • Post your plugin to the Community Plugins page - http://wiki.pentaho.com/display/EAI/List+of+Available+Pentaho+Data+Integration+Plug-Ins - instructions for posting your plugin(s) are at the top of the page
  • Community Webinars - Community webinars are conducted on the 1st and 3rd Wednesday of each month.  Typical attendance is between 25 and 50 of our core community members, ranging from BI Practitioners to Web Developers to Java Developers.  This is a great way to show off your work to other members of the Pentaho Community and get feedback, suggestions and possibly contributions!  If you are interested in doing a community webinar, please contact Doug Moran, VP of Community, at dmoran@pentaho.com
  • Pentaho User Groups - Pentaho has a growing number of community-led User Groups worldwide that meet periodically to discuss Pentaho-related topics.  In many cases, there will be an active call for topics.  If you are interested in getting information on upcoming User Group gatherings, please contact Doug Moran at dmoran@pentaho.com
  • Social Media - We have a number of active bloggers and tweeters who can support you in highlighting your integration work through social media.  If you want to do some social media outreach, contact Jake Cornelius, VP of Product Management, at jtcornelius@pentaho.com
  • Corporate Webinars - We can also consider putting together corporate webinars, which are organized and delivered through Pentaho's Marketing Department.  We actively market these events through a variety of outreach programs.  Typical attendance is between 400 and 700 registered attendees, with between 150 and 400 live attendees depending on the topic.  The best topics for corporate webinars tend to be those that can be tied to an active Pentaho initiative (e.g. Agile BI, Big Data, OSBI) or that involve a customer success story.  You can submit your proposals for corporate webinars to Joe McGonnell, SVP of Marketing, at jmcgonnell@pentaho.com