Documenting Pentaho Data Integration (Kettle) Projects

Introduction

Kettle transformation and job files (.ktr and .kjb) are saved as XML files.

XSLT transformations can be used to generate dynamic documentation of ETL projects in Pentaho.

Organizing folders in projects

Give each project its own folder under a common ETL root folder.

Folder                          Content
/ETL                            Main folder for all ETL projects
/ETL/extract_prices             Project folder for the "extract_prices" project
/ETL/extract_prices/logs        Log output for all jobs and transforms
/ETL/extract_prices/docs        Documentation folder
/ETL/extract_prices/docs/xslt   XSLT transforms for documentation
/ETL/extract_prices/sandbox     Sandbox for testing transformations and jobs
/ETL/extract_prices/data        Data files used in ETL
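
The folder layout can also be created programmatically. The following is a minimal sketch, not part of the original instructions: the class name ProjectLayout and its create method are hypothetical, and the subfolder names are taken from the layout described in this section.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that creates the per-project folder layout. */
final class ProjectLayout {

    // Subfolders from the layout in this section, relative to the project folder.
    static final String[] SUBFOLDERS = {"logs", "docs", "docs/xslt", "sandbox", "data"};

    /** Creates etlRoot/project and its subfolders; returns the project folder. */
    static Path create(Path etlRoot, String project) throws IOException {
        Path projectDir = etlRoot.resolve(project);
        for (String sub : SUBFOLDERS) {
            // createDirectories also creates missing parents and accepts existing folders.
            Files.createDirectories(projectDir.resolve(sub));
        }
        return projectDir;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ProjectLayout ETL_ROOT PROJECT_NAME");
            System.exit(1);
        }
        System.out.println("Created " + create(Paths.get(args[0]), args[1]));
    }
}
```

For example, running it with the arguments /ETL and extract_prices would produce the folders listed above.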

Using a Xalan script to generate XHTML files in batch

Step 1 - Download Xalan

Download the Xalan-Java project from http://xml.apache.org/xalan-j/
Copy the following 4 files into the PROJECT/docs/xslt/xalan/ folder

  1. serializer.jar
  2. xalan.jar
  3. xercesImpl.jar
  4. xml-apis.jar

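Step 2 - Write a batch file

Assuming a Windows installation, write a bat file and place it in the xalan folder, next to the four jar files. The classpath must list the jars explicitly, because a classpath of "." alone does not load jar files from the current directory.

xalan.bat
set CLASSPATH=.;serializer.jar;xalan.jar;xercesImpl.jar;xml-apis.jar
java org.apache.xalan.xslt.Process -IN %1 -XSL %2 -OUT %3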

Alternatively, look at the kettledoc.bat file attached.
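
The same single-file conversion can be sketched in Java through the standard javax.xml.transform (JAXP) API instead of the Xalan command line. This is an illustration only: the class name KettleDocTransform is hypothetical, and the three arguments simply mirror the input, stylesheet and output arguments of the batch file.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: applies one XSLT stylesheet to one ktr/kjb file. */
final class KettleDocTransform {

    /** Transforms the file "in" with the stylesheet "xsl" and writes the result to "out". */
    static void transform(File in, File xsl, File out) throws TransformerException {
        // JAXP looks up an XSLT processor at runtime: normally Xalan when its jars
        // are on the classpath, otherwise the processor bundled with the JDK.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(xsl));
        transformer.transform(new StreamSource(in), new StreamResult(out));
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println("usage: java KettleDocTransform IN.ktr STYLESHEET.xsl OUT.html");
            System.exit(1);
        }
        transform(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

Going through JAXP keeps the code independent of a particular XSLT processor, at the cost of not knowing exactly which one is used unless the classpath is controlled.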

Step 3 - Copy kettle.xsl to PROJECT/docs/xslt

Copy the attached file kettle.xsl to PROJECT/docs/xslt

Step 4 - Copy pentaho.css to PROJECT/docs/xslt

Copy the attached file pentaho.css to PROJECT/docs/xslt

Step 5 - Copy ui/images

Copy the /ui/images folder from the Kettle installation folder to PROJECT/docs/xslt/ui/images

Step 6 - Transform a ktr file to HTML

To convert a ktr/kjb file to HTML in the Pentaho color scheme and style, run the following from the xalan directory:

xalan.bat ../../../KETTLE.ktr ../kettle.xsl ../../KETTLE.ktr.html

Then open the HTML file in any browser.
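
To convert a whole project rather than one file at a time, the conversion can be wrapped in a loop over every ktr/kjb file. The sketch below is one possible approach using the standard JAXP API. The class name KettleDocBatch and the method convertAll are hypothetical, and writing each result as NAME.ktr.html or NAME.kjb.html into a single output folder is an assumption modelled on the KETTLE.ktr.html example above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical batch converter for all Kettle files below a folder. */
final class KettleDocBatch {

    static boolean isKettleFile(Path p) {
        String name = p.getFileName().toString().toLowerCase();
        return name.endsWith(".ktr") || name.endsWith(".kjb");
    }

    /** Converts every ktr/kjb file under sourceDir; returns the HTML files written. */
    static List<Path> convertAll(Path sourceDir, Path xsl, Path outDir)
            throws IOException, TransformerException {
        // Compile the stylesheet once and reuse it for every input file.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(xsl.toFile()));
        Files.createDirectories(outDir);

        List<Path> inputs;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            inputs = walk.filter(Files::isRegularFile)
                    .filter(KettleDocBatch::isKettleFile)
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> written = new ArrayList<>();
        for (Path in : inputs) {
            // Output names are flattened, so files with the same name in
            // different subfolders would overwrite each other.
            Path out = outDir.resolve(in.getFileName() + ".html");
            templates.newTransformer().transform(
                    new StreamSource(in.toFile()), new StreamResult(out.toFile()));
            written.add(out);
        }
        return written;
    }
}
```

Calling convertAll with the project folder, PROJECT/docs/xslt/kettle.xsl and PROJECT/docs would roughly reproduce the manual step for every file in the project.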

Using a web browser to view the transformation dynamically, without a batch file

  1. Copy your ktr/kjb files to the PROJECT/docs folder
  2. Copy the attached xslt files into the docs folder
  3. Use an editor to insert one of these lines right after the XML declaration (line 1)
    <?xml-stylesheet type="text/xsl" href="kettle_job_xslt.xml"?> for kjb
    <?xml-stylesheet type="text/xsl" href="kettle_trans_xslt.xml"?> for ktr
    
  4. Open each kjb/ktr file directly in any browser.
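
Editing every file by hand gets tedious on larger projects, so the insertion of the xml-stylesheet line can be scripted. The following is a minimal sketch: the class name StylesheetLink and the method addStylesheet are hypothetical, and the main method assumes the files are UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that links a ktr/kjb file to an XSLT stylesheet. */
final class StylesheetLink {

    /** Returns xml with an xml-stylesheet line placed right after the XML declaration. */
    static String addStylesheet(String xml, String href) {
        if (xml.contains("<?xml-stylesheet")) {
            return xml; // already linked, leave the content untouched
        }
        String pi = "<?xml-stylesheet type=\"text/xsl\" href=\"" + href + "\"?>";
        int close = xml.indexOf("?>");
        if (xml.startsWith("<?xml") && close >= 0) {
            int end = close + 2; // position just after the XML declaration
            return xml.substring(0, end) + "\n" + pi + xml.substring(end);
        }
        // No XML declaration found: put the stylesheet line first.
        return pi + "\n" + xml;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StylesheetLink FILE.ktr STYLESHEET_HREF");
            System.exit(1);
        }
        Path file = Paths.get(args[0]);
        // Assumption: the file is UTF-8 encoded.
        String xml = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, addStylesheet(xml, args[1]).getBytes(StandardCharsets.UTF_8));
    }
}
```

The check for an existing xml-stylesheet line makes the helper safe to run more than once on the same file. Run it on copies in the docs folder rather than on the working ktr/kjb files.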

Note: The attached kettle.xsl handles both jobs and transformations.

Modification to Spoon to enable dynamic documentation

Modify org/pentaho/di/core/xml/XMLHandler.java.
Look for the following code segment...

    /**
     * The header string to specify encoding in an XML file
     * @param encoding The desired encoding to use in the XML file
     * @return The XML header.
     */
    public static final String getXMLHeader(String encoding)
    {
        return "<?xml version=\"1.0\" encoding=\""+encoding+"\"?>"+Const.CR;
    }

Append this string to the XML header returned by that method:

<?xml-stylesheet type="text/xsl" href="kettle_job_xslt.xml"?>

Recompile.

Now, whenever Spoon writes a ktr/kjb file, the file will include the xml-stylesheet line.
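
As an illustration, the modified method could look roughly like the sketch below. It is not the actual Kettle source: the class name XmlHeaderSketch and the STYLESHEET_PI constant are hypothetical, and the local CR constant stands in for Kettle's Const.CR so that the example compiles on its own.

```java
/** Hypothetical stand-alone version of the modified header method. */
final class XmlHeaderSketch {

    // Stand-in for Kettle's Const.CR (assumed to be the platform line separator).
    static final String CR = System.lineSeparator();

    // The stylesheet line from this section; the href is taken as-is.
    static final String STYLESHEET_PI =
            "<?xml-stylesheet type=\"text/xsl\" href=\"kettle_job_xslt.xml\"?>";

    /**
     * The header string to specify encoding in an XML file,
     * followed by the xml-stylesheet line.
     * @param encoding The desired encoding to use in the XML file
     * @return The XML header.
     */
    public static String getXMLHeader(String encoding) {
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>" + CR
                + STYLESHEET_PI + CR;
    }
}
```

If the same header method is used for both jobs and transformations, pointing the href at the combined kettle.xsl instead of kettle_job_xslt.xml may be the safer choice.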

Kettle cookbook

See the kettle-cookbook project on Google Code.

It also generates an image of the transformation or job.