Dump any SQL resultset to MS Excel (generic job)

Dump any SQL resultset to MS Excel

Abstract: Kettle's provides an Excel output step but you always need to manually configure its fields. Use this job to generate a transformation that dumps a SQL resultset to Excel on the fly, and execute that generated transformation immediately.

Authors: Roland Bouman

License: LGPL

Kettle versions:Kettle 3.2.0 and up

Attachments

  • kexcetle.kjb - Kettle Job that takes parameters to specify JDBC connection info, an SQL statement text, and a directory and file for the excel file. Running it will dump the resultset of the SQL statement to the specified MS Excel file.
  • gen-ktr.ktr - A Kettle transformation that takes parameters to specify JDBC connection info, an SQL statement text, and a directory and file for the excel file. Running it will generate a transformation called gen.ktr that actually executes the SQL statement and dumps the results to the specified Excel file
  • ktr-template.ktr - A transformation template. This is used by the gen-ktr.ktr transformation as a basis for generating an actual transformation by injecting the necessary metadata about the field layout of the SQL resultset into the definition of the Excel output step to configure its fields

Download all

Background

No IT solution is complete as long as it can't dump data to excel.

Kettle can dump any data to Excel just fine, but one of the drawbacks is that kettle needs to know about the format of the data stream in advance in order to define the fields in the receiving MS Excel file.

As it turns out, there is a simple way around this:

  1. Run a transformation that accepts an SQL statement and data to connect to a database. This transformation then examines the metadata of the resultset generated by the SQL statement and uses that to generate a transformation that can actually dump that data stream to MS Excel
  2. Run the generated transformation immediately after generating it.

By wrapping these two transformations in a job, you can simply run one job that can dump the data retrieved by an arbitrary SQL statement to MS Excel.