Kettle Telemetry and Usage Statistics

Introduction

There are many reasons to collect usage statistics, for example:

  • It helps focus product improvements on the most-used areas and features (steps, job entries, database types, etc.)
  • It helps users determine whether features they rely on are affected by a planned upgrade (the upgrade notes for each release list the affected steps, job entries, etc.)
  • Combined with usage statistics from development, test and production environments, it can also reveal jobs/transformations that are never used

Solutions

Analyze the used steps, job entries and database types

  1. Download the solution _analyze_trans_job
  2. Within PDI/Kettle, please open the job _analyze_trans_job/transformations_jobs/0_analyze_trans_job.kjb
  3. Read the comment within the job; it contains all the usage information. For example, file names, transformation names and step names can be anonymized: see the option anonymize_names in the parameters.txt file.

If you want to contribute to this solution, the jobs/transformations are hosted on GitHub.

Note: This solution is currently limited to the file system and does not support a repository or a repository export file.
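
To illustrate the kind of file-system analysis described in this section, here is a minimal Python sketch. It is not the implementation used by the _analyze_trans_job solution. It assumes that .ktr and .kjb files are XML documents in which steps appear as <step> children of the root, job entries as <entries>/<entry>, and database connections as <connection> children of the root, each with a <type> child element. The function names and the anonymize_names argument (named after the option in parameters.txt) are hypothetical and chosen only for this example.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Assumed XML layout (not confirmed by this page): each location below
# holds elements that carry a <type> child naming the plugin or database.
LOCATIONS = (
    ("step", "step"),               # steps in a .ktr file
    ("entries/entry", "job_entry"),  # job entries in a .kjb file
    ("connection", "database"),     # database connections in either file
)


def anonymize(name: str) -> str:
    """Replace a name with a short, stable hash (illustrative only)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:12]


def analyze_file(path: Path) -> Counter:
    """Count step, job entry and database types in one .ktr or .kjb file."""
    counts = Counter()
    root = ET.parse(path).getroot()
    for xpath, kind in LOCATIONS:
        for element in root.findall(xpath):
            type_text = element.findtext("type")
            if type_text and type_text.strip():
                counts[(kind, type_text.strip())] += 1
    return counts


def analyze_tree(base_dir: str, anonymize_names: bool = False):
    """Walk base_dir and return (overall counts, counts per file)."""
    total = Counter()
    per_file = {}
    for path in sorted(Path(base_dir).rglob("*")):
        if path.suffix.lower() not in (".ktr", ".kjb"):
            continue
        counts = analyze_file(path)
        # Hash the file path when names should not leave the site.
        label = anonymize(str(path)) if anonymize_names else str(path)
        per_file[label] = counts
        total += counts
    return total, per_file
```

Calling analyze_tree on a directory of transformations and jobs and then printing total.most_common() gives a ranked list of the step, job entry and database types found. Like the solution itself, this sketch only reads files from the file system.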

Pentaho Operations Mart

Within the PDI Enterprise Edition, the Pentaho Operations Mart collects a wide range of information, including usage statistics. These can be combined to see which jobs/transformations are used, how often, by which user, etc.