Embedded Datasource Architecture

The Embedded Datasource project is a series of architecture changes and features that will make it easy to add new datasource types to the reporting engine and its primary designer, Report Designer. This enhancement will leverage PDI transformations to provide the data access, and XUL interfaces will allow cross-platform rendering of the editors for the datasource definitions. The architecture described will impact the reporting engine, Report Designer, the PDI plugin system, and the XUL framework, XUL SWT, XUL GWT and XUL Swing projects. This work should also provide the foundation and UI elements for a future thin PDI client (thin Instaview, thin Spoon, etc.).

Architecture Overview

The reporting engine's extensions-kettle project is being extended to support the new embedded datasource architecture. The embedded datasource relies on template Kettle transformations that users can edit (through a XUL UI) in the Pentaho Report Designer. The edited transformation is then sent to the Kettle engine for execution, and the resulting data is sent back to the report engine for report processing.

This differs from the current Pentaho Data Integration datasource primarily in the extended UI capabilities. The Pentaho Data Integration datasource requires the user to define the Kettle transformation in Spoon, then attach that transformation to the report. The embedded datasource supplies the templated transformation and (where implemented) a UI for editing the pre-defined step in the template that provides data input. In the simplest cases, the user is not required to interact with PDI or the transformation directly.

In cases where no cross-platform compatible UI is available, the traditional PDI datasource UI is displayed for the user.

The Template System

The templates for the embedded datasources provide the transformation logic for each datasource. For instance, a template could be designed to retrieve MongoDB data, using a MongoDB input step. The MongoDB metadata needed to run the step would be left blank in the template, and provided at design time by the person designing the report, through a cross-platform compatible XUL-based dialog.

The template system for the embedded datasources is modeled on Pentaho Instaview, and as such Instaview's coding conventions will be followed:

The templates will reside in a folder in the Report Designer distribution. Each template's name is the display name that is used in the Report Designer UI where datasource types are available for selection.

Each template should define a step named "input" and normally also a step named "output"; it can contain additional steps:

  1. A step named "input": the class for this step will provide the metadata necessary for the UI elements that will be rendered when this datasource template is chosen.
  2. A step named "output": this is the step responsible for the fields and data that will be returned to the report engine for processing in the report.

If no output step is defined, the input step becomes the output step.
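The step conventions above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class TemplateSteps and its method are hypothetical names, the step names are matched exactly (case-sensitive), and a template without an "input" step is treated as an error. None of this is confirmed behavior of the reporting engine or Kettle.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper; not part of the reporting engine or Kettle APIs. */
class TemplateSteps {
    static final String INPUT_STEP = "input";
    static final String OUTPUT_STEP = "output";

    /**
     * Returns the name of the step whose rows are handed back to the report
     * engine, given the names of all steps in a template transformation.
     */
    static String resolveOutputStep(List<String> stepNames) {
        // Assumption: a template without an "input" step breaks the conventions.
        if (!stepNames.contains(INPUT_STEP)) {
            throw new IllegalArgumentException(
                "Template does not follow the conventions: no step named \"" + INPUT_STEP + "\"");
        }
        // When no "output" step is defined, the input step becomes the output step.
        return stepNames.contains(OUTPUT_STEP) ? OUTPUT_STEP : INPUT_STEP;
    }

    public static void main(String[] args) {
        System.out.println(resolveOutputStep(Arrays.asList("input", "geo lookup", "output")));
        System.out.println(resolveOutputStep(Arrays.asList("input")));
    }
}
```

Steps placed between "input" and "output" do not affect the result, which is what allows extra steps to be added to a template without breaking the convention.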

Enhancing the KettleDataFactory

The KettleDataFactory will interact with a new transformation producer, the EmbeddedKettleTransformationProducer. This producer handles the template approach to the datasource type. The following conventions have been agreed upon regarding this new producer: 

  1. This producer can handle all new embedded datasource query types; the type will be derived from the input step's StepMetaInterface class implementation.
  2. The name of the query will be specified by the user in the UI.
  3. The name of the datasource (the highest level folder under which the queries are grouped once they are created) will be based on the derived StepMetaInterface class implementation.  Under certain "advanced mode" circumstances, different datasource query types can belong to the same datasource. In this case, the name of the datasource will be "Mixed Type Datasource". (//TODO: think of a better name for this)
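The datasource naming convention could look roughly like the sketch below. It assumes each query has already been reduced to a type name string (how that name is derived from the StepMetaInterface implementation is not specified here), and DatasourceNaming is a hypothetical class name used only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; illustrates the naming convention only. */
class DatasourceNaming {
    static final String MIXED_NAME = "Mixed Type Datasource";

    /**
     * Derives the datasource display name from the type names of the queries
     * grouped under it (one entry per query).
     */
    static String datasourceName(List<String> queryTypeNames) {
        Set<String> distinct = new LinkedHashSet<>(queryTypeNames);
        // Assumption: a datasource always groups at least one query.
        if (distinct.isEmpty()) {
            throw new IllegalArgumentException("A datasource needs at least one query");
        }
        // One shared type: use it as the name. Several types: "advanced mode" name.
        return distinct.size() == 1 ? distinct.iterator().next() : MIXED_NAME;
    }

    public static void main(String[] args) {
        System.out.println(datasourceName(Arrays.asList("MongoDB", "MongoDB")));
        System.out.println(datasourceName(Arrays.asList("MongoDB", "Other Type")));
    }
}
```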

The KettleDataFactory will be the first reporting engine extension to implement and expose more than one DataFactoryMetadata class instance. Classes implementing the DataFactoryMetadata interface provide information to the engine such as which dialog should launch for a given data factory, what icon should be displayed, and what the display name should be. We will create one instance of the implementation (EmbeddedKettleDataFactoryMetadata) per embedded datasource type, and register it in the KettleDataFactoryModule.

Embedded Datasource Persistence

The EmbeddedKettleTransformationProducer will retrieve a templated transformation from the Report Designer distribution when asked for a new datasource query; it will retrieve the transformation from the .prpt bundle when editing an existing query. The transformation is stored in the bundle as a .ktr file specifically to support "advanced mode" scenarios, described below in the section Advanced Mode Accommodations.
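A minimal sketch of that lookup order follows. It is not the producer's actual implementation: TransformationStore is a hypothetical class, the two in-memory maps stand in for the templates folder and the .prpt bundle, and the XML strings are placeholders rather than real .ktr content.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the template folder and the .prpt bundle. */
class TransformationStore {
    // template display name -> transformation XML shipped with Report Designer
    private final Map<String, String> templatesByName = new HashMap<>();
    // query name -> transformation XML stored inside the report bundle
    private final Map<String, String> bundledByQuery = new HashMap<>();

    void addTemplate(String templateName, String ktrXml) {
        templatesByName.put(templateName, ktrXml);
    }

    void saveToBundle(String queryName, String ktrXml) {
        bundledByQuery.put(queryName, ktrXml);
    }

    /** Existing query: use the bundled copy. New query: start from the template. */
    String loadForEditing(String queryName, String templateName) {
        String bundled = bundledByQuery.get(queryName);
        if (bundled != null) {
            return bundled;
        }
        String template = templatesByName.get(templateName);
        if (template == null) {
            throw new IllegalArgumentException("Unknown template: " + templateName);
        }
        return template;
    }

    public static void main(String[] args) {
        TransformationStore store = new TransformationStore();
        store.addTemplate("MongoDB", "<transformation>template</transformation>");
        System.out.println(store.loadForEditing("sales", "MongoDB"));
        store.saveToBundle("sales", "<transformation>edited</transformation>");
        System.out.println(store.loadForEditing("sales", "MongoDB"));
    }
}
```

Because the bundled copy always wins once it exists, a transformation that an advanced user has modified outside Report Designer is never silently replaced by the original template.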

Advanced Mode Accommodations

Given the stories identified by PM, we will accommodate the following functionality:

  1. Once a report is created with an embedded datasource, allow an advanced user to have access to the transformation outside of Report Designer. The user should be able to modify the transformation and re-bundle it in the report. As long as the template conventions are followed, the report should still run and render, in Report Designer as well as in the other execution environments (BA Server, etc.).
  2. Allow advanced users to add and modify the transformation templates, including adding new templates (with possibly two templates having the same StepMetaInterface class for the "input" step), and/or adding steps between the "input" and "output" steps. Users may also want a template that defaults specific values for the metadata in the "input" or "output" steps.

These requirements have led to the agreement that the display names for listing datasources and their queries in any UI will always be derived from the template (if new) or the internal transformation (if an existing datasource).

Should an advanced user create a case where queries are of different types under the same datasource, we will modify the datasource display name to reflect this "mixed mode".

Should an advanced user introduce an input step for which we do not have a cross-platform compatible dialog implemented, the UI will fall back to the PDI datasource dialog already available in Report Designer.
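The dialog fallback rule can be illustrated with a small lookup. The sketch assumes dialogs are keyed by the class name of the input step's StepMetaInterface implementation; DialogSelector, the dialog identifiers and the com.example class names are all hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup from input step class to the dialog that edits it. */
class DialogSelector {
    static final String FALLBACK_DIALOG = "PDI datasource dialog";

    // step meta class name -> identifier of a cross-platform (XUL) dialog
    private final Map<String, String> xulDialogs = new HashMap<>();

    void register(String stepMetaClassName, String dialogId) {
        xulDialogs.put(stepMetaClassName, dialogId);
    }

    /** Returns the XUL dialog for the step class, or the existing PDI dialog. */
    String dialogFor(String stepMetaClassName) {
        return xulDialogs.getOrDefault(stepMetaClassName, FALLBACK_DIALOG);
    }

    public static void main(String[] args) {
        DialogSelector selector = new DialogSelector();
        // Both class names below are made up for the example.
        selector.register("com.example.MongoInputMeta", "mongodb-xul-dialog");
        System.out.println(selector.dialogFor("com.example.MongoInputMeta"));
        System.out.println(selector.dialogFor("com.example.CustomInputMeta"));
    }
}
```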

The User Interface Experience

The user experience that we have currently planned will re-use the PDI datasource dialog that exists today:

When a query is of a StepMetaInterface class that has a cross-platform compatible dialog, the dialog will be overlaid in the PDI datasource dialog, hiding the file and step components on the right side of the dialog. The dialog shown below demonstrates a selected query of type "MongoDB".

 

When a new datasource is created from the menus in Report Designer and the type context is known, the dialog will open with the cross-platform compatible components rendered for that datasource type. The "Add Query" button in the PDI dialog (the plus above the query list on the left side) will create only new queries of the known datasource type.

If we are in "advanced mode", and the queries are of different datasource types, then the "Add Query" button will give the user a list of embedded datasource types to choose from.
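The "Add Query" behavior described above might be expressed as follows. This is a sketch under assumptions: the dialog is assumed to track the set of datasource types currently in play (a single entry when the type context is known), AddQueryChoices is a hypothetical name, and offering every type when the set is empty is a guess rather than a documented rule.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper deciding what the "Add Query" button offers. */
class AddQueryChoices {
    /**
     * typesInDialog: datasource types of the queries already in the dialog, or
     * the single type chosen from the menu for a new datasource.
     * allEmbeddedTypes: every embedded datasource type that is available.
     */
    static List<String> offeredTypes(Set<String> typesInDialog, List<String> allEmbeddedTypes) {
        if (typesInDialog.size() == 1) {
            // Known type context: only queries of that type can be added.
            return Collections.singletonList(typesInDialog.iterator().next());
        }
        // "Advanced mode" (mixed types): let the user choose from all types.
        return new ArrayList<>(allEmbeddedTypes);
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("MongoDB", "Other Type");
        Set<String> single = new LinkedHashSet<>(Arrays.asList("MongoDB"));
        Set<String> mixed = new LinkedHashSet<>(all);
        System.out.println(offeredTypes(single, all));
        System.out.println(offeredTypes(mixed, all));
    }
}
```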

The XUL Interfaces

The PDI datasource dialog today is not XUL-ified. The proper way to handle the dialog changes we want would be to re-create this dialog with XUL, and use overlays for the datasource type changes.

At a minimum, for this first pass, this requires:

  1. a new MongoDB cross-platform compatible dialog
  2. a new PDI datasource cross-platform compatible dialog
  3. a new preview cross-platform compatible dialog

Appendix A. PM and Architect Stories Driving the Embedded Datasource Architecture

The following stories are driving much of the architecture for this feature set.

Use Case: A new Datasource “X” has come on the market, and Product Management wants Spoon, Report Designer, Datasource Wizard, and Instaview to all access this datasource
  • Current approach – Implement things multiple times, once for each pillar – a new DataFactory and Datasource Editor dialog in reporting, a new Kettle Plugin including a Step and Step Dialog in Spoon, a new capability within the Datasource Wizard, a template in Instaview.
  • Long term approach – Implement a new Kettle Step and Kettle Step Dialog in XUL, and a template for reporting, Instaview and Datasource Wizard.
  • The initial experience in Report Designer should feel native – I should see “X” appear in my available datasources, I should be able to add a new “X”, customize the XUL dialog with connection and query properties, and save that .prpt, edit, etc.

Custom Inline ETL Use Case:
  • I’m a super IT guy, and the new Datasource “X” doesn’t meet my needs – I want to do a geo lookup of the IP address and group by country in the report
  • I read in the advanced section of the Report Designer Guide that certain datasources are powered by Kettle’s transformation engine, and I can augment the inline ETL by modifying my report’s transformation
  • I export the KTR from Report Designer, open it in Spoon, and then add the geo lookup step in the transformation
  • I then point the report at the new KTR and have access to the Country data
  • Question – at this point, should I still be able to edit Datasource “X” natively in Report Designer? In Instaview, we show warning messages when you go into this mode, but we’d need to speak with Rob and Doug to understand what the experience is when a user has started customizing the transformation and gone off the standard path. In this scenario, as long as that Input Dialog “X” is still labeled “Input” in the template, should it just stay in Native mode?

Custom Template Use Case:
  • I’m a super IT guy and I’m supporting a bunch of business users. They all want access to MongoDB in Report Designer enriched with GEO and other items, but I don’t want them to write their own JSON queries.
  • I build a custom template that is driven by Parameters rather than the MongoDB Input Dialog. It appears in the datasources as “My Company Datasource”, and the business users can quickly select their preferences for the query from the dropdowns

End Goals

A - As a Pentaho PM and developer I want…

  1. Any data source that we create can immediately be used anywhere.  No more checklists of which data sources can be used from which tools.  
  2. Easier maintenance, a bug gets fixed once and it is fixed everywhere

B - As an end user I want…

  1. Same User Experience everywhere, less learning curve and training.  
  2. A native app experience.  UI should match the application look and feel.  Swing feel in PRD, SWT feel in PDI and PUC look and feel from the BA Server

C - As a BI Architect I want…

  1. To be able to create, download new and/or modify existing datasources throughout the suite in a consistent, easily understood manner
  2. To make these new data sources available to all my users with one quick and easy step
  3. Sometimes I want my changes to affect all reports that use the modified datasource
  4. Sometimes I want my changes to affect ONLY new reports that use the modified datasource

D - As an OEM I want…

  1. Complete control over the datasources that my embedded Pentaho instance can access
  2. To create, modify and deploy new datasources easily