MetaDataSpec

Warning

This content applies to Classic-Engine version 0.8.11 and Report-Designer 2.1.

Datasources

The reporting engine contains a datasource metadata layer. This layer allows datasources to provide additional formatting or processing information along with the bulk data.

DataSources provide metadata by returning TableModels that implement the "org.pentaho.reporting.engine.classic.core.MetaTableModel" interface. The metadata is stored in DataAttributes, which map <namespace,name> keys to arbitrary values.
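
To make the <namespace,name> keying concrete, here is a minimal, self-contained sketch. The "AttributeMap" class and its methods are hypothetical stand-ins and not the engine's DataAttributes API; the attribute name "format-string" inside the two namespaces is likewise only an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a namespaced attribute store.
// This is NOT the engine's DataAttributes API, only an illustration of the keying scheme.
class AttributeMap {
    // Outer key: namespace URI. Inner key: attribute name within that namespace.
    private final Map<String, Map<String, Object>> values = new HashMap<>();

    void set(String namespace, String name, Object value) {
        values.computeIfAbsent(namespace, ns -> new HashMap<>()).put(name, value);
    }

    // Returns null when either the namespace or the name is unknown.
    Object get(String namespace, String name) {
        Map<String, Object> byName = values.get(namespace);
        return byName == null ? null : byName.get(name);
    }
}

class AttributeMapDemo {
    static final String MONDRIAN_NS =
        "http://reporting.pentaho.org/namespaces/engine/meta-attributes/mondrian";
    static final String OLAP4J_NS =
        "http://reporting.pentaho.org/namespaces/engine/meta-attributes/olap4j";

    // The attribute name "format-string" in these namespaces is an assumption.
    static AttributeMap sample() {
        AttributeMap attributes = new AttributeMap();
        attributes.set(MONDRIAN_NS, "format-string", "#,##0");
        attributes.set(OLAP4J_NS, "format-string", "0.00%");
        return attributes;
    }

    public static void main(String[] args) {
        AttributeMap attributes = sample();
        System.out.println(attributes.get(MONDRIAN_NS, "format-string"));
        System.out.println(attributes.get(OLAP4J_NS, "format-string"));
    }
}
```

Because the namespace is part of the key, two providers can use the same attribute name without overwriting each other.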

DataSources can provide Metadata on three levels:

  • ResultSet metadata via the "org.pentaho.reporting.engine.classic.core.MetaTableModel#getTableAttributes" method

Table/ResultSet-metadata should provide general information about the resultset, for instance details on the query, which database was used or whether the resultset is a remote object (and therefore expensive to access).

The following built-in datasources provide table-metadata:

DataSource type | Namespace
OLAP4J datasource | "http://reporting.pentaho.org/namespaces/engine/meta-attributes/olap4j"
Mondrian datasource | "http://reporting.pentaho.org/namespaces/engine/meta-attributes/mondrian"

  • Column-specific metadata via the
    "org.pentaho.reporting.engine.classic.core.MetaTableModel#getColumnAttributes" method

Column metadata is the same for every row of a column. It describes the column type, carries formatting information or tells how the column was created. For instance, it can provide user-friendly labels for columns or give hints on how the data in the column should be formatted.

The following built-in datasources provide column-metadata:

DataSource type | Namespace
Pentaho-MetaData datasource | "http://reporting.pentaho.org/namespaces/engine/meta-attributes/pentaho-meta-data"
SQL datasource | (various core namespaces defined in "org.pentaho.reporting.engine.classic.core.MetaAttributeNames")

  • Cell-Metadata via the "org.pentaho.reporting.engine.classic.core.MetaTableModel#getCellDataAttributes" method

Cell-Metadata provides additional data that is unique to each cell in the resultset. This metadata allows the datasource to influence the formatting of individual cells based on the query. Right now, only the MDX-driven datasources can formulate queries that create such cell-level attributes.

The following built-in datasources provide cell-metadata:

DataSource type | Namespace
OLAP4J datasource | "http://reporting.pentaho.org/namespaces/engine/meta-attributes/olap4j"
Mondrian datasource | "http://reporting.pentaho.org/namespaces/engine/meta-attributes/mondrian"

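The three levels described above can be pictured with a small, self-contained sketch. The method names mirror the ones mentioned in the text, but the "ThreeLevelMetaData" interface, the parameter lists, the plain Map return types and the attribute names ("remote", "label", "background-color") are assumptions made for illustration; the real MetaTableModel returns DataAttributes objects.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical mirror of the three metadata levels; not the engine's MetaTableModel.
interface ThreeLevelMetaData {
    Map<String, Object> getTableAttributes();                      // whole resultset
    Map<String, Object> getColumnAttributes(int column);           // one column
    Map<String, Object> getCellDataAttributes(int row, int column); // one cell
}

class InMemoryMetaData implements ThreeLevelMetaData {
    private final Map<String, Object> table = new HashMap<>();
    private final Map<Integer, Map<String, Object>> columns = new HashMap<>();
    private final Map<String, Map<String, Object>> cells = new HashMap<>();

    void putTable(String name, Object value) {
        table.put(name, value);
    }

    void putColumn(int column, String name, Object value) {
        columns.computeIfAbsent(column, c -> new HashMap<>()).put(name, value);
    }

    void putCell(int row, int column, String name, Object value) {
        cells.computeIfAbsent(row + ":" + column, k -> new HashMap<>()).put(name, value);
    }

    public Map<String, Object> getTableAttributes() {
        return Collections.unmodifiableMap(table);
    }

    // Columns and cells without metadata yield an empty map rather than null.
    public Map<String, Object> getColumnAttributes(int column) {
        return columns.getOrDefault(column, Collections.emptyMap());
    }

    public Map<String, Object> getCellDataAttributes(int row, int column) {
        return cells.getOrDefault(row + ":" + column, Collections.emptyMap());
    }
}

class ThreeLevelDemo {
    static InMemoryMetaData sample() {
        InMemoryMetaData meta = new InMemoryMetaData();
        meta.putTable("remote", Boolean.TRUE);            // resultset is expensive to access
        meta.putColumn(0, "label", "Customer");           // applies to every row of column 0
        meta.putCell(2, 0, "background-color", "#ffcccc"); // applies to a single cell only
        return meta;
    }
}
```

In this sketch the levels are independent: a consumer asks each level explicitly, and nothing is inherited from table to column to cell.
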
Datasource-Metadata Mapping

As each datasource has its own way to express metadata, the classic-engine metadata system does not try to force all metadata layers into one slim corset of predefined attributes.

The engine provides a well-known set of attributes along with strict syntax rules for these attributes. DataSources must map their proprietary metadata into their own namespace to minimize collisions between the various metadata providers. The proprietary metadata can then be mapped into the well-defined attribute set via mapping rules.

The mapping is implemented as a "DataSchemaDefinition", which is read and processed during report processing. The mapping file contains rules that define how the proprietary objects created by the datasource are converted into objects the reporting engine can understand. The mapping does not prevent access to the raw metadata, which remains available via the DataSource's private namespace.

Functions and expressions can access the metadata through the ExpressionRuntime class.

User-Friendly names for Columns

Some metadata systems allow technical column names to be hidden behind user-defined labels.

Fields, Expressions and Functions that reference other columns from the datasource must always use the technical column name. Most metadata systems do not properly enforce uniqueness among the user-friendly names, and some even provide localized names. This makes it impossible to guarantee which column will be accessed during report processing, or makes columns inaccessible as soon as processing happens in a different Locale setting.

Engine-level report objects must always reference technical columns, and datasources must provide locale- and runtime-agnostic technical column names that allow a clean and unique mapping to column positions regardless of the current locale or similar settings.

The DataSchema-class provides access to all metadata settings, including the friendly names. Design tools that make use of the friendly names must ensure that the report-model is populated with technical names.
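
The following self-contained sketch illustrates why friendly names are unsuitable as references. The "FriendlyNames" class, the column names and the labels are all hypothetical; the sketch only models the problem and does not use the DataSchema API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical model: technical column names with locale-dependent labels.
class FriendlyNames {
    // Insertion order doubles as the column position.
    private final Map<String, Map<Locale, String>> labels = new LinkedHashMap<>();

    void addColumn(String technicalName, String englishLabel, String germanLabel) {
        Map<Locale, String> localized = new HashMap<>();
        localized.put(Locale.ENGLISH, englishLabel);
        localized.put(Locale.GERMAN, germanLabel);
        labels.put(technicalName, localized);
    }

    // A technical name maps to exactly one position, independent of the locale.
    int positionOf(String technicalName) {
        int position = 0;
        for (String name : labels.keySet()) {
            if (name.equals(technicalName)) {
                return position;
            }
            position++;
        }
        return -1;
    }

    // A label may match zero, one or several columns, depending on the locale.
    List<String> columnsLabelled(String label, Locale locale) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<Locale, String>> entry : labels.entrySet()) {
            if (label.equals(entry.getValue().get(locale))) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    static FriendlyNames sample() {
        FriendlyNames names = new FriendlyNames();
        names.addColumn("CUSTOMER_NAME", "Name", "Kunde");
        names.addColumn("PRODUCT_NAME", "Name", "Produkt");
        return names;
    }
}
```

With this sample data the label "Name" is ambiguous in an English locale and matches nothing in a German one, while the technical names resolve to a fixed position in both. A design tool would therefore show the label to the user but store the technical name in the report model.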

Dataschema mappings

Dataschema mappings are rules that define metadata attributes or translate metadata attributes from proprietary systems like Pentaho Metadata or MDX-Attributes into the engine's preferred metadata style.

Pentaho Reporting uses a simple metadata system. Metadata is stored as a map of <(namespace, name), (value)> on the various data items. Metadata can be given on a global level, column-level (PMD) or cell-level (for example MDX attributes).

The values are always primitive objects, so that they can be used in computations more easily. Limiting ourselves to the simplest objects also serves as a common denominator between all metadata systems out there. This keeps the system open to accept bindings from all metadata systems, as long as they provide a Java-API.

Metadata is provided by JDBC, MQL and Mondrian/OLAP4J datasources. In addition to that, parameters and expressions provide some primitive hooks into the metadata system as well (mainly to serve as matching parameters for the rules).

There are only two mapping types:

  • Global mappings
    These are always applied unconditionally.

This is primarily used to define mappings from one meta-system to another or to define global attributes that are always set on all columns. The main use-case is to give administrators a way to inject well-known global attributes into the system so that they can be used in formulas.

  • (in)direct-mappings
    A mapping that is applied if a given rule is matched. A rule either checks that a given attribute is defined (with any value) or that an attribute matches a given value.

Direct-mappings are a special case with a rule searching for field-names only. A direct-mapping is the same as an indirect-mapping searching for (core::fieldname) with the given fieldname value.

These mappings serve two purposes. Again they can be used to map attributes from one meta-system to another, but this time the mapping is conditional, which allows some more sophisticated rules.

And finally, they can be used to enrich the existing metadata with additional values, as some sort of lightweight metadata system.

On each mapping type, we provide the option to either add static attributes or to map (and possibly convert) attributes from one metadata attribute to another. As every metadata system comes with its own object model, we use "ConceptQueryMapper" to convert metadata-objects into their canonical form.
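
The mapping types described above can be sketched as follows. Everything in this block is hypothetical: the "MappingSketch" and "Rule" classes, the short namespace placeholders ("core", "pmd", "style", "custom") standing in for real namespace URIs, and the choice to match rules against the original attributes only. The sketch covers static attributes; copying and converting attributes via "ConceptQueryMapper" is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of global, indirect and direct mapping rules.
class MappingSketch {
    static String key(String namespace, String name) {
        return namespace + "::" + name;
    }

    static final class Rule {
        final String matchKey;   // null: global rule, always applied
        final Object matchValue; // null: the attribute only has to be defined
        final Map<String, Object> additions = new HashMap<>();

        Rule(String matchKey, Object matchValue) {
            this.matchKey = matchKey;
            this.matchValue = matchValue;
        }

        // Registers a static attribute that is added when the rule matches.
        Rule add(String namespace, String name, Object value) {
            additions.put(key(namespace, name), value);
            return this;
        }

        boolean matches(Map<String, Object> attributes) {
            if (matchKey == null) {
                return true;
            }
            if (!attributes.containsKey(matchKey)) {
                return false;
            }
            return matchValue == null || matchValue.equals(attributes.get(matchKey));
        }
    }

    static Rule global() {
        return new Rule(null, null);
    }

    static Rule indirect(String namespace, String name, Object value) {
        return new Rule(key(namespace, name), value);
    }

    // A direct mapping is an indirect mapping on (core::fieldname).
    static Rule direct(String fieldName) {
        return indirect("core", "fieldname", fieldName);
    }

    // Rules are matched against the original attributes (an assumption of this sketch).
    static Map<String, Object> apply(Map<String, Object> columnAttributes, List<Rule> rules) {
        Map<String, Object> result = new HashMap<>(columnAttributes);
        for (Rule rule : rules) {
            if (rule.matches(columnAttributes)) {
                result.putAll(rule.additions);
            }
        }
        return result;
    }

    static Map<String, Object> sample() {
        Map<String, Object> column = new HashMap<>();
        column.put(key("core", "fieldname"), "REVENUE");
        column.put(key("pmd", "data-type"), "currency");

        List<Rule> rules = new ArrayList<>();
        rules.add(global().add("custom", "company", "ACME"));
        rules.add(direct("REVENUE").add("style", "font-bold", Boolean.TRUE));
        rules.add(indirect("pmd", "data-type", "currency").add("core", "format-string", "#,##0.00"));
        rules.add(indirect("pmd", "hidden", null).add("style", "visible", Boolean.FALSE));
        return apply(column, rules);
    }
}
```

In the sample, the global rule and the two matching conditional rules contribute attributes, while the last rule is skipped because the column defines no (pmd::hidden) attribute.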

Right now, there are two ways to use the metadata in the reports:

  • Auto-apply some well-known metadata attributes as style and element-attributes via the "MetaDataStyleEvaluator"

This reads the values for "font-bold", "font-italic", "font-strikethrough", "font-underline", "font-size", "font-family", "background-color", "color", "horizontal-alignment" and "vertical-alignment" as style information.

The "format-string" and "label" of a column can also be applied.

  • Use the metadata in a formula via the "METADATA(<column>, <namespace>, <name>)" formula-function, or in a Java-Expression/Function via "ExpressionRuntime#getDataSchema"

In its current state, the metadata-mapping system is not exposed to the common user. Mapping rules can be stored in the bundle, but we do not provide a UI for that in Citrus.

For Citrus, mappings are aimed at data-source implementers and administrators who want to make custom PMD attributes available in the reporting system or who want to create sensible defaults for non-PMD datasources. These folks should know how to use an XML editor and can use the report-configuration to bind the dataschema files to the engine.