PDI API Changes wish-list for major releases (like 5.0 or 6.0)

Clean up quoting algorithms in the database dialects

Several database dialects (DBase, H2, Ingres, MSAccess, MSSQL, SQLite, Sybase, SybaseIQ, Teradata, UniVerse) currently have their own getSchemaTableCombination() method that does its own quoting.

This dialect-specific quoting should be removed in a future release.
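One way to picture the cleanup is a single shared routine that builds the schema-table combination from the dialect's quote characters, so that no dialect needs its own quoting logic. The following is only a minimal sketch under that assumption: QuotingSketch, Dialect, quote() and schemaTable() are hypothetical names for illustration and are not part of the Kettle API.

```java
// Hypothetical sketch: none of these names exist in the Kettle API.
final class QuotingSketch {

    /** Stand-in for a database dialect that only exposes its quote characters. */
    interface Dialect {
        String startQuote();
        String endQuote();
    }

    /** Convenience factory for a dialect with the given quote characters. */
    static Dialect dialect(final String start, final String end) {
        return new Dialect() {
            @Override public String startQuote() { return start; }
            @Override public String endQuote() { return end; }
        };
    }

    /** Quotes one identifier unless it is empty or already quoted. */
    static String quote(Dialect d, String name) {
        if (name == null || name.isEmpty()) {
            return name;
        }
        String s = d.startQuote();
        String e = d.endQuote();
        boolean alreadyQuoted = name.length() >= s.length() + e.length()
                && name.startsWith(s) && name.endsWith(e);
        return alreadyQuoted ? name : s + name + e;
    }

    /** Builds the schema-table combination in one shared place. */
    static String schemaTable(Dialect d, String schema, String table) {
        if (schema == null || schema.isEmpty()) {
            return quote(d, table);
        }
        return quote(d, schema) + "." + quote(d, table);
    }

    public static void main(String[] args) {
        Dialect ansi = dialect("\"", "\"");
        Dialect brackets = dialect("[", "]");
        System.out.println(schemaTable(ansi, "sales", "orders"));
        System.out.println(schemaTable(brackets, null, "orders"));
    }
}
```

With something along these lines, a dialect would only declare its quote characters, and the combination logic would live in one place.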

ValueMeta : split up in different classes

Create a factory method such as ValueMeta.createValue("name", "String") so that we can support pluggable data types, smaller classes, and cleaner code. The challenge will be to keep the changes minimal.

Example for a use case using milliseconds: http://jira.pentaho.com/browse/PDI-7103
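As a rough illustration of the factory idea, the sketch below keeps a registry that maps a type name to a constructor, so that a plugin could register an additional data type without touching the core classes. Everything here is an assumption made for the example: TypedValueMeta, StringValueMeta, register() and the registry itself are hypothetical; only the createValue("name", "String") call shape comes from the proposal above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: these types only illustrate the proposed factory.
final class ValueMetaFactorySketch {

    /** Stand-in for a per-type value metadata class. */
    interface TypedValueMeta {
        String getName();
        String getTypeName();
    }

    /** One small class per data type instead of one large ValueMeta. */
    static final class StringValueMeta implements TypedValueMeta {
        private final String name;
        StringValueMeta(String name) { this.name = name; }
        @Override public String getName() { return name; }
        @Override public String getTypeName() { return "String"; }
    }

    /** Registry of type name to constructor; plugins would add entries here. */
    private static final Map<String, Function<String, TypedValueMeta>> TYPES =
            new ConcurrentHashMap<>();

    static {
        register("String", StringValueMeta::new);
    }

    /** Makes a data type available under the given type name. */
    static void register(String typeName, Function<String, TypedValueMeta> constructor) {
        TYPES.put(typeName, constructor);
    }

    /** Creates a value of the requested type, or fails for unknown types. */
    static TypedValueMeta createValue(String name, String typeName) {
        Function<String, TypedValueMeta> constructor = TYPES.get(typeName);
        if (constructor == null) {
            throw new IllegalArgumentException("Unknown data type: " + typeName);
        }
        return constructor.apply(name);
    }

    public static void main(String[] args) {
        TypedValueMeta v = createValue("name", "String");
        System.out.println(v.getName() + " : " + v.getTypeName());
    }
}
```

A new type, for example one with sub-second precision, would then be a new small class plus one register() call rather than another branch inside a single large class.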

Remove shared objects from TransMeta, JobMeta

Mat Lowery asked why TransMeta and JobMeta still hold List<DatabaseMeta> and similar lists of shared objects.

This is how the design evolved historically. As replacements, we could consider shared objects that are stored locally, in the repository, or in various other locations, or that are provided by Pentaho services.

clone() methods not consistent

DatabaseMeta.clone() always returns an object with a null ID. SlaveServer.clone() returns an object with a non-null ID if the original object has a non-null ID.

Favor immutable objects as they can be cached.

In a certain Repository implementation, we wish to cache objects. However, some code nulls out the IDs of some of the shared objects, thus corrupting the cache. Cloning is a solution but immutability is better.
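To illustrate why immutability protects a cache, here is a minimal sketch of a shared object whose fields are final. Code that needs a copy without an ID asks for a new instance instead of nulling the ID in place, so the cached instance is never changed. ImmutableServer and withId() are hypothetical names chosen for the example, not existing Kettle classes or methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: ImmutableServer is not a Kettle class.
final class ImmutableServer {
    private final String id;       // null means "not yet saved"
    private final String name;
    private final String hostname;

    ImmutableServer(String id, String name, String hostname) {
        this.id = id;
        this.name = name;
        this.hostname = hostname;
    }

    String getId() { return id; }
    String getName() { return name; }
    String getHostname() { return hostname; }

    /** Returns a new instance with a different ID; this instance is unchanged. */
    ImmutableServer withId(String newId) {
        return new ImmutableServer(newId, name, hostname);
    }

    public static void main(String[] args) {
        Map<String, ImmutableServer> cache = new HashMap<>();
        ImmutableServer saved = new ImmutableServer("42", "slave-1", "localhost");
        cache.put(saved.getId(), saved);

        // The caller wants an unsaved copy; the cached instance keeps its ID.
        ImmutableServer copy = cache.get("42").withId(null);
        System.out.println(cache.get("42").getId() + " / " + copy.getId());
    }
}
```

Because there are no setters, no caller can corrupt the cached object, and the defensive clone() calls become unnecessary.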

API libs available for Maven

Provide all libraries necessary to use the Kettle API in a public Maven repository. This would greatly ease adoption of the API. Provide an example.

Eliminate the need for mostly similar classes: one that does something for jobs and another that does the same thing for transformations.

For example, TransHistoryDelegate and JobHistoryDelegate are very similar. Why have two separate classes? The main reason for two classes is the lack of a superclass with generic behavior. The two classes duplicate message keys and XUL documents in addition to a very large percentage of code.
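The missing superclass could look roughly like the sketch below: the shared behavior is written once in a generic base class, and the job and transformation variants only supply the parts that actually differ. All names here (HistorySketch, HistoryDelegate, TransStub, JobStub, the message key format) are simplified stand-ins invented for the example and do not reflect the real delegate classes.

```java
// Hypothetical sketch: simplified stand-ins, not the real delegate classes.
final class HistorySketch {

    /** Shared behavior lives here once; M is the kind of metadata shown. */
    abstract static class HistoryDelegate<M> {
        /** Only the type-specific parts are left to subclasses. */
        protected abstract String kind();
        protected abstract String nameOf(M meta);

        /** Common logic: one message key scheme instead of duplicated keys. */
        final String messageKey(String suffix) {
            return "HistoryDelegate." + kind() + "." + suffix;
        }

        /** Common logic: the title shown for the history view. */
        final String title(M meta) {
            return kind() + " history: " + nameOf(meta);
        }
    }

    /** Minimal stand-in for transformation metadata. */
    static final class TransStub {
        final String name;
        TransStub(String name) { this.name = name; }
    }

    /** Minimal stand-in for job metadata. */
    static final class JobStub {
        final String name;
        JobStub(String name) { this.name = name; }
    }

    static final class TransHistory extends HistoryDelegate<TransStub> {
        @Override protected String kind() { return "Trans"; }
        @Override protected String nameOf(TransStub meta) { return meta.name; }
    }

    static final class JobHistory extends HistoryDelegate<JobStub> {
        @Override protected String kind() { return "Job"; }
        @Override protected String nameOf(JobStub meta) { return meta.name; }
    }

    public static void main(String[] args) {
        System.out.println(new TransHistory().title(new TransStub("load_sales")));
        System.out.println(new JobHistory().messageKey("Refresh"));
    }
}
```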

Create pluggable execution engines

It would be great to have an execution engine API that we can extend, or just use outright, without having to know the intricacies of how Trans or Job works. Something similar to the Single Threaded execution engine. Perhaps something like:

Engine.java
ExecutionEngine engine = ExecutionEngines.lookup(EngineType.Normal);

engine.start(myTrans);
engine.start(myJob);
...
engine.addListener(myTrans, myListener);
...
engine.getStatus(myTrans);
...
engine.waitUntilFinished(myJob);
...
engine.stop(myTrans);

Easier API to build transformations/jobs

As a developer, I'd like to more easily build a transformation or job and interact with steps/job entries. See some of the unit tests in Kettle to see what's currently required to build a transformation by API. It would be great if we could do something like (note this is very crude pseudocode):

JobTransBuilder.java
TransMeta meta = Builder.createTransformationMeta().addStep(MyStepMeta.class).addStep(DummyStepMeta.class);
Builder.createHop(meta, meta.getStep("MyStepMeta"), meta.getStep("DummyStepMeta"));

Existing JIRA cases

Search for the API component within the JIRA PDI space: http://jira.pentaho.com/browse/PDI/component/10324