Rejected Feature Requests

Features that were once proposed but were rejected.

...but feel free to re-request a feature if you think the reasons for rejecting it are no longer valid.

Implement connection type as a variable parameter

"I can parametrize host, user-id, password in database steps in transformations using variables, but I can't set the connection type like that yet. I have jobs that need to execute on multiple types of databases (Oracle, MySql, ...) and I would like the connection type set in the same way so that I can run the same transformation on multiple databases."

This has been requested a few times. Reasons for not implementing it in PDI:

  • For anything but simple SQL statements, the SQL you write will depend on the database type. For example, if you use Oracle analytics, your SQL won't run on DB2 or MySQL anymore. Currently you know the type of the database and can use its full functionality;
  • Starting with PDI version 2.3.1 and continuing in later versions, more database-specific settings were introduced. So just specifying "this is Oracle, MySQL, ..." is not sufficient anymore, and a way to parametrize these specific options would need to be found as well (which would make it pretty complex);
  • How many data warehouses run on multiple types of databases? Most data warehouses are built on specific operational systems and target only specific database types, since the resulting reports and cubes also need to run on those databases. So parametrized connection types would probably not see much use.
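
To illustrate the first point, here is a minimal sketch of how even a simple "first N rows" query differs per database type. The class and method names (DialectDemo, firstRows) and the table name are hypothetical and only serve the illustration; this is not how PDI handles database types internally.

```java
class DialectDemo {
    // Returns a "first N rows" query in the syntax of the given database type.
    // The type names are illustrative labels, not PDI connection type codes.
    static String firstRows(String databaseType, String table, int n) {
        switch (databaseType) {
            case "ORACLE":
                return "SELECT * FROM " + table + " WHERE ROWNUM <= " + n;
            case "MYSQL":
                return "SELECT * FROM " + table + " LIMIT " + n;
            case "DB2":
                return "SELECT * FROM " + table + " FETCH FIRST " + n + " ROWS ONLY";
            default:
                throw new IllegalArgumentException("No syntax known for " + databaseType);
        }
    }

    public static void main(String[] args) {
        // The same logical query needs three different statements.
        for (String type : new String[] {"ORACLE", "MYSQL", "DB2"}) {
            System.out.println(type + ": " + firstRows(type, "sales", 10));
        }
    }
}
```

A transformation whose connection type is only known at run time would need this kind of branching for every non-trivial statement it contains.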

Possible workaround: maintain duplicate jobs for multiple databases. Alternatively, you can use the generic ODBC connection type, which supports variable substitution for the driver as of PDI version 2.5.0GA. The disadvantage of the latter solution is that the special processing done for some types of database will of course not be applied.

Implement "on the fly DDL" creation of tables, ...

"I want tables to be created on the fly when they don't exist yet".

The reason for not implementing "on the fly DDL" in PDI is that it would work for small examples, but it would make your DBA's life harder on real projects. Most companies have set up pretty complex database privileges, which would be hard to maintain using "on the fly DDL".

In the database steps there usually is an "SQL" button that shows the DDL that could be executed. This should mostly be used in quick mock-ups, not for real projects. Do a proper design for real projects.

We are not supporters of any "on the fly DDL" in any step.

Implement a step that shows a dialog and asks parameters

The reason for not implementing this is that PDI is an ETL tool, not a reporting tool. You cannot assume there will be an interactive user to answer questions when a job/transformation is running.

You can have a look at the Pentaho framework, where you can indeed build web pages that ask for parameters with drop-downs, selectors, ... to parametrize transformations this way. Alternatively, the examples directory contains an example called dialog.ktr that shows a way to make a dialog box using JavaScript in PDI.

Besides this, as of PDI v2.4.0 the launch dialog also supports entering parameters through a GUI, which satisfies most of the people asking to enter parameters.

As of 2.5GA there is a job entry called "Display Msgbox Info" that will display a message in a dialog box. However, even this functionality should only be used for debugging purposes. There's absolutely no guarantee that it will work when scheduled (e.g. on UNIX, if you do not have a controlling terminal, the job will probably even abort when you display a dialog box).

Implement serialization of transformations using reflection

It has been suggested a few times that we use XML "bean" serialization to automatically serialize and deserialize jobs/transformations, as that would make it "easier" to develop them.

It is possible to end up with decent XML using, say, XStream, but only by setting all the element names manually and so on. You then end up in a situation that is far worse than the current one.

The situation with the XML serialization that we have right now is:

1) it always works, regardless of whether the element/parent element/etc. exists or not, is empty or not, etc.;

2) it's extremely simple, everyone understands it, even those that don't know the XStream/whatever API;

3) the generated XML is understandable and readable (more so than that of other tools, I might add), which makes PDI more open;

4) we rarely ever have any issues with it: no surprises, no NPEs flying around, etc. It's very resilient;

5) it's easy to test and debug.

That should be plenty of reason to keep it stable for the time being, not using reflection.
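
As a rough illustration of points 1) and 2), the sketch below reads element values by hand with the JDK's DOM API and falls back to a default when an element is missing or empty. It is a standalone example under stated assumptions: the class name StepXmlDemo, the helper names and the element names (name, cache_size) are hypothetical, and this is not PDI's actual XML handling code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StepXmlDemo {
    // Returns the trimmed text of the first descendant element with the given
    // tag, or the default when the element is missing or empty.
    static String getTagValue(Element parent, String tag, String defaultValue) {
        NodeList nodes = parent.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return defaultValue;
        }
        String text = nodes.item(0).getTextContent();
        return (text == null || text.trim().isEmpty()) ? defaultValue : text.trim();
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // "cache_size" is present but empty, "comment" is missing entirely.
        Element step = parse("<step><name>Lookup</name><cache_size/></step>");
        System.out.println(getTagValue(step, "name", "?"));
        System.out.println(getTagValue(step, "cache_size", "0"));
        System.out.println(getTagValue(step, "comment", "n/a"));
    }
}
```

Each value is read with one explicit call and an explicit default, so an older or partially filled file still loads without surprises.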

Implement retry on connection issues (this is discussed in PDI-6189)

Usually what is asked is as follows: "My jobs occasionally receive an 'I/O Error: Connection reset' while performing DB Lookup steps. This causes the entire transformation to fail. Is there any way to configure the transformation to retry if there are connection issues, or to open another connection?"

An I/O error usually means that the underlying hardware is failing; let's assume it's the network. The connection with the database is dropped, so you don't know which SQL statements were executed correctly and which ones weren't. You can't ask the database because the connection is dead, and you can't assume that the database rolls back everything since the last commit.

The problem is that the error can occur at the moment you do a commit. If you commit 10,000 rows at a time: did they go into the table or not? A blind retry could therefore load the same rows twice, while skipping the batch could lose them.
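
The ambiguity can be shown with a small simulation. This is only a sketch: FakeTable is a hypothetical in-memory stand-in for a database table (not a PDI or JDBC class), and it assumes the worst case where the commit succeeds but the connection drops before the client hears about it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RetryDemo {
    // Hypothetical in-memory stand-in for a database table.
    static class FakeTable {
        final List<Integer> rows = new ArrayList<>();
        boolean dropConnectionAfterNextCommit = true;

        // Stores the batch, then (once) simulates the connection dying before
        // the client receives the acknowledgement of the commit.
        void commit(List<Integer> batch) {
            rows.addAll(batch);
            if (dropConnectionAfterNextCommit) {
                dropConnectionAfterNextCommit = false;
                throw new IllegalStateException("I/O Error: Connection reset");
            }
        }
    }

    // Naive retry: on an error, send the same batch again and report how many
    // rows the table holds afterwards.
    static int loadWithBlindRetry(FakeTable table, List<Integer> batch) {
        try {
            table.commit(batch);
        } catch (IllegalStateException e) {
            // The client cannot tell that the first commit actually succeeded.
            table.commit(batch);
        }
        return table.rows.size();
    }

    public static void main(String[] args) {
        int stored = loadWithBlindRetry(new FakeTable(), Arrays.asList(1, 2, 3));
        System.out.println("rows sent: 3, rows stored: " + stored); // rows stored: 6
    }
}
```

From the client's side, this failure looks exactly the same as one where nothing was stored, which is why an automatic retry cannot pick the right action on its own.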

Implement GUI components in a transformation or job

Some people request the implementation of GUI elements in a transformation or a job. In ETL this is normally not a good thing: most of the time there will be nobody around to press buttons when the transformation or job is running. The transformation/job may also be running in an environment that does not allow it to display dialogs, e.g. when running on a Carte server on a remote system.

Some GUI components slipped in, but these are mostly for debugging purposes.

Hardcoding Locale

"No-one is going to use dates in a format other than the default English Locale, so why don't we hardcode the English Locale?" The same goes for any other Locale feature that could be hardcoded (numbers, ...).

Not everyone uses English as their default Locale (or your own preferred Locale, for that matter), so hardcoding a Locale is not a good idea. People should have a choice of default Locale.
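
As a small illustration using only the standard JDK, the same number is written differently under two Locales. The class and method names (LocaleDemo, format) are made up for this sketch and are not part of PDI.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleDemo {
    // Formats a number using the conventions of the given Locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double amount = 1234.5;
        System.out.println(format(amount, Locale.US));      // 1,234.5
        System.out.println(format(amount, Locale.GERMANY)); // 1.234,5
    }
}
```

Data written under one convention and read back under a hardcoded other one would either fail to parse or, worse, parse to a different value.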

See also "on Locales" in the developer guidelines.

Removing repository support

"For a new step I need a very expressive way of saving the meta-data for my step. I can do this with XML, but I will never be able to get it saved to the repository. Why don't we remove support for the repository, as no-one really uses it anyway?"

Some people do use the repository, and if you want to make a step that is going to be included in the general release you need to support the repository.