The Pentaho Big Data Plugin Project provides support for an ever-expanding Big Data community within the Pentaho ecosystem. It is a plugin for the Pentaho Kettle engine which can be used within Pentaho Data Integration (Kettle), Pentaho Reporting, and the Pentaho BI Platform.
This project contains the implementations for connecting to and performing operations against Big Data sources such as Hadoop (HDFS and MapReduce), HBase, Hive, Cassandra, and MongoDB.
The Big Data Forum exists for both users and developers. The community also manages the ##pentaho IRC channel on irc.freenode.net.
The Pentaho Big Data Plugin is now a Maven project. Please refer to the project README for build information.
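For a typical Maven project, a build from the project root looks like the following; the exact goals and flags here are assumptions, so verify them against the project's build documentation:

```shell
# Standard Maven build (assumes Maven 3 and a suitable JDK on the PATH)
mvn clean install

# Optionally skip tests for a faster local build
mvn clean install -DskipTests
```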
We recommend providing unit tests where possible and debugging your code through them.
If you want to see your code executing within Spoon we recommend remote debugging. This approach can be used with Pan, Kitchen, or the BA/DI Server as well. The workflow is as follows:
1. Create `override.properties` in the root of the big-data-plugin project. This file is a local override for any properties defined in `build.properties`.
2. Define `kettle.dist.dir` and point it to your Kettle install dir, based on whether you're using the CI download or building from source:
    * CI download: `kettle.dist.dir=../data-integration`
    * Source: `kettle.dist.dir=../Kettle/distrib` (Note: You must build Kettle with `ant distrib` before being able to launch it when using the source. This will build Kettle into `Kettle/distrib`. For more information see the PDI Developer Information.)
3. Build and deploy the plugin with `ant resolve install-plugin` (you can drop `resolve` after the first build unless the dependencies change).
4. Add the remote debugging arguments `-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005` to the JVM arguments of the application you want to debug:
    * `spoon.sh` or `Spoon.bat`: Update line 158 and add the above JVM arguments to the `OPT` variable
    * `Data Integration 64-bit/Contents/Info.plist`: Update the `VMOptions` property and append the above JVM arguments
5. Connect a remote debugger from your IDE: create a "Remote Java Application" debug configuration, using the socket attach method and port 5005 (as configured above).

We use the Fork + Pull Model to manage community contributions. Please fork the repository and submit a pull request with your changes.
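As a quick recap, the debugging setup described earlier amounts to roughly the following shell session. The `../data-integration` path and the `PENTAHO_DI_JAVA_OPTIONS` environment variable are assumptions; older `spoon.sh` scripts may require editing the `OPT` variable directly instead:

```shell
# Point the build at an existing Kettle install (path is an example)
echo "kettle.dist.dir=../data-integration" > override.properties

# Build the plugin and install it into the Kettle distribution
ant resolve install-plugin

# Launch Spoon with the remote-debug agent listening on port 5005.
# NOTE: PENTAHO_DI_JAVA_OPTIONS is honored by newer spoon.sh scripts;
# if yours does not support it, edit the OPT variable in spoon.sh instead.
cd ../data-integration
PENTAHO_DI_JAVA_OPTIONS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005" ./spoon.sh
```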
Here's a sample git workflow to get you started:
1. `git config --global core.autocrlf input`
2. `git clone git@github.com:USERNAME/big-data-plugin.git`
3. `git add . && git commit`
4. `git push`
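The workflow above can be expanded into a fuller fork-and-pull session. The `upstream` remote and the branch name below are illustrative assumptions, not project requirements:

```shell
# Keep line endings consistent across platforms
git config --global core.autocrlf input

# Clone your fork (USERNAME is a placeholder for your GitHub account)
git clone git@github.com:USERNAME/big-data-plugin.git
cd big-data-plugin

# Optional: track the upstream repository so you can sync later
git remote add upstream git@github.com:pentaho/big-data-plugin.git

# Work on a topic branch (branch name is illustrative)
git checkout -b my-fix
git add .
git commit -m "Describe your change"

# Push the branch to your fork, then open a pull request on GitHub
git push origin my-fix
```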
There are many good resources available online to help you learn and master Git. For plugin development itself, see Getting started with the Pentaho Data Integration Java API.
Here's a list of known community plugins that fall into the "big data" category:
* Voldemort Lookup
* HPCC Systems ECL Plugins