Pentaho Big Data Plugin

The Pentaho Big Data Plugin Project provides support for an ever-expanding Big Data community within the Pentaho ecosystem. It is a plugin for the Pentaho Kettle engine which can be used within Pentaho Data Integration (Kettle), Pentaho Reporting, and the Pentaho BI Platform.

...

This project contains the implementations for connecting to or performing the following:

  • Pentaho MapReduce: visually design MapReduce jobs as Kettle transformations
  • HDFS File Operations: read and write HDFS files directly from any Kettle step, made possible by the ubiquitous use of Apache VFS throughout Kettle
  • Data Sources
    • JDBC connectivity
      • Apache Hive
    • Native RPC connectivity for reading/writing
      • Apache HBase
      • Cassandra
      • MongoDB
      • CouchDB
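
As a rough illustration of the JDBC connectivity item above, the following sketch queries Hive through the standard java.sql API. It is not taken from the plugin's own code. It assumes a HiveServer2-style endpoint (the jdbc:hive2:// URL scheme) and a Hive JDBC driver on the classpath. The class name HiveJdbcSketch, the helper hiveUrl, and the host, port, and database values are hypothetical placeholders. Older Hive servers use a different URL scheme and driver, so adjust to match your installation.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcSketch {

    // Builds a HiveServer2-style JDBC URL. Host, port, and database are
    // placeholders chosen for illustration, not values from the plugin.
    static String hiveUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Assumption: a HiveServer2 instance is listening on localhost:10000.
        String url = hiveUrl("localhost", 10000, "default");

        // try-with-resources closes the result set, statement, and connection.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // Reached when no Hive driver is on the classpath or the server
            // cannot be contacted.
            System.err.println("Could not query Hive at " + url + ": " + e.getMessage());
        }
    }
}
```

Within Kettle itself, the same connection details would normally be entered in a database connection dialog rather than written in code. The sketch only shows what the underlying JDBC access might look like.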

Key Links

Community and where to find help

...

Here's a short list of resources to help you learn and master Git:

Documentation

Kettle Plugin Development

Getting started with the Pentaho Data Integration Java API

Step Documentation

Job Entry Documentation

Hadoop Configuration

Community Plugins

Here's a list of known community plugins that fall into the "big data" category:

...