Google Summer of Code Ideas Page

All ideas on this page assume Java programming skills except where noted.

Pentaho Platform

OpenLayers - AVAILABLE

Enhance the OpenLayers integration within the dashboard framework

  • Support drill-linking. Make it easy to provide a URL template that will be used when the user clicks on a marker
  • Support value-based marker selection. This will make it easier to show different markers based on some piece of data related to that point
  • Better metadata support
  • AJAX support for loading data
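The first two bullets can be illustrated with a minimal sketch. It assumes a `{column}` placeholder syntax for drill-link templates and a simple banding rule for marker images. `DrillLinkTemplate` and `MarkerSelector` are hypothetical names, not existing dashboard-framework classes, and the code assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expands a drill-link URL template for one map point. */
class DrillLinkTemplate {
    // Placeholders look like {columnName}; this syntax is an assumption.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");
    private final String template;

    DrillLinkTemplate(String template) {
        this.template = template;
    }

    /** Replaces each {name} with the URL-encoded value of that column in the row. */
    String expand(Map<String, ?> row) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = row.get(m.group(1));
            // Unknown columns expand to an empty string rather than failing.
            String encoded = value == null ? ""
                    : URLEncoder.encode(value.toString(), StandardCharsets.UTF_8);
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }
}

/** Hypothetical helper: picks a marker image from the band a data value falls into. */
class MarkerSelector {
    private final NavigableMap<Double, String> bands = new TreeMap<>();
    private final String defaultMarker;

    MarkerSelector(String defaultMarker) {
        this.defaultMarker = defaultMarker;
    }

    /** Values >= lowerBound (up to the next band's bound) use markerImage. */
    MarkerSelector band(double lowerBound, String markerImage) {
        bands.put(lowerBound, markerImage);
        return this;
    }

    String markerFor(double value) {
        Map.Entry<Double, String> band = bands.floorEntry(value);
        return band == null ? defaultMarker : band.getValue();
    }
}
```

A `TreeMap` keeps band lookup to one `floorEntry` call. Values below the lowest band fall back to the default marker.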

Mentor: James Dixon

PHP integration - AVAILABLE

Enhance the component UI layer to support presentation and navigation within a PHP framework. Create a navigator similar to the current PCI to demonstrate the PHP capabilities.

1) Create a PHP library that enables PHP applications to embed Pentaho BI functionality. Using PEAR or another PHP web service package, this library will call web services in the Pentaho server to:

  • Get information about folders and files (reports, dashboards, etc.) in the Pentaho Repository
  • Get information about the parameters needed to run a report or process 
  • Execute a report or process via web services
  • Provide URLs that will launch a report or process from the client browser

2) Create a sample PHP application that shows this library in use
3) Document the PHP library

Mentor: James Dixon

OpenOffice Reports - AVAILABLE

The report engine in OpenOffice uses one of the Pentaho reporting engines to generate the content.

This project will provide integration between the Pentaho BI server and OpenOffice. This will make OpenOffice reports available to users over the web, and will also allow them to be scheduled and delivered via email.

This project needs to:

  • Enable the BI server to execute OpenOffice reports. We have unit test code that shows how this can be done
  • Enable OpenOffice reports to be parameterized
  • Create an OpenOffice plugin that allows users to publish reports to the BI server

Mentor: James Dixon

Embed Platform in Client Application (case study) - AVAILABLE

The Pentaho BI Platform is typically thought of in its enterprise-scale web application incarnation. We do not have many proofs of concept that illustrate the ability to embed the platform in smaller-scale, non-webapp, thick-client applications. The idea here is to custom-fit the platform into an existing open source application to illustrate how an embedded platform can add reporting, data integration, and/or analysis capabilities and increase the value of that application.

Mentor: Aaron Phillips

JCR-1.0 Content Repository support - AVAILABLE

The Pentaho BI Platform manages solutions in a home-grown solution repository. The task here is to replace it with Apache Jackrabbit, a more robust and feature-rich implementation of the JCR 1.0 (JSR-170) content repository API.

Mentor: Aaron Phillips

Pentaho Reporting

Implement LibFormat - AVAILABLE

Implement an OpenOffice-compatible data-format library. OpenOffice uses
an extended formatting model that is slightly incompatible with the
standard Java one. Its model is somewhat more capable and is a
requirement for OpenOffice and Excel export.

(At the moment, we simply use the Java Formatter classes and hope that
they are compatible enough. This covers about 90% of all cases, but it
is not a perfect solution.)
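One concrete gap is section handling. Spreadsheet format codes such as `#,##0;(#,##0);"-"` carry separate positive, negative and zero sections, while `java.text.DecimalFormat` only understands positive;negative subpatterns. The sketch below picks the section itself and then delegates to `DecimalFormat`. `SectionedNumberFormat` is a hypothetical name, not LibFormat's API. It assumes the usual spreadsheet rules (a negative section receives the absolute value, ties round away from zero), ignores the fourth text section, and does not translate quoted literals inside digit sections.

```java
import java.math.RoundingMode;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical sketch: spreadsheet-style section selection on top of DecimalFormat. */
class SectionedNumberFormat {
    private final String[] sections;

    SectionedNumberFormat(String formatCode) {
        // Naive split: assumes no ';' appears inside quoted literals.
        this.sections = formatCode.split(";", -1);
    }

    String format(double value) {
        String section = sections[0];
        double shown = value;
        if (sections.length >= 2 && value < 0) {
            // Negative section: any sign comes from the pattern, so format |value|.
            section = sections[1];
            shown = -value;
        } else if (sections.length >= 3 && value == 0) {
            section = sections[2];
        }
        // Drop bracketed modifiers such as [RED]; DecimalFormat has no colour concept.
        section = section.replaceAll("\\[[^\\]]*\\]", "");
        if (section.indexOf('0') < 0 && section.indexOf('#') < 0) {
            // Literal-only section such as "-": emit it as plain text.
            return section.replace("\"", "");
        }
        DecimalFormat df = new DecimalFormat(section, DecimalFormatSymbols.getInstance(Locale.US));
        // Assumed spreadsheet behaviour: ties round away from zero, not HALF_EVEN.
        df.setRoundingMode(RoundingMode.HALF_UP);
        return df.format(shown);
    }
}
```

With a single section, negative numbers keep the minus sign that `DecimalFormat` adds. With two or more sections, the negative section's own literals (here the parentheses) take over.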

Mentor: Thomas Morgner

LibFormula Enhancements - AVAILABLE

Formula editor

A formula editor that makes it easy for users to enter complex formulas,
similar to the formula wizards in Excel and OpenOffice Calc.

Implement More Formulas

Add all formula functions that are needed to become OpenFormula Level 1
or Level 2 compatible.
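Adding many functions is mostly a matter of a uniform registration scheme with arity checks. The sketch below is a hypothetical registry, not LibFormula's real function API. It implements `LEN` and `MID` following OpenFormula's 1-based `MID(Text; Start; Length)` convention. Truncating numeric arguments with `intValue()` is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: a case-insensitive registry of formula functions. */
class FunctionRegistry {
    /** A formula function receives already-evaluated arguments. */
    interface FormulaFunction {
        Object evaluate(List<Object> args);
    }

    private static final class Entry {
        final int minArgs;
        final int maxArgs;
        final FormulaFunction fn;

        Entry(int minArgs, int maxArgs, FormulaFunction fn) {
            this.minArgs = minArgs;
            this.maxArgs = maxArgs;
            this.fn = fn;
        }
    }

    private final Map<String, Entry> functions = new HashMap<>();

    void register(String name, int minArgs, int maxArgs, FormulaFunction fn) {
        functions.put(name.toUpperCase(Locale.ROOT), new Entry(minArgs, maxArgs, fn));
    }

    Object call(String name, List<Object> args) {
        Entry e = functions.get(name.toUpperCase(Locale.ROOT));
        if (e == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        if (args.size() < e.minArgs || args.size() > e.maxArgs) {
            throw new IllegalArgumentException(name + ": wrong number of arguments");
        }
        return e.fn.evaluate(args);
    }

    /** Registers two OpenFormula text functions as examples. */
    static FunctionRegistry withTextFunctions() {
        FunctionRegistry r = new FunctionRegistry();
        // LEN(Text): number of characters, returned as a formula number.
        r.register("LEN", 1, 1, a -> (double) a.get(0).toString().length());
        // MID(Text; Start; Length): 1-based start, result clipped to the end of the text.
        r.register("MID", 3, 3, a -> {
            String s = a.get(0).toString();
            int start = ((Number) a.get(1)).intValue();
            int len = ((Number) a.get(2)).intValue();
            if (start < 1 || len < 0) {
                throw new IllegalArgumentException("MID: invalid start or length");
            }
            int from = Math.min(start - 1, s.length());
            int to = (int) Math.min((long) from + len, s.length());
            return s.substring(from, to);
        });
        return r;
    }
}
```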

Mentor: Thomas Morgner

LibFonts Enhancements - AVAILABLE

Complete the TTF handling

Implement the font-metrics computation for TrueType fonts.

Add Type1 & Type3 Font support

LibFonts cannot read these font files at the moment, so we have to
fall back to the much slower JDK and/or iText font handling for them.
That also prevents us from reading all available font metrics, which
makes the layout less accurate.
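The TrueType side of the metrics computation starts with the font's table directory: a 12-byte offset table, then 16-byte table records (tag, checksum, offset, length). The sketch reads `unitsPerEm` from the `head` table and the ascender, descender and line gap from `hhea`, then scales design units to points. `TrueTypeMetrics` is a hypothetical name, not LibFonts code. It assumes the whole font is already in a `ByteBuffer` and skips checksum validation and the OS/2 table.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: vertical font metrics from an in-memory TrueType file. */
class TrueTypeMetrics {
    final int unitsPerEm;
    final int ascender;
    final int descender; // negative below the baseline
    final int lineGap;

    TrueTypeMetrics(ByteBuffer font) {
        // ByteBuffer is big-endian by default, matching the TrueType format.
        int head = findTable(font, "head");
        int hhea = findTable(font, "hhea");
        unitsPerEm = font.getShort(head + 18) & 0xFFFF; // uint16 at offset 18 of 'head'
        ascender = font.getShort(hhea + 4);             // FWORD (int16)
        descender = font.getShort(hhea + 6);
        lineGap = font.getShort(hhea + 8);
    }

    /** Offset table: uint32 sfntVersion, uint16 numTables, 3 x uint16; records start at byte 12. */
    private static int findTable(ByteBuffer font, String tag) {
        int numTables = font.getShort(4) & 0xFFFF;
        for (int i = 0; i < numTables; i++) {
            int record = 12 + 16 * i;
            byte[] t = new byte[4];
            for (int k = 0; k < 4; k++) {
                t[k] = font.get(record + k);
            }
            if (new String(t, StandardCharsets.US_ASCII).equals(tag)) {
                return font.getInt(record + 8); // table offset (assumed < 2 GB)
            }
        }
        throw new IllegalArgumentException("Missing table: " + tag);
    }

    /** Scales a design-unit value to points for the given font size. */
    double toPoints(int designUnits, double fontSize) {
        return designUnits * fontSize / unitsPerEm;
    }

    /** Baseline-to-baseline distance: ascender - descender + lineGap, in points. */
    double lineHeight(double fontSize) {
        return toPoints(ascender - descender + lineGap, fontSize);
    }
}
```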

Pentaho Metadata

Add Publishing of OLAP Models to Pentaho's BI Platform - AVAILABLE

Add the ability to publish metadata-based OLAP views automatically to Pentaho's BI Platform. This would involve extending the existing Metadata Editor Publish functionality.

Add string functions to Metadata's open formula API - AVAILABLE

Pentaho Metadata uses Open Formula to represent platform-independent SQL conditions and formulas. This task involves implementing general string functions, such as substring, length and trim, in the Open Formula to SQL dialect engine within Metadata's MQLQuery architecture. See Pentaho Metadata Formulas for more information on formulas.
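The core of this task is a per-dialect mapping from each Open Formula string function to SQL. The sketch below is hypothetical and not Pentaho Metadata's real classes. It assumes the arguments have already been translated to SQL fragments and covers three dialects as examples. The function names it emits (`LENGTH`/`LEN`, `SUBSTR`/`SUBSTRING`, `TRIM`/`LTRIM(RTRIM(...))`) are standard for those databases. Two semantic gaps are noted in comments rather than solved.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: renders Open Formula string functions as dialect-specific SQL. */
class StringFunctionSql {
    enum Dialect { ORACLE, POSTGRESQL, SQLSERVER }

    static String render(String function, List<String> args, Dialect dialect) {
        switch (function.toUpperCase(Locale.ROOT)) {
            case "LEN":
                // Note: SQL Server's LEN ignores trailing blanks.
                return (dialect == Dialect.SQLSERVER ? "LEN(" : "LENGTH(") + args.get(0) + ")";
            case "MID":
                // MID(Text; Start; Length) is 1-based, like SQL substring functions.
                if (dialect == Dialect.ORACLE) {
                    return "SUBSTR(" + String.join(", ", args) + ")";
                }
                if (dialect == Dialect.POSTGRESQL) {
                    return "SUBSTRING(" + args.get(0) + " FROM " + args.get(1)
                            + " FOR " + args.get(2) + ")";
                }
                return "SUBSTRING(" + String.join(", ", args) + ")";
            case "TRIM":
                // Approximation: SQL trims only leading/trailing spaces, while
                // Open Formula TRIM also collapses internal runs of spaces.
                return dialect == Dialect.SQLSERVER
                        ? "LTRIM(RTRIM(" + args.get(0) + "))"
                        : "TRIM(" + args.get(0) + ")";
            default:
                throw new IllegalArgumentException("Unsupported function: " + function);
        }
    }
}
```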

Enhance MQL Query Editor - AVAILABLE

This task involves adding free-form constraint editing to the MQL Query Editor and converting the MQL Query Editor from SWT to Pentaho's new XUL UI Framework. Once converted to the XUL framework, the dialog can run within both SWT and Swing environments, and has the potential to run in a Web environment once the Web XUL UI Framework layer is implemented.

Mentor: Will Gorman

Pentaho Common Code

Expose new common data source config dialog in all tools - AVAILABLE

Take the PDI (Kettle) database connection dialog, which was rewritten using a reusable XUL-based UI framework, and implement it in some or all of the other Pentaho applications.

Apps that could possibly use the new database connection framework:

  • Report Designer (Swing)
  • RDW (SWT)
  • Kettle (JFace/SWT)
  • Management Services (HTML)
  • Schema Designer (Swing)

Tasks:

  • Use XUL definitions for the redesigned dialogs
  • Create the widget renderers for each of the client tool technologies (this should require a common repository of XUL-to-technology mappings)
  • Map the renderers to the XUL definitions
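The renderer tasks above amount to one registry per client technology that maps XUL tag names to widget factories. The sketch below is a hypothetical design, not Pentaho's XUL framework API. `W` stands for the target toolkit's widget type, and the example registry targets HTML (as for Management Services) so that it runs headless. Attribute values are not HTML-escaped, for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, minimal model of one element from a XUL dialog definition. */
class XulElement {
    final String tag;
    final Map<String, String> attributes;

    XulElement(String tag, Map<String, String> attributes) {
        this.tag = tag;
        this.attributes = attributes;
    }
}

/** Maps XUL tag names to renderers for one client technology. */
class RendererRegistry<W> {
    private final Map<String, Function<XulElement, W>> renderers = new HashMap<>();

    RendererRegistry<W> register(String tag, Function<XulElement, W> renderer) {
        renderers.put(tag, renderer);
        return this;
    }

    W render(XulElement element) {
        Function<XulElement, W> renderer = renderers.get(element.tag);
        if (renderer == null) {
            throw new IllegalArgumentException("No renderer for <" + element.tag + ">");
        }
        return renderer.apply(element);
    }

    /** Example registry for an HTML target, where a "widget" is a markup string. */
    static RendererRegistry<String> html() {
        return new RendererRegistry<String>()
                .register("label", e -> "<span>" + e.attributes.get("value") + "</span>")
                .register("textbox", e -> "<input type=\"text\" id=\"" + e.attributes.get("id") + "\"/>");
    }
}
```

A Swing or SWT registry would register the same tag names with `JLabel` or SWT `Label` factories instead. The XUL dialog definition itself stays shared.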

Mentor: Gretchen Moran

Pentaho Data Mining

Clustering and Naive Bayes Visualization - AVAILABLE

Provide a graphical visualization for the results of clustering (and potentially a naive Bayes model). The visualization would have one row per cluster, where each row displays either histograms (for numeric variables) or pie charts (for discrete variables) showing the distribution over the whole data with the within-cluster/class distribution superimposed over the top. The tool should allow the user to specify the order of the rows (e.g. large clusters/classes at the top) and the order of the variables (e.g. ordered by importance to the clustering/predictive model).
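The data behind one numeric column of such a view can be computed in two passes: an overall histogram and one histogram per cluster over the same equal-width bins, with rows sorted by cluster size. This is a minimal sketch with hypothetical names, not WEKA code. It assumes cluster assignments are already available as an `int[]`. Discrete variables would count per nominal value instead of per bin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: histogram data for one numeric column of the visualization. */
class ClusterHistograms {
    /** One visualization row: a cluster, its size and its bin counts. */
    static final class Row {
        final int cluster;
        final int size;
        final int[] counts;

        Row(int cluster, int size, int[] counts) {
            this.cluster = cluster;
            this.size = size;
            this.counts = counts;
        }
    }

    /** Equal-width binning; the maximum value goes into the last bin. */
    static int binOf(double v, double min, double max, int bins) {
        if (max == min) {
            return 0;
        }
        int b = (int) ((v - min) * bins / (max - min));
        return Math.min(b, bins - 1);
    }

    /** Histogram over the whole data set, drawn behind every cluster row. */
    static int[] overall(double[] values, int bins) {
        double min = Arrays.stream(values).min().orElse(0);
        double max = Arrays.stream(values).max().orElse(0);
        int[] counts = new int[bins];
        for (double v : values) {
            counts[binOf(v, min, max, bins)]++;
        }
        return counts;
    }

    /** Per-cluster histograms on the same bins, largest clusters first. */
    static List<Row> rows(double[] values, int[] assignment, int numClusters, int bins) {
        double min = Arrays.stream(values).min().orElse(0);
        double max = Arrays.stream(values).max().orElse(0);
        int[][] counts = new int[numClusters][bins];
        int[] sizes = new int[numClusters];
        for (int i = 0; i < values.length; i++) {
            counts[assignment[i]][binOf(values[i], min, max, bins)]++;
            sizes[assignment[i]]++;
        }
        List<Row> rows = new ArrayList<>();
        for (int c = 0; c < numClusters; c++) {
            rows.add(new Row(c, sizes[c], counts[c]));
        }
        rows.sort(Comparator.comparingInt((Row r) -> r.size).reversed());
        return rows;
    }
}
```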

Enhance WEKA's Distributed Experiment Environment - AVAILABLE

WEKA provides a lightweight, general purpose RMI-driven compute engine for executing sub-experiments remotely. The Experimenter GUI acts as the master experiment server and divides an experiment into sub-experiments (tasks) that are sent to remote engines for execution. At present, the remote engines store sub-tasks in a queue and execute them one at a time. This architecture could be easily extended to take advantage of multiple CPUs/cores on each remote computer. Further enhancements could include algorithms for load balancing, error recovery and an improved user interface for monitoring/managing remote engines.
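The multi-core extension can be as simple as replacing the single-threaded queue with a fixed thread pool sized to the machine's cores. This is a minimal sketch using `java.util.concurrent`. `ParallelTaskEngine` is a hypothetical class, not WEKA's remote engine, and it leaves out the RMI layer entirely. The comment in `runAll` marks where error recovery would hook in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: runs queued sub-experiments on all cores instead of one at a time. */
class ParallelTaskEngine implements AutoCloseable {
    private final ExecutorService pool;

    ParallelTaskEngine(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    ParallelTaskEngine() {
        this(Runtime.getRuntime().availableProcessors());
    }

    /** Queues one sub-experiment; the pool's internal queue replaces the serial queue. */
    <T> Future<T> submit(Callable<T> task) {
        return pool.submit(task);
    }

    /** Runs a batch of sub-experiments concurrently; results keep submission order. */
    <T> List<T> runAll(List<? extends Callable<T>> tasks) {
        List<Future<T>> futures = new ArrayList<>();
        for (Callable<T> t : tasks) {
            futures.add(pool.submit(t));
        }
        List<T> results = new ArrayList<>();
        try {
            for (Future<T> f : futures) {
                results.add(f.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for sub-experiments", e);
        } catch (ExecutionException e) {
            // A real engine would report this to the master and possibly retry the task.
            throw new IllegalStateException("Sub-experiment failed", e.getCause());
        }
        return results;
    }

    @Override
    public void close() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```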

Kettle Plugin for WEKA's KnowledgeFlow Environment - AVAILABLE

Pentaho Data Integration (Kettle) has the ability to pump data into a WEKA KnowledgeFlow process for executing a data mining task as part of an ETL transformation. On the flip side, the KnowledgeFlow could benefit from being able to harness Kettle's ability to extract and transform data from a wide variety of sources. The goal of this project is to create a data source component for WEKA's KnowledgeFlow that can execute a Kettle transformation and translate the incoming data rows into WEKA's native Instance data structure.
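The translation step can be sketched without either library on the classpath. WEKA stores an instance as a `double[]`: numeric attributes are plain doubles, nominal attributes are indexes into the attribute's value list, and missing values are NaN. The converter below maps Kettle-style `Object[]` rows onto that encoding. `RowToInstanceConverter` is hypothetical. It assumes the caller already knows which fields are nominal, and it grows nominal value lists on first sight; a real plugin would need the complete value set before building the WEKA header. The resulting array could then be wrapped in a WEKA instance, for example via `DenseInstance(double weight, double[] attValues)` in newer WEKA releases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: Kettle-style rows to WEKA-style attribute vectors. */
class RowToInstanceConverter {
    private final boolean[] nominal;
    private final List<Map<String, Integer>> valueIndexes = new ArrayList<>();

    RowToInstanceConverter(boolean[] nominal) {
        this.nominal = nominal.clone();
        for (int i = 0; i < nominal.length; i++) {
            valueIndexes.add(new HashMap<>());
        }
    }

    double[] convert(Object[] row) {
        double[] values = new double[nominal.length];
        for (int i = 0; i < nominal.length; i++) {
            Object cell = row[i];
            if (cell == null) {
                values[i] = Double.NaN; // WEKA's missing-value marker
            } else if (nominal[i]) {
                // Unseen labels get the next index (0, 1, 2, ...).
                Map<String, Integer> index = valueIndexes.get(i);
                values[i] = index.computeIfAbsent(cell.toString(), k -> index.size());
            } else {
                values[i] = ((Number) cell).doubleValue();
            }
        }
        return values;
    }
}
```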

Other machine learning/data mining projects - AVAILABLE

Any project that has a tangible outcome with respect to the Weka code base will be considered (e.g. new ideas for classifiers, clusterers, etc., or methods from the literature that are not in Weka yet).

Some possibilities:

  • Implement the BIRCH clustering method of Zhang, Ramakrishnan and Livny
  • Implement a fast association rules learner based on FP-growth/FP-tree.
  • Improve WEKA's implementation of the multilayer perceptron neural network and add further connectionist methods (e.g. SOMs, recurrent networks etc.)
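For the BIRCH suggestion, the central data structure is the clustering feature CF = (N, LS, SS): the point count, the per-dimension linear sum, and the sum of squared norms. CFs are additive, so sub-clusters merge by adding them, and the radius follows from R² = SS/N − ‖LS/N‖². The sketch below shows only this building block, not the CF-tree. `ClusteringFeature` and `canAbsorb` are hypothetical names, and the threshold test stands in for BIRCH's leaf-absorption check.

```java
/** Hypothetical sketch of a BIRCH clustering feature CF = (N, LS, SS). */
class ClusteringFeature {
    int n;                   // number of points
    final double[] linearSum; // LS: per-dimension sum of the points
    double squareSum;        // SS: sum of squared norms of the points

    ClusteringFeature(int dims) {
        linearSum = new double[dims];
    }

    void add(double[] point) {
        n++;
        for (int d = 0; d < point.length; d++) {
            linearSum[d] += point[d];
            squareSum += point[d] * point[d];
        }
    }

    /** CF additivity: merging two sub-clusters just adds their features. */
    void merge(ClusteringFeature other) {
        n += other.n;
        for (int d = 0; d < linearSum.length; d++) {
            linearSum[d] += other.linearSum[d];
        }
        squareSum += other.squareSum;
    }

    ClusteringFeature copy() {
        ClusteringFeature c = new ClusteringFeature(linearSum.length);
        c.merge(this);
        return c;
    }

    double[] centroid() {
        double[] c = new double[linearSum.length];
        for (int d = 0; d < c.length; d++) {
            c[d] = linearSum[d] / n;
        }
        return c;
    }

    /** RMS distance of member points to the centroid: sqrt(SS/N - |LS/N|^2). */
    double radius() {
        double c2 = 0;
        for (double v : centroid()) {
            c2 += v * v;
        }
        return Math.sqrt(Math.max(0, squareSum / n - c2)); // clamp rounding noise
    }

    /** Would this entry stay within the radius threshold after absorbing the point? */
    boolean canAbsorb(double[] point, double threshold) {
        ClusteringFeature trial = copy();
        trial.add(point);
        return trial.radius() <= threshold;
    }
}
```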

Mentor: Mark Hall