Pentaho Data Mining Community Documentation

Quick Start and Overview

Pentaho Data Mining, based on the Weka project, is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule, and clustering algorithms can help you understand your business better and improve future performance through predictive analytics.

There are two versions of Weka:

  1. Weka 3.8 - current stable version. This branch receives bug fixes to core Weka; new features are released through packages that can be installed via the built-in package manager.
  2. Weka 3.9 - development branch. This is a continuation of the 3.8 code line that receives both bug fixes and new features/improvements to core Weka. It also takes advantage of new features released in packages.

Documentation

Pentaho Data Mining (Weka)

The book Data Mining: Practical Machine Learning Tools and Techniques (Fourth Edition) was written to accompany Weka.

Plugins for Pentaho Data Integration (Kettle)

Developing with Weka

Awards and Publications

Under Development/Roadmap

Archived

Further Links and Information