What's new or improved in Weka 3.7.1
Classifiers
SPegasos
This is a fast algorithm for learning linear support vector machines and logistic regression via stochastic gradient descent. It is also an incremental classifier, so can be trained in an online setting (weka.classifiers.functions.SPegasos). See:
S. Shalev-Shwartz, Y. Singer, N. Srebro: Pegasos: Primal Estimated sub-GrAdient SOlver for SVM. In: 24th International Conference on Machine Learning, 807-814, 2007.
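The core of the method can be illustrated with a minimal Python sketch of Pegasos-style stochastic sub-gradient descent for the hinge loss. This is not Weka's implementation: the function names, the parameters (lam, epochs, seed) and the omission of a bias term and of the logistic loss are all assumptions made for illustration.

```python
import random

def pegasos_train(data, lam=0.01, epochs=50, seed=0):
    """Train a linear SVM (hinge loss) with Pegasos-style sub-gradient steps.

    data: list of (features, label) pairs with label in {-1, +1}.
    Returns the weight vector. Illustrative sketch only: no bias term,
    no projection step, and names are not taken from Weka.
    """
    rng = random.Random(seed)
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        # Visit the examples one at a time in random order (online setting).
        for x, y in rng.sample(data, len(data)):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (gradient of the L2 regulariser).
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Add the hinge-loss sub-gradient only if the margin is violated.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

Because each update touches a single example, the same loop can be fed instances as they arrive, which is what makes the approach suitable for incremental training.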
Friedman's RealAdaBoost algorithm
Algorithm for boosting a two-class classifier using the Real AdaBoost method (weka.classifiers.meta.RealAdaBoost). See:
J. Friedman, T. Hastie, R. Tibshirani (2000). Additive Logistic Regression: a Statistical View of Boosting. Annals of Statistics. 28(2):337-407.
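As a rough illustration of the Real AdaBoost update described in the cited paper, the following Python sketch performs a single boosting round, given the base learner's weighted class probability estimates. The function name and the eps clipping constant are assumptions for this sketch and do not correspond to anything in Weka's API.

```python
import math

def real_adaboost_round(weights, labels, probs, eps=1e-6):
    """One Real AdaBoost round (illustrative sketch, not Weka's code).

    weights: current instance weights.
    labels:  class labels in {-1, +1}.
    probs:   base learner's estimates of P(y = +1 | x) for each instance.
    Returns (contributions, new_weights), where each contribution is half
    the log-odds of the estimate and new_weights sum to 1.
    """
    contributions = []
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0); eps is an assumption
        contributions.append(0.5 * math.log(p / (1.0 - p)))
    # Increase the weight of instances the current member gets wrong.
    new_weights = [w * math.exp(-y * f)
                   for w, y, f in zip(weights, labels, contributions)]
    total = sum(new_weights)
    return contributions, [w / total for w in new_weights]
```

The final ensemble prediction would be the sign of the summed contributions over all rounds.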
FURIA rule learner
Fuzzy Unordered Rule Induction Algorithm. A fuzzy rule learner based on the well-known RIPPER algorithm (weka.classifiers.rules.FURIA). See:
Jens Christian Huehn, Eyke Huellermeier (2009). FURIA: An Algorithm for Unordered Fuzzy Rule Induction. Data Mining and Knowledge Discovery.
Thanks to Jens Christian Huehn for this contribution.
One class classifier
A classifier for one-class problems (also known as outlier or novelty detection) that combines density and class probability estimation (weka.classifiers.meta.OneClassClassifier). See:
Kathryn Hempstalk, Eibe Frank, Ian H. Witten: One-Class Classification by Combining Density and Class Probability Estimation. In: Proceedings of the 12th European Conference on Principles and Practice of Knowledge Discovery in Databases and 19th European Conference on Machine Learning, ECMLPKDD2008, Berlin, 505--519, 2008.
Parallel ensemble learning
Some meta classifiers in Weka now support multiple CPUs/cores and can construct ensemble members in parallel. See "Support for parallelism in ensemble learning" for details.
Miscellaneous
Gaussian process regression is now improved and faster. J48 now includes options to turn off subtree collapsing and the MDL correction for the info gain of splits on numeric attributes.
Association rules
FP-Growth
FPGrowth is a fast method for learning association rules on market basket data. It requires only two passes over the data and constructs a compressed tree-based representation in main memory. Rather than generating candidate frequent item sets and then counting their occurrence in the data, it "grows" frequent item sets by recursively processing the tree structure. This avoids the combinatorial explosion of generate-and-test methods when there are many items (weka.associations.FPGrowth). See:
J. Han, J. Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMOD International Conference on Management of Data, 1-12, 2000.
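The two passes and the recursive "growing" step can be sketched in Python as follows. This is a simplified illustration that only finds frequent item sets (it does not generate rules), and the names Node, build_tree, fp_growth and min_count are assumptions for the sketch rather than part of weka.associations.FPGrowth.

```python
from collections import defaultdict

class Node:
    """One node of the prefix tree: an item, a count and a parent link."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(transactions, min_count):
    """transactions: list of (items, count) pairs."""
    # Pass 1: count each item and keep only the frequent ones.
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    # Pass 2: insert each transaction, most frequent items first,
    # so that transactions sharing a prefix share tree nodes.
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq

def fp_growth(transactions, min_count, suffix=frozenset()):
    """Return {frozenset(item set): support count} for all frequent item sets."""
    header, freq = build_tree(transactions, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Grow the item set by mining the conditional base recursively.
        result.update(fp_growth(base, min_count, itemset))
    return result
```

A call such as fp_growth([(basket, 1) for basket in baskets], min_count=3) reads the raw baskets only inside the top-level build_tree; all deeper recursion works on the compressed prefix paths.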
Clusterers
Hierarchical clusterer
Implements a number of classic agglomerative (i.e., bottom-up) hierarchical clustering methods (weka.clusterers.HierarchicalClusterer).
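The bottom-up idea can be illustrated with a short Python sketch: start with one cluster per point and repeatedly merge the two closest clusters. The function name, the Euclidean distance and the linkage parameter (min for single linkage, max for complete linkage) are assumptions for illustration and do not reflect the options of the Weka class.

```python
import math

def agglomerative(points, num_clusters, linkage=min):
    """Naive agglomerative clustering (illustrative sketch only).

    points: list of equal-length numeric tuples.
    linkage: min gives single linkage, max gives complete linkage.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def distance(a, b):
        # Cluster distance = linkage over all pairwise point distances.
        return linkage(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

This version recomputes all distances on every merge, so it is only suitable for small examples.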
Filters
MILES propositionalization filter
Implements the MILES transformation that maps multiple instance bags into a high-dimensional single-instance feature space (weka.filters.unsupervised.attribute.MILESFilter). See:
Y. Chen, J. Bi, J.Z. Wang (2006). MILES: Multiple-instance learning via embedded instance selection. IEEE PAMI. 28(12):1931-1947.
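A minimal Python sketch of the embedding, assuming the Gaussian-style similarity described in the cited paper: every instance found in the training bags acts as a candidate concept, and each bag is represented by its closest-instance similarity to every concept. The function name and the sigma parameter are assumptions for illustration, not Weka's option names.

```python
import math

def miles_embed(bags, sigma=1.0):
    """Map each bag to a single feature vector (illustrative sketch only).

    bags: list of bags, each a list of equal-length numeric tuples.
    Returns one row per bag with one feature per instance in all bags.
    """
    # Every instance in every bag is a candidate concept.
    concepts = [inst for bag in bags for inst in bag]

    def similarity(concept, bag):
        # Similarity of the bag's closest instance to the concept.
        return max(math.exp(-math.dist(inst, concept) ** 2 / sigma ** 2)
                   for inst in bag)

    return [[similarity(c, bag) for c in concepts] for bag in bags]
```

The resulting rows can be handed to any single-instance learner; the dimensionality equals the total number of instances, which is why the feature space is high-dimensional.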
Rename attribute
A simple filter for renaming attributes (weka.filters.unsupervised.attribute.RenameAttribute).
Remove by name
Removes attributes based on a regular expression matched against their names (weka.filters.unsupervised.attribute.RemoveByName).
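The behaviour can be pictured with a small Python sketch. It assumes the expression must match the whole attribute name and includes a hypothetical invert flag; both are assumptions for illustration, and Python's regular expression syntax differs in places from Java's.

```python
import re

def remove_by_name(attribute_names, pattern, invert=False):
    """Drop attributes whose full name matches `pattern` (illustrative sketch).

    With invert=True, keep only the matching attributes instead.
    """
    regex = re.compile(pattern)
    # Keep a name when "it matches" agrees with the invert flag.
    return [name for name in attribute_names
            if bool(regex.fullmatch(name)) == invert]
```

For example, remove_by_name(["customer_id", "age", "order_id", "income"], r".*_id") keeps only the attributes whose names do not end in "_id".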
Merge many values
Merges many values of a nominal attribute into one value (weka.filters.unsupervised.attribute.MergeManyValues).
PMML import
Import of PMML RuleSet is now supported.
Attribute selection
Single attribute evaluation by a classifier
An attribute evaluator similar to OneRAttributeEval, except that it uses a user-specified classifier to evaluate (either on the training data or by cross-validation) each attribute individually (weka.attributeSelection.ClassifierAttributeEval).
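The idea of scoring each attribute on its own with a pluggable classifier can be sketched in Python as follows. The MajorityPerValue class is a hypothetical stand-in for the user-specified classifier, the fit/predict interface is an assumption, and the sketch evaluates on the training data only (no cross-validation).

```python
from collections import Counter, defaultdict

class MajorityPerValue:
    """Hypothetical stand-in classifier: most common class per attribute value."""
    def fit(self, X, y):
        by_value = defaultdict(Counter)
        for (value,), label in zip(X, y):
            by_value[value][label] += 1
        self.table = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(value, self.default) for (value,) in X]

def rank_attributes(rows, labels, make_classifier):
    """Score every attribute by the training accuracy of a classifier
    that sees only that attribute; return (column, score) pairs, best first."""
    scores = {}
    for col in range(len(rows[0])):
        X = [(row[col],) for row in rows]  # single-attribute view of the data
        preds = make_classifier().fit(X, labels).predict(X)
        scores[col] = sum(p == t for p, t in zip(preds, labels)) / len(labels)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

Any object with the same fit/predict shape could be passed as make_classifier, which mirrors the way the evaluator accepts a user-specified classifier.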
Cost/Benefit analysis component
A graphical tool for exploring various cost/benefit tradeoffs by interactively selecting different population sizes from the ranked list of prospects or by varying the threshold on the predicted probability of the positive class. More information can be seen on this Wiki page.
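The threshold trade-off that the tool lets you explore interactively can be approximated in a few lines of Python. The function names and the cost parameters (cost_fp for a false positive, cost_fn for a false negative) are assumptions for illustration and are not taken from the Weka component.

```python
def cost_at_threshold(probs, labels, threshold, cost_fp=1.0, cost_fn=1.0):
    """Total cost when predicting positive for every probability >= threshold.

    probs:  predicted probabilities of the positive class.
    labels: True for actual positives, False for actual negatives.
    """
    false_pos = sum(p >= threshold and not y for p, y in zip(probs, labels))
    false_neg = sum(p < threshold and y for p, y in zip(probs, labels))
    return false_pos * cost_fp + false_neg * cost_fn

def best_threshold(probs, labels, cost_fp=1.0, cost_fn=1.0):
    """Try each distinct predicted probability as a threshold and return the
    cheapest one (the lowest such threshold in case of ties)."""
    candidates = sorted(set(probs))
    return min(candidates,
               key=lambda t: cost_at_threshold(probs, labels, t, cost_fp, cost_fn))
```

Raising cost_fn relative to cost_fp pushes the chosen threshold down, so that more of the ranked list is treated as positive.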