MergeNominalValues

Package

weka.filters.supervised.attribute

Synopsis

Merges values of all nominal attributes among the specified attributes, excluding the class attribute, using the CHAID method, but without re-splitting merged subsets. It implements Steps 1 and 2 described by Kass (1980); see:

Gordon V. Kass (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics. 29(2):119-127.
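The merging step can be pictured with a minimal sketch. This is not the Weka implementation: the class and method names are hypothetical, the Yates correction is omitted, and the stopping rule compares the pairwise chi-squared statistic against a caller-supplied critical value instead of computing a p-value. The two rules are equivalent here because every pairwise sub-table has the same degrees of freedom (number of classes minus one). Each row of `counts` holds the class counts for one attribute value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of CHAID-style value merging (Kass 1980, Steps 1 and 2). */
final class ChaidMergeSketch {

    /** Pearson chi-squared statistic of the 2 x c sub-table formed by two values. */
    static double pairStatistic(double[] a, double[] b) {
        double totalA = 0, totalB = 0;
        for (int j = 0; j < a.length; j++) {
            totalA += a[j];
            totalB += b[j];
        }
        double n = totalA + totalB;
        double chi = 0;
        for (int j = 0; j < a.length; j++) {
            double col = a[j] + b[j];
            double expA = totalA * col / n;
            double expB = totalB * col / n;
            if (expA > 0) chi += (a[j] - expA) * (a[j] - expA) / expA;
            if (expB > 0) chi += (b[j] - expB) * (b[j] - expB) / expB;
        }
        return chi;
    }

    /** Repeatedly merges the least different pair until all pairs differ significantly. */
    static List<double[]> merge(double[][] counts, double criticalValue) {
        List<double[]> groups = new ArrayList<>();
        for (double[] row : counts) groups.add(row.clone());
        while (groups.size() > 1) {
            int bestI = -1, bestJ = -1;
            double best = Double.POSITIVE_INFINITY;
            // Scan all pairs: quadratic in the number of remaining values.
            for (int i = 0; i < groups.size(); i++) {
                for (int j = i + 1; j < groups.size(); j++) {
                    double chi = pairStatistic(groups.get(i), groups.get(j));
                    if (chi < best) {
                        best = chi;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            if (best >= criticalValue) break; // every remaining pair differs significantly
            double[] merged = groups.get(bestI);
            double[] other = groups.remove(bestJ); // bestJ > bestI, so bestI stays valid
            for (int k = 0; k < merged.length; k++) merged[k] += other[k];
        }
        return groups;
    }

    public static void main(String[] args) {
        // Three attribute values, two classes; 3.841 is the 5% critical value for 1 degree of freedom.
        double[][] counts = {{30, 10}, {28, 12}, {5, 35}};
        System.out.println(merge(counts, 3.841).size() + " groups remain");
    }
}
```

In this example the first two values have similar class distributions and are merged, while the third stays separate. Merged groups are never split again, which matches the statement above that re-splitting is not considered.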

Once attribute values have been merged, a chi-squared test with a Bonferroni correction, using the multiplier from Equation 3.2 in Kass (1980), checks whether the resulting attribute is a valid predictor. If an attribute does not pass this test, all remaining values (if any) are merged. Nevertheless, useless predictors, such as identifier attributes, can slip through without being fully merged.
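As a rough illustration of the correction, the sketch below assumes that Equation 3.2 is Kass's multiplier for free (unordered nominal) predictors, that is, the number of ways to partition c original values into r merged groups (a Stirling number of the second kind). That reading of the equation, and the class and method names, are assumptions and are not taken from the Weka source.

```java
/** Hypothetical sketch of a Bonferroni adjustment for merged nominal values. */
final class BonferroniSketch {

    /** Number of ways to partition c values into r non-empty groups. */
    static double multiplier(int c, int r) {
        double[][] s = new double[c + 1][r + 1];
        s[0][0] = 1.0;
        for (int n = 1; n <= c; n++) {
            for (int k = 1; k <= Math.min(n, r); k++) {
                // Recurrence: S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1)
                s[n][k] = k * s[n - 1][k] + s[n - 1][k - 1];
            }
        }
        return s[c][r];
    }

    /** Multiplies the raw p-value by the multiplier, capped at 1. */
    static double adjust(double pValue, int c, int r) {
        return Math.min(1.0, pValue * multiplier(c, r));
    }

    public static void main(String[] args) {
        // 4 original values merged into 2 groups: 7 possible partitions.
        System.out.println(multiplier(4, 2));
        System.out.println(adjust(0.01, 4, 2));
    }
}
```

Under this assumption, an attribute would count as a valid predictor only if the adjusted p-value stays below the chosen significance level.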

The code applies the Yates correction when the chi-squared statistic is computed.
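For reference, a minimal sketch of a chi-squared statistic with the Yates continuity correction follows. It subtracts 0.5 from each absolute difference between observed and expected counts before squaring. Clamping the corrected difference at zero and skipping cells with non-positive expected counts are assumptions made for this sketch; the class name is hypothetical.

```java
/** Hypothetical sketch of a Yates-corrected chi-squared statistic. */
final class YatesChiSquaredSketch {

    /** Chi-squared statistic of a contingency table, optionally Yates-corrected. */
    static double chiSquared(double[][] table, boolean yates) {
        int rows = table.length;
        int cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double n = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                n += table[i][j];
            }
        }
        double chi = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / n;
                if (!(expected > 0)) continue; // skip empty rows or columns
                double diff = Math.abs(table[i][j] - expected);
                if (yates) diff = Math.max(0.0, diff - 0.5); // continuity correction
                chi += diff * diff / expected;
            }
        }
        return chi;
    }

    public static void main(String[] args) {
        double[][] table = {{10, 20}, {30, 40}};
        System.out.println("uncorrected: " + chiSquared(table, false));
        System.out.println("corrected:   " + chiSquared(table, true));
    }
}
```

The corrected statistic is never larger than the uncorrected one, so the correction makes the test more conservative, particularly for small counts.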

Note that the algorithm is quadratic in the number of values of each attribute.

Options

The table below describes the options available for MergeNominalValues.

| Option | Description |
| --- | --- |
| attributeIndices | Specify the range of attributes to act on (or its inverse). This is a comma-separated list of attribute indices, with "first" and "last" as valid values. Specify an inclusive range with "-". For example: "first-3,5,6-10,last". |
| debug | Turns on output of debugging information. |
| invertSelection | Determines whether the selected attributes are acted on, or all other attributes instead. |
| significanceLevel | The significance level for the chi-squared test used to decide when to stop merging. |
| useShortIdentifiers | Whether to use short identifiers for the merged values. |
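To make the attributeIndices syntax concrete, here is a small sketch of how a range string such as "first-3,5,6-10,last" expands into attribute indices. It is not Weka's own parser: the class and method names are hypothetical, indices are assumed to be 1-based, and no validation of malformed input is attempted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of expanding an attribute range string. */
final class AttributeRangeSketch {

    /** Expands a range string into a list of 1-based attribute indices. */
    static List<Integer> parse(String spec, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : spec.split(",")) {
            String[] ends = part.trim().split("-");
            // A single index such as "5" is treated as the range 5-5.
            int from = index(ends[0], numAttributes);
            int to = index(ends[ends.length - 1], numAttributes);
            for (int i = from; i <= to; i++) result.add(i);
        }
        return result;
    }

    /** Resolves "first", "last", or a number to a 1-based index. */
    private static int index(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) return 1;
        if (t.equals("last")) return numAttributes;
        return Integer.parseInt(t);
    }

    public static void main(String[] args) {
        // With 12 attributes, prints [1, 2, 3, 5, 6, 7, 8, 9, 10, 12]
        System.out.println(parse("first-3,5,6-10,last", 12));
    }
}
```

With invertSelection turned on, the filter would act on the attributes not in this list.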

Capabilities

The table below describes the capabilities of MergeNominalValues.

| Capability | Supported |
| --- | --- |
| Class | Missing class values, Date class, Unary class, Nominal class, Numeric class, Binary class, Relational class, String class, Empty nominal class |
| Attributes | Relational attributes, String attributes, Binary attributes, Empty nominal attributes, Date attributes, Numeric attributes, Unary attributes, Nominal attributes, Missing values |
| Min # of instances | 0 |