SimpleKMeans

Package

weka.clusterers

Synopsis

Cluster data using the k-means algorithm. Either the Euclidean distance (default) or the Manhattan distance can be used. If the Manhattan distance is used, centroids are computed as the component-wise median rather than the mean. For more information see:

D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
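
The difference between the two centroid rules can be illustrated with a minimal, standalone Java sketch. It does not use the Weka API and is not Weka's implementation; the class name CentroidSketch and its method names are hypothetical, and averaging the two middle values for an even number of points is an assumption made here for illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only; names are hypothetical and this is not Weka code. */
final class CentroidSketch {

    /** Euclidean distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Manhattan (city-block) distance between two points of equal dimension. */
    static double manhattan(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /** Component-wise mean, the centroid rule paired with Euclidean distance. */
    static double[] meanCentroid(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (double[] p : points) {
            for (int j = 0; j < dims; j++) {
                centroid[j] += p[j];
            }
        }
        for (int j = 0; j < dims; j++) {
            centroid[j] /= points.length;
        }
        return centroid;
    }

    /** Component-wise median, the centroid rule paired with Manhattan distance. */
    static double[] medianCentroid(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (int j = 0; j < dims; j++) {
            // Collect and sort the j-th coordinate of every point.
            double[] column = new double[points.length];
            for (int i = 0; i < points.length; i++) {
                column[i] = points[i][j];
            }
            Arrays.sort(column);
            int mid = column.length / 2;
            // Assumption: for an even count, average the two middle values.
            centroid[j] = (column.length % 2 == 1)
                    ? column[mid]
                    : (column[mid - 1] + column[mid]) / 2.0;
        }
        return centroid;
    }
}
```

For the points (1, 10), (2, 20) and (9, 30), the mean centroid is (4, 20) while the median centroid is (2, 20): the median is less affected by the outlying first coordinate 9.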

Options

The table below describes the options available for SimpleKMeans.

Option	Description
displayStdDevs	Display standard deviations of numeric attributes and counts of nominal attributes.
distanceFunction	The distance function to use for comparing instances (default: weka.core.EuclideanDistance).
dontReplaceMissingValues	If set, missing values are not replaced globally with the mean/mode.
fastDistanceCalc	Uses cut-off values to speed up distance calculation, but also suppresses the calculation and output of the within-cluster sum of squared errors/sum of distances.
initializeUsingKMeansPlusPlusMethod	Initialize cluster centers using the probabilistic farthest-first method of the k-means++ algorithm.
maxIterations	Set the maximum number of iterations.
numClusters	Set the number of clusters.
preserveInstancesOrder	Preserve order of instances.
seed	The random number seed to be used.
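
The seeding idea behind initializeUsingKMeansPlusPlusMethod can be sketched in standalone Java as follows. This is a minimal illustration of the D^2 sampling described by Arthur and Vassilvitskii, not Weka's own code: the class KMeansPlusPlusSketch and its methods are hypothetical names, and the handling of the degenerate case in which every point coincides with an already chosen center is an assumption made here.

```java
import java.util.Random;

/** Illustrative sketch only; names are hypothetical and this is not Weka code. */
final class KMeansPlusPlusSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of k distinct points chosen as initial centers. */
    static int[] chooseSeeds(double[][] points, int k, Random rng) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int[] seeds = new int[k];
        boolean[] used = new boolean[n];
        double[] minSq = new double[n]; // squared distance to the nearest chosen center

        // The first center is drawn uniformly at random.
        seeds[0] = rng.nextInt(n);
        used[seeds[0]] = true;
        for (int i = 0; i < n; i++) {
            minSq[i] = squaredDistance(points[i], points[seeds[0]]);
        }

        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (double d : minSq) {
                total += d;
            }
            int chosen = -1;
            if (total > 0.0) {
                // Draw a point with probability proportional to its squared distance.
                double target = rng.nextDouble() * total;
                double cumulative = 0.0;
                for (int i = 0; i < n; i++) {
                    cumulative += minSq[i];
                    if (minSq[i] > 0.0 && cumulative > target) {
                        chosen = i;
                        break;
                    }
                }
            }
            if (chosen < 0) {
                // Assumption: if all points coincide with chosen centers (or rounding
                // prevented a pick), fall back to the first index not used yet.
                for (int i = 0; i < n; i++) {
                    if (!used[i]) {
                        chosen = i;
                        break;
                    }
                }
            }
            seeds[c] = chosen;
            used[chosen] = true;
            // Update each point's distance to its nearest chosen center.
            for (int i = 0; i < n; i++) {
                minSq[i] = Math.min(minSq[i], squaredDistance(points[i], points[chosen]));
            }
        }
        return seeds;
    }
}
```

A call such as KMeansPlusPlusSketch.chooseSeeds(points, 2, new Random(10)) returns two distinct row indices. Passing a fixed seed to Random makes the choice reproducible, which mirrors the role of the seed option above.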

Capabilities

The table below describes the capabilities of SimpleKMeans.

Capability	Supported
Class	No class
Attributes	Nominal attributes, Numeric attributes, Missing values, Binary attributes, Empty nominal attributes, Unary attributes
Min # of instances	1