Package
weka.clusterers
Synopsis
Clusters data using the k-means algorithm. The distance measure can be either the Euclidean distance (default) or the Manhattan distance. If the Manhattan distance is used, centroids are computed as the component-wise median rather than the mean. For more information, see:
D. Arthur, S. Vassilvitskii: k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 1027-1035, 2007.
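The difference between the two centroid rules can be illustrated with a small, self-contained sketch. It is not Weka's implementation: the class `CentroidSketch` and its methods are hypothetical names introduced here, and averaging the two middle values for an even number of points is an assumed convention.

```java
import java.util.Arrays;

// Illustrative only: hypothetical helper, not part of Weka.
final class CentroidSketch {

    // Component-wise mean: the centroid rule paired with Euclidean distance.
    static double[] mean(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        for (double[] p : points) {
            for (int d = 0; d < dims; d++) {
                centroid[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            centroid[d] /= points.length;
        }
        return centroid;
    }

    // Component-wise median: the centroid rule paired with Manhattan distance.
    static double[] median(double[][] points) {
        int dims = points[0].length;
        double[] centroid = new double[dims];
        double[] column = new double[points.length];
        for (int d = 0; d < dims; d++) {
            for (int i = 0; i < points.length; i++) {
                column[i] = points[i][d];
            }
            Arrays.sort(column);
            int mid = column.length / 2;
            // Assumed convention: average the two middle values when the count is even.
            centroid[d] = (column.length % 2 == 1)
                    ? column[mid]
                    : (column[mid - 1] + column[mid]) / 2.0;
        }
        return centroid;
    }

    public static void main(String[] args) {
        // The third point is an outlier in the second dimension.
        double[][] cluster = { {1, 1}, {2, 2}, {3, 30} };
        System.out.println(Arrays.toString(mean(cluster)));   // [2.0, 11.0]
        System.out.println(Arrays.toString(median(cluster))); // [2.0, 2.0]
    }
}
```

In this example the outlier pulls the mean centroid far from the other two points, while the median centroid stays with them.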
Options
The table below describes the options available for SimpleKMeans.
Option | Description |
---|---|
displayStdDevs | Display standard deviations of numeric attributes and counts of nominal attributes. |
distanceFunction | The distance function to use for comparing instances (default: weka.core.EuclideanDistance). |
dontReplaceMissingValues | If set, missing values are not replaced globally with the mean/mode. |
fastDistanceCalc | Uses cut-off values to speed up distance calculation, but also suppresses the calculation and output of the within-cluster sum of squared errors/sum of distances. |
initializeUsingKMeansPlusPlusMethod | Initialize cluster centers using the probabilistic farthest-first method of the k-means++ algorithm. |
maxIterations | Set the maximum number of iterations. |
numClusters | Set the number of clusters. |
preserveInstancesOrder | Preserve order of instances. |
seed | The random number seed to be used. |
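The seeding referred to by initializeUsingKMeansPlusPlusMethod can be sketched as follows, based on the description in the Arthur and Vassilvitskii paper cited in the synopsis. This is a minimal illustration, not Weka's code: the class `KMeansPlusPlusSketch`, its method names, and the use of squared Euclidean distance on plain `double[]` points are assumptions made for the example.

```java
import java.util.Random;

// Illustrative only: hypothetical helper, not part of Weka.
final class KMeansPlusPlusSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the indices of k points chosen as initial cluster centers.
    static int[] chooseSeeds(double[][] points, int k, long seed) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        Random rnd = new Random(seed);
        int[] chosen = new int[k];
        double[] minSqDist = new double[n];

        // First center: picked uniformly at random.
        chosen[0] = rnd.nextInt(n);
        for (int i = 0; i < n; i++) {
            minSqDist[i] = squaredDistance(points[i], points[chosen[0]]);
        }

        for (int c = 1; c < k; c++) {
            double total = 0.0;
            for (double d : minSqDist) {
                total += d;
            }
            // Pick the next center with probability proportional to the
            // squared distance from the nearest center chosen so far.
            double target = rnd.nextDouble() * total;
            double running = 0.0;
            int next = -1;
            for (int i = 0; i < n; i++) {
                if (minSqDist[i] <= 0.0) {
                    continue; // coincides with an existing center
                }
                running += minSqDist[i];
                next = i; // keep the last candidate as a rounding fallback
                if (running >= target) {
                    break;
                }
            }
            if (next < 0) {
                throw new IllegalArgumentException("fewer than k distinct points");
            }
            chosen[c] = next;
            // Update each point's distance to its nearest chosen center.
            for (int i = 0; i < n; i++) {
                minSqDist[i] = Math.min(minSqDist[i], squaredDistance(points[i], points[next]));
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {0, 1}, {10, 10}, {10, 11} };
        for (int index : chooseSeeds(data, 2, 42L)) {
            System.out.println("seed index: " + index);
        }
    }
}
```

Points far from every center chosen so far are more likely to be picked next, which is why the option description calls the method a probabilistic farthest-first strategy. Fixing the `seed` argument makes the selection reproducible, in the same spirit as the seed option above.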
Capabilities
The table below describes the capabilities of SimpleKMeans.
Capability | Supported |
---|---|
Class | No class |
Attributes | Nominal attributes, Numeric attributes, Missing values, Binary attributes, Empty nominal attributes, Unary attributes |
Min # of instances | 1 |