Cluster Algorithms

From CheS-Mapper Wiki

Overview

Algorithm Cluster approach Num Clusters Variable Various Distance Functions Deterministic Random restarts Independent of R
SimpleKMeans (WEKA) Centroid Yes Yes
k-Means - Cascade (WEKA) Centroid Yes Yes Yes Yes
FarthestFirst (WEKA) Centroid Yes
Expectation Maximization (WEKA) Distribution Yes* Yes
Cobweb (WEKA) Connectivity Yes Yes
Hierarchical (WEKA) Connectivity Yes Yes Yes
k-Means (R) Centroid Yes
k-Means - Cascade (R) Centroid Yes Yes
Hierarchical (R) Connectivity Yes
Hierarchical - Dynamic Tree Cut (R) Connectivity Yes Yes


Num Clusters Variable
Yes: the algorithm determines the number of clusters itself. No entry: the user has to set a fixed number of clusters. * The EM algorithm can detect the number of clusters by an internal cross-validation, which is quite time-consuming (see Algorithm Runtimes).
Various Distance Functions
Yes: the user can select from different distance functions (Euclidean, Manhattan, Chebyshev). No entry: only Euclidean distance is available.
Deterministic
Yes: the algorithm contains no random element, so every run produces the same result. No entry: the algorithm is randomized, and the user can specify a random seed value to initialize it.
Random restarts
Only available for non-deterministic algorithms. Yes: the algorithm performs several random restarts and automatically selects the best result.
Independent of R
Yes: the algorithm runs without R. No entry: the R statistical software has to be installed on your machine.
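
The properties described above can be illustrated with a minimal Python sketch. It is not the WEKA or R implementation used by CheS-Mapper. The function names (kmeans, kmeans_with_restarts), the iteration count, and the use of the within-cluster sum of squared distances as the quality criterion are assumptions made for illustration only. The sketch shows the three distance functions, how a seed makes a randomized run reproducible, and how random restarts keep the best of several runs.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def kmeans(points, k, seed, dist=euclidean, iterations=20):
    """One k-means run (hypothetical helper); the seed fixes the initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random element: initial centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Simplification: centroids are always updated with the mean,
        # whatever distance function is used; empty clusters keep their centroid.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    # Assumed quality criterion: sum of squared distances to the assigned centroid.
    cost = sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
    return cost, labels

def kmeans_with_restarts(points, k, restarts, seed, dist=euclidean):
    """Run k-means several times with different seeds and keep the cheapest run."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng.randrange(10**9), dist) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_cost, best_labels = kmeans_with_restarts(points, k=2, restarts=5, seed=42)
```

Calling kmeans twice with the same seed returns the same result, which is what the random seed value in the "Deterministic" column refers to. Passing dist=manhattan or dist=chebyshev changes only how points are assigned to centroids in this sketch.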