# Cluster Algorithms

From CheS-Mapper Wiki

## Overview

Algorithm | Cluster approach | Num Clusters Variable | Various Distance Functions | Deterministic | Random restarts | Independent of R |
---|---|---|---|---|---|---|
SimpleKMeans (WEKA) | Centroid | | Yes | | | Yes |
k-Means - Cascade (WEKA) | Centroid | Yes | Yes | | Yes | Yes |
FarthestFirst (WEKA) | Centroid | | | | | Yes |
Expectation Maximization (WEKA) | Distribution | Yes* | | | | Yes |
Cobweb (WEKA) | Connectivity | Yes | | | | Yes |
Hierarchical (WEKA) | Connectivity | | Yes | Yes | | Yes |
k-Means (R) | Centroid | | | | Yes | |
k-Means - Cascade (R) | Centroid | Yes | | | Yes | |
Hierarchical (R) | Connectivity | | | Yes | | |
Hierarchical - Dynamic Tree Cut (R) | Connectivity | Yes | | Yes | | |

- Num Clusters Variable
  - No entry indicates that the number of clusters has to be set to a fixed number.
  - \*The EM algorithm can perform an internal cross-validation to detect the number of clusters, which is quite time-consuming; see Algorithm Runtimes.
- Various Distance Functions
  - The user can select from different distance functions (Euclidean, Manhattan, Chebyshev). No entry: only the Euclidean distance is available.
- Deterministic
  - The algorithm contains no random element, so every run of the algorithm produces the same result. No entry: the user can specify a random seed value to initialize the algorithm.
- Random restarts
  - Only available for non-deterministic algorithms: the algorithm performs several random restarts and automatically selects the best result.
- Independent of R
  - The algorithm runs without the R statistical software. No entry: R has to be installed on your machine to use the algorithm.