
CalinskiHarabasz

Calinski-Harabasz index (CH).

The Calinski-Harabasz (CH) index evaluates two criteria at once, separation and compactness, using the average between-cluster and within-cluster sums of squares.

  • The numerator reflects the degree of separation, that is, how widely the cluster centers are spread.

  • The denominator corresponds to compactness, that is, how tightly the objects in each cluster are gathered around their cluster center.
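
For reference, the usual batch form of the index can be written as below. The notation is introduced here for illustration and does not come from this page: $n$ is the number of points, $k$ the number of clusters, $C_j$ the $j$-th cluster with center $c_j$ and size $n_j$, and $c$ the overall center of the data. The streaming metric may accumulate these sums incrementally, so treat this as the textbook definition rather than a description of the implementation.

```latex
\mathrm{CH} =
\frac{\dfrac{1}{k-1} \sum_{j=1}^{k} n_j \,\lVert c_j - c \rVert^2}
     {\dfrac{1}{n-k} \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - c_j \rVert^2}
```

A larger numerator (well-separated centers) or a smaller denominator (tight clusters) both increase the index.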

Attributes

  • bigger_is_better

    Indicates whether a high value is better than a low one. For the CH index, higher values indicate better-separated, more compact clusters.

Examples

>>> from river import cluster
>>> from river import stream
>>> from river import metrics

>>> X = [
...     [1, 2],
...     [1, 4],
...     [1, 0],
...     [4, 2],
...     [4, 4],
...     [4, 0],
...     [-2, 2],
...     [-2, 4],
...     [-2, 0]
... ]

>>> k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
>>> metric = metrics.cluster.CalinskiHarabasz()

>>> for x, _ in stream.iter_array(X):
...     k_means = k_means.learn_one(x)
...     y_pred = k_means.predict_one(x)
...     metric = metric.update(x, y_pred, k_means.centers)

>>> metric
CalinskiHarabasz: 6.922666

Methods

get

Return the current value of the metric.

revert

Revert the metric, undoing the effect of a previous update made with the same arguments.

Parameters

  • x
  • y_pred
  • centers
  • sample_weight – defaults to 1.0

update

Update the metric.

Parameters

  • x
  • y_pred
  • centers
  • sample_weight – defaults to 1.0
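
To illustrate how `update` and `revert` with these parameters can mirror each other, here is a toy running statistic that tracks only a within-cluster sum of squares. It is a sketch under stated assumptions, not River's implementation: the class name `RunningWithinSS` is hypothetical, `x` is assumed to be a dict of features, and `centers` is assumed to map each cluster label to a dict with the same keys.

```python
class RunningWithinSS:
    """Toy running within-cluster sum of squares (hypothetical, not River's code)."""

    def __init__(self):
        self.total = 0.0

    @staticmethod
    def _sq_dist(x, center):
        # Squared Euclidean distance between a point and a center, both dicts.
        return sum((x[key] - center[key]) ** 2 for key in x)

    def update(self, x, y_pred, centers, sample_weight=1.0):
        # Add the weighted squared distance of x to its assigned center.
        self.total += sample_weight * self._sq_dist(x, centers[y_pred])
        return self

    def revert(self, x, y_pred, centers, sample_weight=1.0):
        # Subtract exactly what the matching update added.
        self.total -= sample_weight * self._sq_dist(x, centers[y_pred])
        return self

    def get(self):
        return self.total


centers = {0: {0: 1.0, 1: 2.0}}  # one cluster centered at (1, 2)
stat = RunningWithinSS()
stat.update({0: 1, 1: 4}, 0, centers)
stat.update({0: 1, 1: 0}, 0, centers)
stat.revert({0: 1, 1: 4}, 0, centers)
print(stat.get())  # 4.0
```

Calling `revert` with the arguments of an earlier `update` removes that observation's contribution, which is the pattern a windowed evaluation would rely on.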

works_with

Indicates whether or not a metric can work with a given model.

Parameters

  • model (river.base.estimator.Estimator)

References


  1. Calinski, T., Harabasz, J.-A. (1974). A Dendrite Method for Cluster Analysis. Communications in Statistics, 3(1), 1-27. DOI: 10.1080/03610927408827101