Hartigan

Hartigan Index (H - Index)

Hartigan Index (H - Index) ¹ is a sum-of-squares based index ², equal to the negative logarithm of the ratio of SSW (Sum-of-Squares Within Clusters) to SSB (Sum-of-Squares Between Clusters): H = -log(SSW / SSB).

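A higher Hartigan index indicates better clustering quality: compact clusters give a small SSW and well-separated clusters give a large SSB, and both reduce the ratio SSW / SSB.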
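To make the definition concrete, here is a minimal batch sketch in plain Python. It is an illustration under stated assumptions, not the library's implementation: the helper names `squared_distance` and `hartigan_index` are hypothetical, and SSW and SSB are taken as plain (unnormalized) sums of squared Euclidean distances. An implementation may normalize the two sums differently, which would shift the resulting value.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def hartigan_index(points, labels, centers):
    """Hypothetical helper: -log(SSW / SSB) with unnormalized sums (an assumption)."""
    n = len(points)
    dim = len(points[0])
    overall_mean = [sum(p[d] for p in points) / n for d in range(dim)]

    # SSW: squared distance from each point to its assigned cluster center.
    ssw = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))

    # SSB: squared distance from each center to the overall mean,
    # weighted by the number of points assigned to that cluster.
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    ssb = sum(
        size * squared_distance(centers[label], overall_mean)
        for label, size in sizes.items()
    )

    # Raises if SSW or SSB is zero; a real implementation would need to guard this.
    return -math.log(ssw / ssb)


points = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
centers = {0: [0, 1], 1: [10, 1]}

# SSW = 4 and SSB = 100, so h = -log(0.04), roughly 3.219.
h = hartigan_index(points, labels, centers)
```

Moving the two centers closer together while keeping the within-cluster spread fixed lowers SSB and therefore lowers the index, which matches the "higher is better" reading.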

Attributes

bigger_is_better

Indicates whether a higher value is better than a lower one. It is True for this metric, since a higher Hartigan index means better clustering.

Examples

>>> from river import cluster
>>> from river import stream
>>> from river import metrics

>>> X = [
...     [1, 2],
...     [1, 4],
...     [1, 0],
...     [4, 2],
...     [4, 4],
...     [4, 0],
...     [-2, 2],
...     [-2, 4],
...     [-2, 0]
... ]

>>> k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
>>> metric = metrics.cluster.Hartigan()

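>>> for x, _ in stream.iter_array(X):
...     # Train the clusterer on the new observation.
...     k_means = k_means.learn_one(x)
...     # Assign the observation to its closest cluster.
...     y_pred = k_means.predict_one(x)
...     # Feed the point, its cluster, and the current centers to the metric.
...     metric = metric.update(x, y_pred, k_means.centers)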

>>> metric
Hartigan: 0.836189

Methods

get

Return the current value of the metric.

revert

Revert the metric, undoing the effect of an earlier update call made with the same arguments.

Parameters

x
y_pred
centers
sample_weight – defaults to 1.0

update

Update the metric with one observation, its predicted cluster, and the current cluster centers.

Parameters

x
y_pred
centers
sample_weight – defaults to 1.0
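The relationship between update, revert, and get can be illustrated with a toy streaming class. This is a hedged sketch and not the library's code: the class name `ToyHartigan` and its internals are hypothetical, observations and centers are assumed to be dicts keyed by feature name, SSW and SSB are assumed to be unnormalized sums, and SSB is computed from whichever centers were passed most recently.

```python
import math
from collections import defaultdict


class ToyHartigan:
    """Hypothetical streaming sketch of -log(SSW / SSB); not the library class."""

    def __init__(self):
        self.ssw = 0.0                            # running within-cluster sum of squares
        self.total_weight = 0.0                   # total weight of observations seen
        self.feature_sums = defaultdict(float)    # weighted sum per feature
        self.cluster_weights = defaultdict(float) # weight assigned to each cluster
        self.centers = {}                         # most recent centers passed in

    def _apply(self, x, y_pred, centers, weight):
        center = centers[y_pred]
        self.ssw += weight * sum((x[i] - center[i]) ** 2 for i in x)
        self.total_weight += weight
        for i, value in x.items():
            self.feature_sums[i] += weight * value
        self.cluster_weights[y_pred] += weight
        self.centers = centers

    def update(self, x, y_pred, centers, sample_weight=1.0):
        self._apply(x, y_pred, centers, sample_weight)
        return self

    def revert(self, x, y_pred, centers, sample_weight=1.0):
        # Reverting is modeled as applying the same observation with negative weight.
        self._apply(x, y_pred, centers, -sample_weight)
        return self

    def get(self):
        mean = {i: s / self.total_weight for i, s in self.feature_sums.items()}
        ssb = sum(
            w * sum((self.centers[k][i] - mean[i]) ** 2 for i in mean)
            for k, w in self.cluster_weights.items()
        )
        return -math.log(self.ssw / ssb)


centers = {0: {"a": 0, "b": 1}, 1: {"a": 10, "b": 1}}
data = [
    ({"a": 0, "b": 0}, 0),
    ({"a": 0, "b": 2}, 0),
    ({"a": 10, "b": 0}, 1),
    ({"a": 10, "b": 2}, 1),
]

toy = ToyHartigan()
for x, y_pred in data:
    toy.update(x, y_pred, centers)
before = toy.get()

# Add one extra observation, then revert it: the value returns to `before`
# up to floating-point rounding.
extra = {"a": 3, "b": 3}
toy.update(extra, 0, centers)
toy.revert(extra, 0, centers)
after = toy.get()
```

In this sketch a revert only cancels an update exactly when it receives the same x, y_pred, centers, and sample_weight that the update received.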

works_with

Indicates whether or not a metric can work with a given model.

Parameters

model (river.base.estimator.Estimator)

References

1. Hartigan JA (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA. ISBN 047135645X.
2. Q. Zhao, M. Xu, and P. Franti, "Sum-of-squares based cluster validity index and significance analysis," in Adaptive and Natural Computing Algorithms, M. Kolehmainen, P. Toivanen, and B. Beliczynski, Eds. Berlin, Germany: Springer, 2009, pp. 313–322.