WB¶

WB Index

The WB index is a simple sum-of-squares measure: the within-cluster sum of squares divided by the between-cluster sum of squares, with the ratio multiplied by the number of clusters to emphasize its effect. Its advantages are that the number of clusters can be chosen by minimizing the WB value, without relying on knee-point detection, and that the metric is straightforward to implement.
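As a reading aid, the description above can be written out as a formula. The notation is an assumption chosen for illustration: $x_i$ are the $n$ points, $c_j$ the $k$ cluster centers, $p_i$ the cluster assigned to $x_i$, $n_j$ the number of points in cluster $j$, and $\bar{x}$ the mean of all points.

```latex
\mathrm{SSW} = \sum_{i=1}^{n} \lVert x_i - c_{p_i} \rVert^2,
\qquad
\mathrm{SSB} = \sum_{j=1}^{k} n_j \, \lVert c_j - \bar{x} \rVert^2,
\qquad
\mathrm{WB} = k \cdot \frac{\mathrm{SSW}}{\mathrm{SSB}}
```

The ratio is undefined when $\mathrm{SSB} = 0$, for instance with a single cluster whose center coincides with the overall mean.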

The lower the WB index, the better the clustering quality.
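To make the definition concrete, here is a minimal batch sketch in plain Python. It is not River's incremental implementation: the functions `squared_distance`, `centroids`, and `wb_index` are hypothetical helpers written for this illustration, and the centers are assumed to be the exact means of each cluster. It compares a three-cluster and a two-cluster labelling of the same nine points used in the Examples section.

```python
from collections import Counter, defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroids(points, labels):
    """Mean of the points assigned to each label (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    return {
        label: [sum(coord) / len(members) for coord in zip(*members)]
        for label, members in groups.items()
    }


def wb_index(points, labels):
    """Batch WB index: k * SSW / SSB (assumes SSB is non-zero)."""
    centers = centroids(points, labels)
    overall_mean = [sum(coord) / len(points) for coord in zip(*points)]
    counts = Counter(labels)
    # Within-cluster sum of squares: each point against its own center.
    ssw = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
    # Between-cluster sum of squares: each center against the overall mean,
    # weighted by the number of points in the cluster.
    ssb = sum(
        counts[l] * squared_distance(c, overall_mean) for l, c in centers.items()
    )
    return len(centers) * ssw / ssb


X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0], [-2, 2], [-2, 4], [-2, 0]]

# One cluster per distinct first coordinate (1, 4, -2).
three_clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Merge the first two groups into a single cluster.
two_clusters = [0, 0, 0, 0, 0, 0, 1, 1, 1]

wb_three = wb_index(X, three_clusters)
wb_two = wb_index(X, two_clusters)
print(wb_three, wb_two)
```

Under these assumptions the three-cluster labelling gives SSW = 24 and SSB = 54, so WB = 3 · 24 / 54 ≈ 1.333, which is lower than the value for the two-cluster labelling. This is the behaviour one would expect when choosing the number of clusters by minimizing WB. A streaming metric that works from incrementally updated centers need not reproduce this batch value exactly.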

Attributes¶

bigger_is_better

Indicates if a high value is better than a low one or not.

Examples¶

>>> from river import cluster
>>> from river import stream
>>> from river import metrics

>>> X = [
...     [1, 2],
...     [1, 4],
...     [1, 0],
...     [4, 2],
...     [4, 4],
...     [4, 0],
...     [-2, 2],
...     [-2, 4],
...     [-2, 0]
... ]

>>> k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
>>> metric = metrics.cluster.WB()

>>> for x, _ in stream.iter_array(X):
...     k_means = k_means.learn_one(x)
...     y_pred = k_means.predict_one(x)
...     metric = metric.update(x, y_pred, k_means.centers)

>>> metric
WB: 1.300077

Methods¶

get

Return the current value of the metric.

revert

Revert the metric.

Parameters

x
y_pred
centers
sample_weight – defaults to 1.0

update

Update the metric.

Parameters

x
y_pred
centers
sample_weight – defaults to 1.0

works_with

Indicates whether or not a metric can work with a given model.

Parameters

model (river.base.estimator.Estimator)

References¶

Q. Zhao, M. Xu, and P. Franti, "Sum-of-squares based cluster validity index and significance analysis," in Adaptive and Natural Computing Algorithms, M. Kolehmainen, P. Toivanen, and B. Beliczynski, Eds. Berlin, Germany: Springer, 2009, pp. 313–322.