SD
The SD validity index (SD) [1] is a more recent clustering validation measure. It is composed of two terms:
- Scat(NC) stands for the scattering within clusters,
- Dis(NC) stands for the dispersion between clusters.
Like DB and SB, SD measures compactness through the variance of the clustered objects and separation through the distance between cluster centers, but combines them in a different way. The smaller the value of SD, the better.
The original formula for the SD validity index takes into account the ratio between the maximum and the actual number of clusters. However, because metrics are updated incrementally, this ratio is fixed at its default value of 1.
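To make the two terms concrete, here is a minimal batch sketch in plain Python. It assumes the commonly cited form of the index from Halkidi et al., SD = alpha * Scat(NC) + Dis(NC), where Scat(NC) averages the norm of each cluster's variance vector relative to the norm of the whole dataset's variance vector, and Dis(NC) is the ratio of the largest to the smallest center distance multiplied by the sum, over centers, of the inverse of each center's total distance to the others. The name `sd_index`, its helpers, and the `alpha` parameter (fixed at 1 here, on the assumption that it plays the role of the ratio described above) are illustrative and are not part of river; this is not the incremental implementation behind `metrics.cluster.SD`.

```python
import math
from itertools import combinations


def _mean(points):
    """Per-dimension mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _variance_vector(points, center):
    """Per-dimension variance of the points around the given center."""
    n = len(points)
    return [sum((p[d] - center[d]) ** 2 for p in points) / n for d in range(len(center))]


def _norm(v):
    return math.sqrt(sum(c * c for c in v))


def _distance(a, b):
    return _norm([ai - bi for ai, bi in zip(a, b)])


def sd_index(X, labels, alpha=1.0):
    """Batch SD sketch; alpha=1.0 is an assumption, not river's code."""
    clusters = {}
    for x, label in zip(X, labels):
        clusters.setdefault(label, []).append(x)
    if len(clusters) < 2:
        raise ValueError("SD needs at least two clusters")

    centers = {label: _mean(points) for label, points in clusters.items()}

    # Scat(NC): average within-cluster scattering relative to the whole dataset.
    dataset_norm = _norm(_variance_vector(X, _mean(X)))
    scat = sum(
        _norm(_variance_vector(points, centers[label]))
        for label, points in clusters.items()
    ) / (len(clusters) * dataset_norm)

    # Dis(NC): dispersion between cluster centers.
    pairwise = [_distance(centers[a], centers[b]) for a, b in combinations(centers, 2)]
    inverse_sums = sum(
        1.0 / sum(_distance(centers[k], centers[z]) for z in centers if z != k)
        for k in centers
    )
    dis = max(pairwise) / min(pairwise) * inverse_sums

    return alpha * scat + dis


# Nine points grouped by their first coordinate into three clusters.
X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0], [-2, 2], [-2, 4], [-2, 0]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
score = sd_index(X, labels)
```

Because this sketch recomputes everything from the full dataset and uses fixed labels, its value should not be expected to match the output of an incrementally updated metric fed by an online clusterer.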
Attributes
- bigger_is_better
  Indicates whether a high value is better than a low one. For SD, lower values are better.
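A flag like this is typically used to compare scores without hard-coding the metric's direction. The helper below is a small hypothetical sketch (the name `best_candidate` and the score values are made up for illustration and are not part of river):

```python
def best_candidate(scores, bigger_is_better):
    """Return the key whose score is best given the metric's direction."""
    pick = max if bigger_is_better else min
    return pick(scores, key=scores.get)


# Made-up SD values for three hypothetical cluster counts.
candidates = {"k=2": 3.1, "k=3": 2.3, "k=4": 2.9}

# SD is a lower-is-better metric, so the flag is passed as False here.
best = best_candidate(candidates, bigger_is_better=False)
```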
Examples
>>> from river import cluster
>>> from river import stream
>>> from river import metrics
>>> X = [
...     [1, 2],
...     [1, 4],
...     [1, 0],
...     [4, 2],
...     [4, 4],
...     [4, 0],
...     [-2, 2],
...     [-2, 4],
...     [-2, 0]
... ]
>>> k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
>>> metric = metrics.cluster.SD()
>>> for x, _ in stream.iter_array(X):
...     k_means = k_means.learn_one(x)
...     y_pred = k_means.predict_one(x)
...     metric = metric.update(x, y_pred, k_means.centers)
>>> metric
SD: 2.339016
Methods
get
Return the current value of the metric.
revert
Revert the metric.
Parameters
- x
- y_pred
- centers
- sample_weight – defaults to 1.0
update
Update the metric.
Parameters
- x
- y_pred
- centers
- sample_weight – defaults to 1.0
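The update and revert methods take the same arguments, which suggests that revert is meant to undo an earlier update made with those arguments. The toy class below sketches that contract in plain Python. `ToyMeanDistance` is hypothetical and far simpler than SD; it only mirrors the `(x, y_pred, centers, sample_weight)` signature, and it assumes that points and centers are dictionaries keyed by feature name.

```python
import math


class ToyMeanDistance:
    """Hypothetical metric: weighted mean distance of points to their assigned center."""

    def __init__(self):
        self._total = 0.0
        self._weight = 0.0

    @staticmethod
    def _distance(x, center):
        return math.sqrt(sum((x[k] - center[k]) ** 2 for k in x))

    def update(self, x, y_pred, centers, sample_weight=1.0):
        # Add this observation's contribution to the running sums.
        self._total += sample_weight * self._distance(x, centers[y_pred])
        self._weight += sample_weight
        return self

    def revert(self, x, y_pred, centers, sample_weight=1.0):
        # Subtract exactly what the matching update added.
        self._total -= sample_weight * self._distance(x, centers[y_pred])
        self._weight -= sample_weight
        return self

    def get(self):
        return self._total / self._weight if self._weight else 0.0


centers = {0: {"a": 0.0, "b": 0.0}, 1: {"a": 10.0, "b": 0.0}}
metric = ToyMeanDistance()

metric.update({"a": 3.0, "b": 4.0}, 0, centers)
before = metric.get()

# Add a second observation, then remove it again with the same arguments.
metric.update({"a": 10.0, "b": 2.0}, 1, centers)
after_update = metric.get()
metric.revert({"a": 10.0, "b": 2.0}, 1, centers)
after_revert = metric.get()
```

In this toy, a revert only restores the previous state when it receives the same point, label, centers, and weight as the update it undoes; whether a given library metric has the same requirement should be checked against its own implementation.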
works_with
Indicates whether or not a metric can work with a given model.
Parameters
- model (river.base.estimator.Estimator)
References
[1] Halkidi, M., Vazirgiannis, M., & Batistakis, Y. (2000). Quality Scheme Assessment in the Clustering Process. Principles Of Data Mining And Knowledge Discovery, 265-276. DOI: 10.1007/3-540-45372-5_26