BIC
Bayesian Information Criterion (BIC).
In statistics, the Bayesian Information Criterion (BIC) [1], or Schwarz Information Criterion (SIC), is a criterion for model selection among a finite set of models. With the sign convention used here, the model with the highest BIC is preferred (many references define BIC with the opposite sign, in which case the lowest value wins). It is based, in part, on the likelihood function and is closely related to the Akaike Information Criterion (AIC).
Let
- \(k\) be the number of clusters,
- \(n_i\) be the number of points in cluster \(i\), with \(n_1 + n_2 + \dots + n_k = n\),
- \(d\) be the dimension of the clustering problem.
Then, the variance of the clustering solution will be calculated as
The maximum likelihood function, used in the BIC version of River, would be
and the BIC will then be calculated as
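The three quantities named above can be written out as follows. This is a sketch that assumes a spherical Gaussian model with a single pooled variance, in the style of the BIC used for k-means and X-means; the symbols \(x_j\) (a data point) and \(c_{(j)}\) (the center of the cluster that \(x_j\) is assigned to) are notation introduced here, and the exact expressions in River's source may differ.

\[
\hat{\sigma}^2 = \frac{1}{(n - k)\, d} \sum_{j=1}^{n} \lVert x_j - c_{(j)} \rVert^2
\]

\[
\hat{l}(D) = \sum_{i=1}^{k} \left[ n_i \log n_i - n_i \log n - \frac{n_i d}{2} \log\left(2 \pi \hat{\sigma}^2\right) - \frac{(n_i - 1)\, d}{2} \right]
\]

\[
\mathrm{BIC} = \hat{l}(D) - \frac{1}{2}\, k\, (d + 1) \log n
\]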
Using the previously mentioned maximum likelihood function, the higher the BIC value, the better the clustering solution. Moreover, the BIC calculated this way will always be less than 0 [2].
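To make the "higher is better" reading concrete, here is a minimal batch sketch in plain Python. It is not River's implementation: the function name `batch_bic` is hypothetical, and the formula it uses (pooled spherical variance, per-cluster log-likelihood, penalty of \(\frac{1}{2} k (d + 1) \log n\)) is an assumed formulation consistent with the definitions of \(k\), \(n_i\), \(n\) and \(d\) above.

```python
import math


def batch_bic(points, labels, centers):
    """BIC of a hard clustering under an assumed spherical-Gaussian model.

    Hypothetical helper for illustration only; requires n > k and at least
    one point that does not coincide with its center (variance > 0).
    """
    n = len(points)       # total number of points
    d = len(points[0])    # dimension of the problem
    k = len(centers)      # number of clusters

    # Within-cluster sum of squared distances to the assigned centers.
    ssq = sum(
        sum((p - c) ** 2 for p, c in zip(x, centers[y]))
        for x, y in zip(points, labels)
    )
    variance = ssq / ((n - k) * d)

    # Cluster sizes n_i.
    sizes = {}
    for y in labels:
        sizes[y] = sizes.get(y, 0) + 1

    # Assumed log-likelihood, summed over clusters.
    log_lik = sum(
        n_i * math.log(n_i)
        - n_i * math.log(n)
        - n_i * d / 2 * math.log(2 * math.pi * variance)
        - (n_i - 1) * d / 2
        for n_i in sizes.values()
    )
    # Complexity penalty grows with the number of clusters and dimensions.
    return log_lik - 0.5 * k * (d + 1) * math.log(n)


X = [[1, 2], [1, 4], [1, 0], [4, 2], [4, 4], [4, 0], [-2, 2], [-2, 4], [-2, 0]]

# Grouping by the first coordinate gives tight clusters.
good = batch_bic(X, [0, 0, 0, 1, 1, 1, 2, 2, 2], [[1, 2], [4, 2], [-2, 2]])
# Grouping by the second coordinate gives much wider clusters.
bad = batch_bic(X, [0, 1, 2, 0, 1, 2, 0, 1, 2], [[1, 2], [1, 4], [1, 0]])
print(good > bad)
```

Both partitions use the same number of clusters, so the penalty term is identical and the difference comes entirely from the variance: the tighter partition scores higher.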
Attributes
- bigger_is_better
Indicates if a high value is better than a low one or not.
Examples
>>> from river import cluster
>>> from river import stream
>>> from river import metrics
>>> X = [
...     [1, 2],
...     [1, 4],
...     [1, 0],
...     [4, 2],
...     [4, 4],
...     [4, 0],
...     [-2, 2],
...     [-2, 4],
...     [-2, 0]
... ]
>>> k_means = cluster.KMeans(n_clusters=3, halflife=0.4, sigma=3, seed=0)
>>> metric = metrics.cluster.BIC()
>>> for x, _ in stream.iter_array(X):
...     k_means = k_means.learn_one(x)
...     y_pred = k_means.predict_one(x)
...     metric = metric.update(x, y_pred, k_means.centers)
>>> metric
BIC: -30.060416
Methods
get
Return the current value of the metric.
revert
Revert the metric, undoing the effect of an earlier update made with the same arguments.
Parameters
- x
- y_pred
- centers
- sample_weight – defaults to 1.0
update
Update the metric.
Parameters
- x
- y_pred
- centers
- sample_weight – defaults to 1.0
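The shared signature of update and revert suggests that the metric keeps running totals that can be incremented and decremented. The sketch below illustrates that idea only; `RunningClusterStats` is a hypothetical class, not River's, and it assumes points and centers are dictionaries keyed by feature, with centers indexed by cluster label.

```python
from collections import defaultdict


class RunningClusterStats:
    """Hypothetical stand-in showing how update and revert can mirror each other."""

    def __init__(self):
        self.counts = defaultdict(float)  # weighted number of points per cluster
        self.ssq = 0.0                    # weighted squared distance to assigned centers

    @staticmethod
    def _sq_dist(x, center):
        # Squared Euclidean distance between a point and a center.
        return sum((x[j] - center[j]) ** 2 for j in x)

    def update(self, x, y_pred, centers, sample_weight=1.0):
        self.counts[y_pred] += sample_weight
        self.ssq += sample_weight * self._sq_dist(x, centers[y_pred])
        return self

    def revert(self, x, y_pred, centers, sample_weight=1.0):
        # Subtract exactly what the matching update added.
        self.counts[y_pred] -= sample_weight
        self.ssq -= sample_weight * self._sq_dist(x, centers[y_pred])
        return self


centers = {0: {0: 1, 1: 3}, 1: {0: 4, 1: 2}}
stats = RunningClusterStats()
stats.update({0: 1, 1: 2}, 0, centers)
stats.update({0: 4, 1: 0}, 1, centers)
stats.revert({0: 4, 1: 0}, 1, centers)
print(stats.counts[0], stats.counts[1], stats.ssq)
```

In this sketch a revert only cancels an update exactly when it receives the same centers; in an online setting the centers move as the model learns, so the caller would have to pass the centers that were in effect at update time.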
works_with
Indicates whether or not a metric can work with a given model.
Parameters
- model (river.base.estimator.Estimator)