ODAC¶
The Online Divisive-Agglomerative Clustering (ODAC)1 aims to continuously maintain a hierarchical cluster structure over evolving time series data streams.
ODAC continuously monitors the evolution of the clusters' diameters and splits or merges them as more data is gathered or in reaction to concept drift. Such changes are supported by a confidence level derived from the Hoeffding bound. ODAC tracks the linear correlation between series to evaluate whether the time series hierarchy has changed.
The distance between time series a and b is given by rnomc(a, b) = sqrt((1 - corr(a, b)) / 2), where corr(a, b) is the Pearson correlation coefficient.
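As a quick illustration, the rnomc distance can be computed directly from the Pearson correlation of two series. This is a standalone sketch, not part of river's public API (it uses statistics.correlation, available in Python 3.10+):

import math
import statistics

def rnomc(a, b):
    # Rooted normalized one-minus-correlation between two series.
    corr = statistics.correlation(a, b)  # Pearson correlation coefficient
    return math.sqrt((1 - corr) / 2)

print(rnomc([1, 2, 3, 4], [2, 4, 6, 8]))  # 0.0 -> perfectly correlated series
print(rnomc([1, 2, 3, 4], [8, 6, 4, 2]))  # 1.0 -> perfectly anti-correlated series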
In the following topics, ε stands for the Hoeffding bound, and we consider a cluster cj with descendants ck and cs.
The Merge Operator
The Splitting Criteria guarantees that the clusters' diameters monotonically decrease.
- If diameter(ck) - diameter(cj) > ε OR diameter(cs) - diameter(cj) > ε:
    - There is a change in the correlation structure, so merge clusters ck and cs into cj.
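As an illustrative sketch (the helper names below are assumptions, not river internals), the merge test compares child diameters against the parent's diameter plus the Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)), with R = 1 for rnomc distances and δ = 1 - confidence_level:

import math

def hoeffding_bound(n, confidence_level=0.9, value_range=1.0):
    # ε shrinks as the number of observations n grows.
    delta = 1 - confidence_level
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_merge(diam_cj, diam_ck, diam_cs, eps):
    # A child diameter exceeding the parent's by more than ε signals a change
    # in the correlation structure, so ck and cs are merged back into cj.
    return (diam_ck - diam_cj > eps) or (diam_cs - diam_cj > eps)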
Splitting Criteria
Consider:
- d0: the minimum distance;
- d1: the farthest distance;
- d_avg: the average distance;
- d2: the second farthest distance.
Then:
- if d1 - d2 > εk or tau > εk then
    - if (d1 - d0)|(d1 - d_avg) - (d_avg - d0)| > εk then
        - Split the cluster
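Read as code, and interpreting the second condition as (d1 - d0) times the absolute difference |(d1 - d_avg) - (d_avg - d0)|, the split test could be sketched as follows (illustrative only, with tau playing the role of the tie-breaking threshold):

def should_split(d0, d_avg, d1, d2, eps_k, tau=0.1):
    # Either the two farthest distances differ significantly,
    # or ε has shrunk below the tie-breaking threshold tau.
    if (d1 - d2 > eps_k) or (tau > eps_k):
        # Heuristic from the splitting criteria above.
        return (d1 - d0) * abs((d1 - d_avg) - (d_avg - d0)) > eps_k
    return False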
Parameters¶
- confidence_level
  Type → float
  Default → 0.9
  The confidence level the user wants to work with.
- n_min
  Type → int
  Default → 100
  Minimum number of observations to gather before checking whether clusters must be split or merged.
- tau
  Type → float
  Default → 0.1
  Threshold below which a split will be forced to break ties.
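For example, a more conservative setup can raise the confidence level and gather more observations before each check (the values below are arbitrary):

from river import cluster

model = cluster.ODAC(confidence_level=0.95, n_min=200, tau=0.1)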
Attributes¶
- structure_changed (bool)
  This attribute is true when the structure has changed as the result of a split or a merge.
Examples¶
from river import cluster
from river.datasets import synth
model = cluster.ODAC()
dataset = synth.FriedmanDrift(drift_type='gra', position=(150, 200), seed=42)
for i, (x, _) in enumerate(dataset.take(500)):
    model.learn_one(x)
    if model.structure_changed:
        print(f"Structure changed at observation {i + 1}")
Structure changed at observation 1
Structure changed at observation 100
Structure changed at observation 200
Structure changed at observation 300
print(model.draw(n_decimal_places=2))
ROOT d1=0.79 d2=0.76 [NOT ACTIVE]
├── CH1_LVL_1 d1=0.74 d2=0.72 [NOT ACTIVE]
│   ├── CH1_LVL_2 d1=<Not calculated> [3]
│   └── CH2_LVL_2 d1=0.73 [2, 4]
└── CH2_LVL_1 d1=0.81 d2=0.78 [NOT ACTIVE]
    ├── CH1_LVL_2 d1=0.73 d2=0.67 [NOT ACTIVE]
    │   ├── CH1_LVL_3 d1=0.72 [0, 9]
    │   └── CH2_LVL_3 d1=<Not calculated> [1]
    └── CH2_LVL_2 d1=0.74 d2=0.73 [NOT ACTIVE]
        ├── CH1_LVL_3 d1=0.71 [5, 6]
        └── CH2_LVL_3 d1=0.71 [7, 8]
You can access some properties of the clustering model directly:
model.n_clusters
11
model.n_active_clusters
6
model.height
3
These properties are also available in a summarized form:
model.summary
{'n_clusters': 11, 'n_active_clusters': 6, 'height': 3}
Methods¶
draw
Method to draw the hierarchical cluster structure.
Parameters
- n_decimal_places → 'int' → defaults to 2
learn_one
Update the model with a set of features x.
Parameters
- x → 'dict'
predict_one
This algorithm does not predict anything. It builds a hierarchical cluster structure.
Parameters
- x → 'dict'