ODAC¶
The Online Divisive-Agglomerative Clustering (ODAC)1 aims at continuously maintaining a hierarchical cluster structure from evolving time series data streams.
The distance between time-series a and b is given by rnomc(a, b) = sqrt((1 - corr(a, b)) / 2), where corr(a, b) is the Pearson Correlation coefficient. If the cluster has only one time-series, the diameter is given by the time-series variance. The cluster's diameter is given by the largest distance between the cluster's time-series.
ODAC continuously monitors the evolution of diameters, only of the leaves, and splits or merges them by gathering more data or reacting to concept drift - a confidence level from the Hoeffding bound supports such changes.
So, the split operator, where the Hoeffding bound is applied, occurs when the difference between the largest distance (diameter) and the second largest difference is greater than a constant. Furthermore, the merge operator checks if one of the cluster's children has a diameter bigger than their parent - applying the Hoeffding bound again.
Parameters¶
-
confidence_level
Type →
floatDefault →
0.9The confidence level that user wants to work.
-
n_min
Type →
intDefault →
100Number of minimum observations to gather before checking whether or not clusters must be split or merged.
-
tau
Type →
floatDefault →
0.1Threshold below which a split will be forced to break ties.
Attributes¶
-
structure_changed (
bool)This variable is true when the structure changed, produced by splitting or aggregation.
Examples¶
from river import cluster
from river.datasets import synth
model = cluster.ODAC()
dataset = synth.FriedmanDrift(drift_type='gra', position=(150, 200), seed=42)
for i, (x, _) in enumerate(dataset.take(500)):
model.learn_one(x)
if model.structure_changed:
print(f"Structure changed at observation {i + 1}")
Structure changed at observation 1
Structure changed at observation 100
Structure changed at observation 200
Structure changed at observation 300
print(model.render_ascii())
ROOT d1=0.79 d2=0.76 [NOT ACTIVE]
├── CH1_LVL_1 d1=0.74 d2=0.72 [NOT ACTIVE]
│ ├── CH1_LVL_2 d1=0.08 [3]
│ └── CH2_LVL_2 d1=0.73 [2, 4]
└── CH2_LVL_1 d1=0.81 d2=0.78 [NOT ACTIVE]
├── CH1_LVL_2 d1=0.73 d2=0.67 [NOT ACTIVE]
│ ├── CH1_LVL_3 d1=0.72 [0, 9]
│ └── CH2_LVL_3 d1=0.08 [1]
└── CH2_LVL_2 d1=0.74 d2=0.73 [NOT ACTIVE]
├── CH1_LVL_3 d1=0.71 [5, 6]
└── CH2_LVL_3 d1=0.71 [7, 8]
You can access some properties of the clustering model directly:
model.n_clusters
11
model.n_active_clusters
6
model.height
3
These properties are also available in a summarized form:
model.summary
{'n_clusters': 11, 'n_active_clusters': 6, 'height': 3}
Methods¶
draw
Method to draw the hierarchical cluster's structure as a Graphviz graph.
Parameters
- max_depth —
int | None— defaults toNone - show_clusters_info —
list[typing.Hashable]— defaults to['timeseries_names', 'd1', 'd2', 'e'] - n_decimal_places —
int— defaults to2
learn_one
Update the model with a set of features x.
Parameters
- x —
dict
predict_one
This algorithm does not predict anything. It builds a hierarchical cluster's structure.
Parameters
- x —
dict
render_ascii
Method to render the hierarchical cluster's structure in text format.
Parameters
- n_decimal_places —
int— defaults to2
-
P. P. Rodrigues, J. Gama and J. Pedroso, "Hierarchical Clustering of Time-Series Data Streams" in IEEE Transactions on Knowledge and Data Engineering, vol. 20, no. 5, pp. 615-627, May 2008, doi: 10.1109/TKDE.2007.190727. ↩