ODAC

The Online Divisive-Agglomerative Clustering (ODAC) algorithm continuously maintains a hierarchical cluster structure over evolving time series data streams.

ODAC continuously monitors the evolution of cluster diameters and splits or merges clusters, either as more data is gathered or in reaction to concept drift. Each such change is supported by a confidence level derived from the Hoeffding bound. ODAC tracks the linear correlation between series to evaluate whether the time series hierarchy has changed.

The distance between time series a and b is given by rnomc(a, b) = sqrt((1 - corr(a, b)) / 2), where corr(a, b) is the Pearson correlation coefficient.
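
As an illustration, this distance can be computed from two raw windows as below. This is a minimal sketch: River's actual implementation maintains the correlation incrementally from sufficient statistics rather than storing the series.

```python
import math

def rnomc(a: list[float], b: list[float]) -> float:
    """Rooted normalized one-minus-correlation distance between two series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    corr = cov / math.sqrt(var_a * var_b)  # Pearson correlation coefficient
    return math.sqrt((1.0 - corr) / 2.0)

# Perfectly correlated series have distance 0; anti-correlated series, distance 1.
```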

In the following topics, ε stands for the Hoeffding bound, and we consider a cluster cj with descendants ck and cs.
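
The text does not spell the bound out; for reference, the Hoeffding bound for a variable with range R, observed n times, at confidence 1 - δ, is ε = sqrt(R² ln(1/δ) / (2n)). A sketch (the function name is illustrative, not River's internal API):

```python
import math

def hoeffding_bound(r: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with range r is within epsilon of the mean of n observations."""
    return math.sqrt(r * r * math.log(1.0 / delta) / (2.0 * n))
```

The bound shrinks as n grows, which is why ODAC becomes more willing to commit to a split or merge after gathering more observations.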

The Merge Operator

The splitting criteria guarantees that clusters' diameters monotonically decrease.

  • If diameter(ck) - diameter(cj) > ε OR diameter(cs) - diameter(cj) > ε:

    • There is a change in the correlation structure, so merge clusters ck and cs into cj.
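
Under those definitions, the merge test can be sketched as follows (names are illustrative, not River's internal API):

```python
def should_merge(diam_parent: float, diam_k: float, diam_s: float,
                 eps: float) -> bool:
    """Merge children ck and cs back into cj when either child's diameter
    exceeds the parent's by more than the Hoeffding bound, which signals
    a change in the correlation structure."""
    return (diam_k - diam_parent > eps) or (diam_s - diam_parent > eps)
```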

Splitting Criteria

Consider:

  • d0: the minimum distance;

  • d1: the farthest distance;

  • d_avg: the average distance;

  • d2: the second farthest distance.

Then:

  • if d1 - d2 > εk or τ > εk then

    • if (d1 - d0)|(d1 - d_avg) - (d_avg - d0)| > εk then

      • Split the cluster
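
Putting the two conditions together, the split test can be sketched as below (a hedged sketch with illustrative names; the |·| in the second condition is read as an absolute value):

```python
def should_split(d0: float, d1: float, d2: float, d_avg: float,
                 eps_k: float, tau: float) -> bool:
    """Decide whether to split a cluster, given its minimum (d0), farthest
    (d1), second-farthest (d2), and average (d_avg) pairwise distances."""
    # First gate: the top two distances are separated by more than eps_k,
    # or eps_k has shrunk below the tie-breaking threshold tau.
    if (d1 - d2 > eps_k) or (tau > eps_k):
        # Second gate: the farthest pair stands out sufficiently from the
        # minimum and average distances.
        return (d1 - d0) * abs((d1 - d_avg) - (d_avg - d0)) > eps_k
    return False
```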

Parameters

  • confidence_level

    Type → float

    Default → 0.9

    The confidence level used in the Hoeffding bound.

  • n_min

    Type → int

    Default → 100

    The minimum number of observations to gather before checking whether clusters must be split or merged.

  • tau

    Type → float

    Default → 0.1

    Threshold below which a split will be forced to break ties.

Attributes

  • structure_changed (bool)

    True when the structure has changed as a result of a split or a merge.

Examples

from river import cluster
from river.datasets import synth

model = cluster.ODAC()

dataset = synth.FriedmanDrift(drift_type='gra', position=(150, 200), seed=42)

for i, (x, _) in enumerate(dataset.take(500)):
    model.learn_one(x)
    if model.structure_changed:
        print(f"Structure changed at observation {i + 1}")
Structure changed at observation 1
Structure changed at observation 100
Structure changed at observation 200
Structure changed at observation 300

print(model.draw(n_decimal_places=2))
ROOT d1=0.79 d2=0.76 [NOT ACTIVE]
├── CH1_LVL_1 d1=0.74 d2=0.72 [NOT ACTIVE]
│   ├── CH1_LVL_2 d1=<Not calculated> [3]
│   └── CH2_LVL_2 d1=0.73 [2, 4]
└── CH2_LVL_1 d1=0.81 d2=0.78 [NOT ACTIVE]
    ├── CH1_LVL_2 d1=0.73 d2=0.67 [NOT ACTIVE]
    │   ├── CH1_LVL_3 d1=0.72 [0, 9]
    │   └── CH2_LVL_3 d1=<Not calculated> [1]
    └── CH2_LVL_2 d1=0.74 d2=0.73 [NOT ACTIVE]
        ├── CH1_LVL_3 d1=0.71 [5, 6]
        └── CH2_LVL_3 d1=0.71 [7, 8]

You can access some properties of the clustering model directly:

model.n_clusters
11

model.n_active_clusters
6

model.height
3

These properties are also available in a summarized form:

model.summary
{'n_clusters': 11, 'n_active_clusters': 6, 'height': 3}

Methods

draw

Draw the hierarchical cluster structure.

Parameters

  • n_decimal_places — 'int' — defaults to 2

learn_one

Update the model with a set of features x.

Parameters

  • x — 'dict'

predict_one

This algorithm does not predict anything. It builds the hierarchical cluster structure.

Parameters

  • x — 'dict'