
KSWIN

Kolmogorov-Smirnov Windowing method for concept drift detection.

Parameters

  • alpha (float) – defaults to 0.005

    Probability for the test statistic of the Kolmogorov-Smirnov test. The alpha parameter is very sensitive and should therefore be set below 0.01.

  • window_size (int) – defaults to 100

    Size of the sliding window.

  • stat_size (int) – defaults to 30

    Size of the statistic window.

  • seed (int) – defaults to None

    Random seed for reproducibility.

  • window (Iterable) – defaults to None

    Already-collected data used to warm-start the detector and avoid a cold start (see the example below).
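
For example, the detector can be primed with previously observed values. This is a hedged sketch; the exact handling of a pre-filled window may vary across versions:

>>> from river import drift
>>> history = [0.0, 1.0] * 50  # previously collected observations
>>> kswin = drift.KSWIN(window=history, seed=42)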

Attributes

  • change_detected

    Concept drift alarm. True if concept drift is detected.

  • warning_detected

    Warning zone alarm. Indicates if the drift detector is in the warning zone. Applicability depends on each drift detector implementation. True if the change detector is in the warning zone.

Examples

>>> import random
>>> from river import drift

>>> rng = random.Random(12345)
>>> kswin = drift.KSWIN(alpha=0.0001, seed=42)

>>> # Simulate a data stream composed of two data distributions
>>> data_stream = rng.choices([0, 1], k=1000) + rng.choices(range(4, 8), k=1000)

>>> # Update drift detector and verify if change is detected
>>> for i, val in enumerate(data_stream):
...     in_drift, _ = kswin.update(val)
...     if in_drift:
...         print(f"Change detected at index {i}, input value: {val}")
...         kswin.reset()  # Good practice
Change detected at index 1016, input value: 6

Methods

clone

Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. Cloning works by inspecting the parameters of the class signature. Each parameter is either recursively cloned if it is a River class, or deep-copied via copy.deepcopy if not. If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
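
As a small illustration (assuming, as is conventional for River estimators, that constructor parameters are readable back as attributes):

>>> from river import drift
>>> detector = drift.KSWIN(alpha=0.001, window_size=200, seed=1)
>>> fresh = detector.clone()
>>> fresh.alpha
0.001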

reset

Reset the change detector.

update

Update the change detector with a single data point.

Adds an element on top of the sliding window and removes the oldest one from the window. Afterwards, the KS-test is performed.

Parameters

  • value (numbers.Number)

Returns

typing.Tuple[bool, bool]: A tuple (drift, warning) where its elements indicate if a drift or a warning is detected.
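
As a minimal illustration of the returned tuple, using the same tuple-returning API as the example above (with a single observation the window is not yet full, so no test is run and neither flag is raised):

>>> from river import drift
>>> kswin = drift.KSWIN(seed=7)
>>> in_drift, in_warning = kswin.update(0.5)
>>> in_drift, in_warning
(False, False)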

Notes

KSWIN (Kolmogorov-Smirnov Windowing) is a concept change detection method based on the Kolmogorov-Smirnov (KS) statistical test. The KS-test makes no assumption about the underlying data distribution. KSWIN can monitor data or performance distributions. Note that the detector accepts only one-dimensional input.

KSWIN maintains a sliding window \(\Psi\) of fixed size \(n\) (window_size). The last \(r\) (stat_size) samples of \(\Psi\) are assumed to represent the last concept considered as \(R\). From the first \(n-r\) samples of \(\Psi\), \(r\) samples are uniformly drawn, representing an approximated last concept \(W\).

The KS-test is performed on the windows \(R\) and \(W\), which have the same size. The test compares the distance between the empirical cumulative distributions of \(R\) and \(W\), \(dist(R,W)\).

A concept drift is detected by KSWIN if:

\[ dist(R,W) > \sqrt{-\frac{\ln\alpha}{r}} \]

That is, drift is signalled when the difference in the empirical distributions of \(R\) and \(W\) is too large for \(R\) and \(W\) to have come from the same distribution.
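
The decision rule above can be sketched in a few lines of Python. The snippet below is an illustration of the windowing and test described in this section, not the library's actual implementation; the function name kswin_step is hypothetical, and it uses scipy.stats.ks_2samp for the two-sample KS statistic.

import math
import random

from scipy import stats

def kswin_step(window, value, window_size=100, stat_size=30, alpha=0.005, rng=random):
    # Slide the window: append the new value, drop the oldest one.
    window.append(value)
    if len(window) > window_size:
        window.pop(0)
    if len(window) < window_size:
        return False  # not enough samples to run the test yet
    r = window[-stat_size:]                         # R: the most recent samples
    w = rng.sample(window[:-stat_size], stat_size)  # W: uniform sample of the older part
    result = stats.ks_2samp(r, w)
    # Drift if dist(R, W) exceeds sqrt(-ln(alpha) / stat_size)
    return result.statistic > math.sqrt(-math.log(alpha) / stat_size)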

References


  1. Christoph Raab, Moritz Heusinger, Frank-Michael Schleif. Reactive Soft Prototype Computing for Concept Drift Streams. Neurocomputing, 2020.