# KSWIN

Kolmogorov-Smirnov Windowing method for concept drift detection.

## Parameters

• alpha (float) – defaults to 0.005

Significance level (probability threshold) for the Kolmogorov-Smirnov test. The detector is very sensitive to alpha, so it should be set below 0.01.

• window_size (int) – defaults to 100

Size of the sliding window.

• stat_size (int) – defaults to 30

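Size of the statistic window, i.e. the number of most recent samples that are compared against the older part of the sliding window.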

• seed (int) – defaults to None

Random seed for reproducibility.

• window (Iterable) – defaults to None

Already collected data to avoid cold start.
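To get a feel for how alpha and stat_size interact, the sketch below evaluates the detection threshold given in the Notes section, $$\sqrt{-\ln\alpha / r}$$, for a few values of alpha. The helper name `ks_threshold` is hypothetical and not part of River, and the library's internal decision rule may differ in detail.

```python
import math


def ks_threshold(alpha: float, stat_size: int) -> float:
    """Threshold sqrt(-ln(alpha) / r) from the Notes section (hypothetical helper)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if stat_size <= 0:
        raise ValueError("stat_size must be positive")
    return math.sqrt(-math.log(alpha) / stat_size)


# A smaller alpha raises the threshold, so a larger distance is needed for an alarm.
thresholds = {alpha: ks_threshold(alpha, 30) for alpha in (0.01, 0.005, 0.0001)}
for alpha, value in thresholds.items():
    print(f"alpha={alpha}: threshold={value:.3f}")
```

Under this formula, lowering alpha or shrinking stat_size both make the detector more conservative.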

## Attributes

• drift_detected

Concept drift alarm. True if concept drift is detected.

## Examples

>>> import random
>>> from river import drift

>>> rng = random.Random(12345)
>>> kswin = drift.KSWIN(alpha=0.0001, seed=42)

>>> # Simulate a data stream composed of two data distributions
>>> data_stream = rng.choices([0, 1], k=1000) + rng.choices(range(4, 8), k=1000)

>>> # Update drift detector and verify if change is detected
>>> for i, val in enumerate(data_stream):
...     _ = kswin.update(val)
...     if kswin.drift_detected:
...         print(f"Change detected at index {i}, input value: {val}")
Change detected at index 1016, input value: 6


## Methods

update

Update the change detector with a single data point.

Appends the new element to the sliding window and removes the oldest one. The KS test is then performed.

Parameters

• x (numbers.Number)

Returns

DriftDetector: self
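The sliding-window behaviour described for update can be illustrated with a bounded `collections.deque`. This is a minimal sketch of the idea, not River's source code; the variable names and the tiny window size are chosen only for illustration.

```python
from collections import deque

window_size = 5  # deliberately tiny so the eviction is easy to see
window = deque(maxlen=window_size)

for x in range(8):
    # Appending to a full bounded deque silently drops the oldest element.
    window.append(x)

print(list(window))
```

After eight updates only the five most recent values remain in the window.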

## Notes

KSWIN (Kolmogorov-Smirnov Windowing) is a concept change detection method based on the Kolmogorov-Smirnov (KS) statistical test. The KS test is a statistical test that makes no assumption about the underlying data distribution. KSWIN can monitor data or performance distributions. Note that the detector accepts only one-dimensional input: update receives a single numeric value at a time.

KSWIN maintains a sliding window $$\Psi$$ of fixed size $$n$$ (window_size). The last $$r$$ (stat_size) samples of $$\Psi$$ are assumed to represent the most recent concept and are denoted $$R$$. From the first $$n-r$$ samples of $$\Psi$$, $$r$$ samples are drawn uniformly at random; they form $$W$$, an approximation of the older concept.
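The construction of $$R$$ and $$W$$ can be sketched in a few lines of plain Python. The helper `split_windows` is hypothetical, and drawing $$W$$ without replacement is an assumption of this sketch, since the text only states that the samples are drawn uniformly.

```python
import random


def split_windows(psi, stat_size, rng):
    """Split the sliding window psi into (R, W). Hypothetical helper, not River's API."""
    psi = list(psi)  # also accepts deques, which do not support slicing
    if len(psi) < 2 * stat_size:
        raise ValueError("need at least 2 * stat_size samples to draw W")
    recent = psi[-stat_size:]               # R: the last r samples
    older = psi[:-stat_size]                # the first n - r samples
    sampled = rng.sample(older, stat_size)  # W: r samples, uniform, without replacement (assumption)
    return recent, sampled


psi = list(range(100))  # stand-in for a full window with n = 100
R, W = split_windows(psi, stat_size=30, rng=random.Random(42))
print(R[:3], len(W))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which mirrors the purpose of the seed parameter.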

The KS test is performed on the windows $$R$$ and $$W$$, which have the same size. It measures the distance between their empirical cumulative distribution functions, $$dist(R,W)$$.

A concept drift is detected by KSWIN if:

$dist(R,W) > \sqrt{-\frac{\ln\alpha}{r}}$

In that case, the difference between the empirical data distributions of the windows $$R$$ and $$W$$ is too large for $$R$$ and $$W$$ to come from the same distribution.
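The decision rule can be sketched without external dependencies as follows. The functions `ks_statistic` and `is_drift` are hypothetical names, and the sketch applies the threshold formula from this section literally; River's implementation may compute the test differently, for example through a library KS test and its p-value.

```python
import math


def ks_statistic(r_window, w_window):
    """Largest gap between the two empirical CDFs (two-sample KS distance)."""
    r_sorted, w_sorted = sorted(r_window), sorted(w_window)
    n_r, n_w = len(r_sorted), len(w_sorted)
    i = j = 0
    distance = 0.0
    # Walk through the observed values in ascending order.
    while i < n_r and j < n_w:
        x = min(r_sorted[i], w_sorted[j])
        # Skip past every sample equal to x so that ties are handled together.
        while i < n_r and r_sorted[i] == x:
            i += 1
        while j < n_w and w_sorted[j] == x:
            j += 1
        distance = max(distance, abs(i / n_r - j / n_w))
    return distance


def is_drift(r_window, w_window, alpha):
    """Apply dist(R, W) > sqrt(-ln(alpha) / r), with r = len(r_window)."""
    threshold = math.sqrt(-math.log(alpha) / len(r_window))
    return ks_statistic(r_window, w_window) > threshold


old_concept = [0, 1] * 15                 # 30 samples from {0, 1}
new_concept = [4, 5, 6, 7] * 7 + [4, 5]   # 30 samples from {4, ..., 7}

print(is_drift(new_concept, old_concept, alpha=0.005))
print(is_drift(old_concept, old_concept, alpha=0.005))
```

Because the two illustrative samples have disjoint supports, their empirical distributions are at the maximum possible distance of 1, while comparing a sample with itself gives a distance of 0.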

## References

1. Christoph Raab, Moritz Heusinger, Frank-Michael Schleif, Reactive Soft Prototype Computing for Concept Drift Streams, Neurocomputing, 2020.