Skip to content

DDM

Drift Detection Method.

DDM (Drift Detection Method) is a concept change detection method based on the PAC learning model premise, that the learner's error rate will decrease as the number of analysed samples increase, as long as the data distribution is stationary.

If the algorithm detects an increase in the error rate, that surpasses a calculated threshold, either change is detected or the algorithm will warn the user that change may occur in the near future, which is called the warning zone.

The detection threshold is calculated in function of two statistics, obtained when \((p_i + s_i)\) is minimum:

  • \(p_{min}\): The minimum recorded error rate.

  • \(s_{min}\): The minimum recorded standard deviation.

At instant \(i\), the detection algorithm uses:

  • \(p_i\): The error rate at instant \(i\).

  • \(s_i\): The standard deviation at instant \(i\).

The conditions for entering the warning zone and detecting change are as follows [see implementation note bellow]:

  • if \(p_i + s_i \geq p_{min} + 2 * s_{min}\) -> Warning zone

  • if \(p_i + s_i \geq p_{min} + 3 * s_{min}\) -> Change detected

Input: value must be a binary signal, where 1 indicates error. For example, if a classifier's prediction \(y'\) is right or wrong w.r.t the true target label \(y\):

  • 0: Correct, \(y=y'\)

  • 1: Error, \(y \neq y'\)

Parameters

  • min_num_instances – defaults to 30

    The minimum required number of analyzed samples so change can be detected. This is used to avoid false detections during the early moments of the detector, when the weight of one sample is important.

  • warning_level – defaults to 2.0

    Warning level.

  • out_control_level – defaults to 3.0

    Out-control level.

Attributes

  • change_detected

    Concept drift alarm. True if concept drift is detected.

  • warning_detected

    Warning zone alarm. Indicates if the drift detector is in the warning zone. Applicability depends on each drift detector implementation. True if the change detector is in the warning zone.

Examples

>>> import random
>>> from river import drift

>>> rng = random.Random(42)
>>> ddm = drift.DDM()

>>> # Simulate a data stream as a uniform distribution of 1's and 0's
>>> data_stream = rng.choices([0, 1], k=2000)
>>> # Change the data distribution from index 999 to 1500, simulating an
>>> # increase in error rate (1 indicates error)
>>> data_stream[999:1500] = [1] * 500

>>> # Update drift detector and verify if change is detected
>>> for i, val in enumerate(data_stream):
...     in_drift, in_warning = ddm.update(val)
...     if in_drift:
...         print(f"Change detected at index {i}, input value: {val}")
Change detected at index 1101, input value: 1

Methods

clone

Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either - recursively cloned if it's a River classes. - deep-copied via copy.deepcopy if not. If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose if simply to return a new instance with the same input parameters.

reset

Reset the change detector.

update

Update the change detector with a single data point.

Parameters

  • value (numbers.Number)

Notes

In this implementation, the conditions to signal drift and warning are \(p_i + s_i > thershold\) instead of \(p_i + s_i \geq thershold\). This is to avoid a corner case when a classifier is consistently wrong (value=1) that results in DDM indicating a drift every min_num_instances. This modification is consistent with the implementation in MOA.

References


  1. João Gama, Pedro Medas, Gladys Castillo, Pedro Pereira Rodrigues: Learning with Drift Detection. SBIA 2004: 286-295