DDM¶
Drift Detection Method.
DDM (Drift Detection Method) is a concept change detection method based on the premise of the PAC learning model: as long as the data distribution is stationary, the learner's error rate decreases as the number of analysed samples increases.
If the algorithm detects an increase in the error rate that surpasses a calculated threshold, it either signals that a change has been detected or enters the warning zone, which warns the user that a change may occur in the near future.
The detection threshold is calculated as a function of two statistics, recorded at the instant where \(p_i + s_i\) is minimum:
- \(p_{min}\): The minimum recorded error rate.
- \(s_{min}\): The minimum recorded standard deviation.
At instant \(i\), the detection algorithm uses:
- \(p_i\): The error rate at instant \(i\).
- \(s_i\): The standard deviation at instant \(i\).
The conditions for entering the warning zone and detecting change are as follows [see implementation note below]:
- if \(p_i + s_i \geq p_{min} + w_l \cdot s_{min}\) -> Warning zone
- if \(p_i + s_i \geq p_{min} + d_l \cdot s_{min}\) -> Change detected
In the above expressions, \(w_l\) and \(d_l\) represent, respectively, the warning and drift thresholds.
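The conditions above can be sketched in a few lines of plain Python. This is a minimal illustration, not River's implementation: the class name `SimpleDDM` is hypothetical, the standard deviation is assumed to be the binomial estimate \(s_i = \sqrt{p_i (1 - p_i) / i}\) from the referenced paper, and the sketch uses strict comparisons (an assumption) so that a stream without any errors, where \(s_{min} = 0\), does not trigger a detection on its own.

```python
import math


class SimpleDDM:
    """Illustrative DDM-style detector (hypothetical, not River's code)."""

    def __init__(self, warm_start=30, warning_threshold=2.0, drift_threshold=3.0):
        self.warm_start = warm_start
        self.warning_threshold = warning_threshold
        self.drift_threshold = drift_threshold
        self._reset()

    def _reset(self):
        self.n = 0               # samples seen since the last reset
        self.errors = 0          # number of 1's seen since the last reset
        self.p_min = math.inf    # error rate recorded when p_i + s_i was minimal
        self.s_min = math.inf    # standard deviation recorded at the same instant
        self.warning_detected = False
        self.drift_detected = False

    def update(self, x):
        # Start from scratch once a drift has been reported.
        if self.drift_detected:
            self._reset()

        self.n += 1
        self.errors += int(x)
        p = self.errors / self.n               # p_i: running error rate
        s = math.sqrt(p * (1 - p) / self.n)    # s_i: binomial standard deviation

        # Do nothing until enough samples have been analysed.
        if self.n < self.warm_start:
            return self

        # Track the statistics at the instant where p_i + s_i is minimal.
        if p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = p, s

        level = p + s
        # Strict comparisons are an assumption of this sketch (see lead-in).
        self.drift_detected = level > self.p_min + self.drift_threshold * self.s_min
        self.warning_detected = (
            not self.drift_detected
            and level > self.p_min + self.warning_threshold * self.s_min
        )
        return self


# Synthetic stream: 10% errors for 300 samples, then only errors.
stream = [1 if i % 10 == 0 else 0 for i in range(300)] + [1] * 100
detector = SimpleDDM()
first_warning = None
first_drift = None
for i, x in enumerate(stream):
    detector.update(x)
    if detector.warning_detected and first_warning is None:
        first_warning = i
    if detector.drift_detected and first_drift is None:
        first_drift = i
        break
print(first_warning, first_drift)
```

On this synthetic stream the warning fires shortly after the error rate jumps at index 300, and the drift follows a few samples later, once \(p_i + s_i\) has moved more than three minimum standard deviations above \(p_{min}\).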
Input: x is an entry in a stream of bits, where 1 indicates an error/failure and 0 represents a correct/normal value.
For example, x can encode whether a classifier's prediction \(y'\) is right or wrong with respect to the true target label \(y\):
- 0: Correct, \(y=y'\)
- 1: Error, \(y \neq y'\)
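As a small illustration of this encoding, the sketch below turns pairs of true labels and predictions into the bit stream the detector expects. The helper name `to_error_bits` and the sample labels are hypothetical.

```python
def to_error_bits(y_true, y_pred):
    """Return 0 where the prediction matches the label and 1 where it does not."""
    return [int(yt != yp) for yt, yp in zip(y_true, y_pred)]


labels = ["spam", "ham", "ham", "spam"]
predictions = ["spam", "spam", "ham", "ham"]
error_bits = to_error_bits(labels, predictions)
print(error_bits)  # [0, 1, 0, 1]
```

Each resulting bit can then be passed to the detector's `update` method, one at a time.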
Parameters¶
- warm_start (int) – defaults to 30
  The minimum number of samples that must be analyzed before a change can be detected. This is the warm-start parameter of the drift detector.
- warning_threshold (float) – defaults to 2.0
  Threshold to decide if the detector is in a warning zone. The default value gives a 95% confidence level to the warning assessment.
- drift_threshold (float) – defaults to 3.0
  Threshold to decide if a drift was detected. The default value gives a 99% confidence level to the drift assessment.
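One way to read these confidence levels, assuming the error rate is approximately normally distributed once enough samples have been seen, is as the probability mass lying within `k` standard deviations of the mean. The helper `two_sided_coverage` below is hypothetical and only illustrates that reading; the 95% and 99% figures quoted above are rounded versions of these values.

```python
import math


def two_sided_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for name, k in [("warning_threshold", 2.0), ("drift_threshold", 3.0)]:
    print(f"{name}={k}: {two_sided_coverage(k):.4f}")
# warning_threshold=2.0: 0.9545
# drift_threshold=3.0: 0.9973
```

Raising either threshold makes the corresponding signal rarer and later; lowering it makes the detector react sooner, at the cost of more false alarms.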
Attributes¶
- drift_detected
  Whether or not a drift is detected following the last update.
- warning_detected
  Whether or not the detector is in the warning zone following the last update.
Examples¶
>>> import random
>>> from river import drift
>>> rng = random.Random(42)
>>> ddm = drift.binary.DDM()
>>> # Simulate a data stream where the first 1000 instances come from a uniform distribution
>>> # of 1's and 0's
>>> data_stream = rng.choices([0, 1], k=1000)
>>> # Increase the probability of 1's appearing in the next 1000 instances
>>> data_stream = data_stream + rng.choices([0, 1], k=1000, weights=[0.3, 0.7])
>>> print_warning = True
>>> # Update drift detector and verify if change is detected
>>> for i, x in enumerate(data_stream):
...     _ = ddm.update(x)
...     if ddm.warning_detected and print_warning:
...         print(f"Warning detected at index {i}")
...         print_warning = False
...     if ddm.drift_detected:
...         print(f"Change detected at index {i}")
...         print_warning = True
Warning detected at index 1084
Change detected at index 1334
Warning detected at index 1492
Methods¶
update
Update the detector with a single boolean input.
Parameters
- x (bool)
Returns
BinaryDriftDetector: self
References¶
- João Gama, Pedro Medas, Gladys Castillo, Pedro Pereira Rodrigues: Learning with Drift Detection. SBIA 2004: 286-295