
SRPClassifier

Streaming Random Patches ensemble classifier.

Streaming Random Patches (SRP) [1] is an ensemble method that simulates bagging or random subspaces. By default it combines both bagging and random subspaces, an approach called Random Patches. The default base estimator is a Hoeffding Tree, but, unlike random forest variants, other base estimators can be used.
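The Random Patches idea can be sketched in plain Python. This is a minimal illustration, not River's implementation: it assumes each member is given a fixed random feature subset (the random subspaces part) and sees every incoming instance with an integer weight drawn from Poisson(lam) (the online bagging part). The helpers `draw_subspaces`, `poisson`, and `member_view` are hypothetical names introduced only for this sketch.

```python
import math
import random


def draw_subspaces(features, n_models, k, rng):
    """Hypothetical helper: pick k distinct features for each of n_models members."""
    return [sorted(rng.sample(features, k)) for _ in range(n_models)]


def poisson(lam, rng):
    """Draw an integer from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def member_view(x, subspace, lam, rng):
    """Hypothetical helper: the features one member sees and its training weight."""
    x_view = {f: x[f] for f in subspace if f in x}  # random subspace of x
    weight = poisson(lam, rng)                      # online-bagging weight
    return x_view, weight


rng = random.Random(42)
features = ["age", "car", "salary", "zipcode", "loan"]
subspaces = draw_subspaces(features, n_models=3, k=3, rng=rng)

x = {"age": 35, "car": 2, "salary": 50000.0, "zipcode": 4, "loan": 1000.0}
for subspace in subspaces:
    x_view, weight = member_view(x, subspace, lam=6, rng=rng)
    # A member would be trained on x_view with this weight (skipped if weight == 0).
    print(subspace, weight)
```

A member whose weight is 0 skips the instance. With lam=6 this happens with probability e^-6 (about 0.25%), so every member still sees most of the stream while the weights differ between members.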

Parameters

  • model (base.Estimator) – defaults to None

    The base estimator.

  • n_models (int) – defaults to 10

    Number of members in the ensemble.

  • subspace_size (Union[int, float, str]) – defaults to 0.6

    Number of features per subset for each classifier, where M is the total number of features.
    A negative value k is interpreted as M - |k|, i.e. all but |k| features.
    Only applies when using random subspaces or random patches.
    * If int, the number of features to use. Valid range [2, M].
    * If float, the fraction of features to use. Valid range (0., 1.].
    * 'sqrt' - sqrt(M) + 1
    * 'rmsqrt' - the residual M - (sqrt(M) + 1)

  • training_method (str) – defaults to patches

    The training method to use.
    * 'subspaces' - Random subspaces.
    * 'resampling' - Resampling.
    * 'patches' - Random patches.

  • lam (int) – defaults to 6

    Lambda parameter of the Poisson distribution used to weight instances when resampling.

  • drift_detector (base.DriftDetector) – defaults to None

    Drift detector used by each ensemble member to detect concept drift.

  • warning_detector (base.DriftDetector) – defaults to None

    Detector used by each ensemble member to signal a warning, which starts training a background learner.

  • disable_detector (str) – defaults to off

    Option to disable drift detectors:
    * If 'off', detectors are enabled.
    * If 'drift', disables concept drift detection and the background learner.
    * If 'warning', disables the background learner; ensemble members are reset when drift is detected.

  • disable_weighted_vote (bool) – defaults to False

    If True, disables weighted voting.

  • seed (int) – defaults to None

    Random number generator seed for reproducibility.

  • metric (Optional[river.metrics.base.ClassificationMetric]) – defaults to None

    The metric used to track each member's performance within the ensemble. When weighted voting is enabled, this implementation assumes that larger metric values are better.
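The `subspace_size` rules can be turned into a concrete feature count with a small helper. This is a sketch under assumptions, not River's code: `resolve_subspace_size` is a hypothetical name, the use of Python's `round()` for 'sqrt' and float values is an assumption, and clamping the result to [2, M] follows the documented valid range for integers.

```python
import math


def resolve_subspace_size(subspace_size, n_features):
    """Hypothetical helper: map a subspace_size option to a number of features.

    Assumes M = n_features >= 2. Rounding with round() and clamping to [2, M]
    are assumptions, not documented behaviour.
    """
    m = n_features
    if subspace_size == "sqrt":
        k = round(math.sqrt(m)) + 1
    elif subspace_size == "rmsqrt":
        k = m - (round(math.sqrt(m)) + 1)
    elif isinstance(subspace_size, int):
        # A negative int k means "all but |k| features", i.e. M + k.
        k = subspace_size if subspace_size >= 0 else m + subspace_size
    elif isinstance(subspace_size, float):
        if not 0.0 < subspace_size <= 1.0:
            raise ValueError("a float subspace_size must lie in (0, 1]")
        k = round(subspace_size * m)
    else:
        raise ValueError(f"unsupported subspace_size: {subspace_size!r}")
    # Clamp to the documented valid range [2, M].
    return max(2, min(k, m))


for option in (0.6, "sqrt", "rmsqrt", 4, -3):
    print(option, resolve_subspace_size(option, n_features=9))
```

With M = 9 this sketch gives 5 features for 0.6, 4 for 'sqrt', 5 for 'rmsqrt', 4 for the integer 4, and 6 for -3.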

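The effect of `disable_detector` on a single member can be sketched as a small decision function. This is an illustration under assumptions, not River's code: it assumes that in 'off' mode a warning starts a fresh background learner, and that a drift replaces the member with its background learner when one exists and otherwise resets the member. The function `detector_actions` and the action strings are hypothetical.

```python
VALID_MODES = ("off", "drift", "warning")


def detector_actions(mode, warning_detected, drift_detected, has_background):
    """Hypothetical sketch: actions one member takes after its detectors update."""
    if mode not in VALID_MODES:
        raise ValueError(f"disable_detector must be one of {VALID_MODES}")
    if mode == "drift":
        # Drift detection and the background learner are both disabled.
        return []
    actions = []
    background = has_background
    if mode == "off" and warning_detected:
        # Assumption: a warning (re)starts a background learner.
        actions.append("start_background_learner")
        background = True
    if drift_detected:
        if mode == "off" and background:
            actions.append("replace_with_background_learner")
        else:
            # 'warning' mode, or no background learner available: reset the member.
            actions.append("reset_member")
    return actions


print(detector_actions("off", warning_detected=True, drift_detected=False, has_background=False))
print(detector_actions("warning", warning_detected=False, drift_detected=True, has_background=False))
```

Keeping the decision in one pure function makes it easy to see that 'drift' silences both signals, while 'warning' still reacts to drift but never trains a background learner.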
Attributes

  • models

Examples

>>> from river import ensemble
>>> from river import evaluate
>>> from river import metrics
>>> from river.datasets import synth
>>> from river import tree

>>> dataset = synth.ConceptDriftStream(
...     seed=42,
...     position=500,
...     width=50
... ).take(1000)

>>> base_model = tree.HoeffdingTreeClassifier(
...     grace_period=50, delta=0.01,
...     nominal_attributes=['age', 'car', 'zipcode']
... )
>>> model = ensemble.SRPClassifier(
...     model=base_model, n_models=3, seed=42,
... )

>>> metric = metrics.Accuracy()

>>> evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 72.77%

Methods

append

S.append(value) -- append value to the end of the sequence

Parameters

  • item

clear

S.clear() -> None -- remove all items from S

copy

count

S.count(value) -> integer -- return number of occurrences of value

Parameters

  • item

extend

S.extend(iterable) -- extend sequence by appending elements from the iterable

Parameters

  • other

index

S.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

Parameters

  • item
  • args

insert

S.insert(index, value) -- insert value before index

Parameters

  • i
  • item

learn_one

pop

S.pop([index]) -> item -- remove and return item at index (default last). Raise IndexError if list is empty or index is out of range.

Parameters

  • i – defaults to -1

predict_one

Predict the label of a set of features x.

Parameters

  • x (dict)
  • kwargs

Returns

typing.Union[bool, str, int, NoneType]: The predicted label.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

  • x
  • kwargs

Returns

A dictionary that associates a probability with each label.
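An ensemble's probability estimate is built from its members' estimates. A minimal sketch, assuming each member's distribution is weighted by its tracked metric value (larger is better, as stated for `metric`), or weighted equally when `disable_weighted_vote` is True, and the result is renormalized. `aggregate_proba` is a hypothetical helper, not River's implementation.

```python
from collections import defaultdict


def aggregate_proba(member_probas, member_scores, weighted=True):
    """Hypothetical helper: combine per-member class probabilities.

    member_probas: one dict (label -> probability) per member.
    member_scores: one metric value per member, larger meaning better.
    """
    totals = defaultdict(float)
    for proba, score in zip(member_probas, member_scores):
        weight = score if weighted else 1.0
        if weight <= 0.0:
            continue  # members with no positive score do not vote
        for label, p in proba.items():
            totals[label] += weight * p
    norm = sum(totals.values())
    if norm == 0.0:
        return {}
    return {label: v / norm for label, v in totals.items()}


probas = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]
scores = [0.9, 0.6]
proba = aggregate_proba(probas, scores)
print(proba, max(proba, key=proba.get))  # predicted label = argmax of the distribution
```

Here the weighted vote yields {'a': 0.62, 'b': 0.38} up to floating-point rounding, while `weighted=False` yields {'a': 0.55, 'b': 0.45}; a `predict_one`-style label is simply the argmax.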

remove

S.remove(value) -- remove first occurrence of value. Raise ValueError if the value is not present.

Parameters

  • item

reset

reverse

S.reverse() -- reverse IN PLACE

sort

Notes

This implementation uses n_models=10 by default to limit processing time, which grows with the number of ensemble members. The optimal number of models depends on the data and the available resources.

References


  1. Heitor Murilo Gomes, Jesse Read, Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.