SRPClassifier

Streaming Random Patches ensemble classifier.

Streaming Random Patches (SRP) ¹ is an ensemble method that simulates bagging, random subspaces, or both. By default it combines the two, an approach known as Random Patches. The default base estimator is a Hoeffding Tree, but other base estimators can be used (unlike random forest variations).
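The idea of a random patch can be illustrated with a minimal sketch in plain Python. It is not River's implementation: the helpers `poisson` and `make_patch` are hypothetical, and the sketch assumes that bagging is simulated online by drawing a per-instance training weight from a Poisson distribution with mean `lam`, and that each ensemble member keeps its own fixed feature subset.

```python
import math
import random


def poisson(lam: float, rng: random.Random) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def make_patch(x: dict, feature_subset: list, lam: float, rng: random.Random):
    """Hypothetical helper: return (projected instance, training weight)."""
    x_sub = {name: x[name] for name in feature_subset}  # random subspace part
    weight = poisson(lam, rng)                          # online bagging part (assumed)
    return x_sub, weight


rng = random.Random(42)
x = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0, "e": 5.0}

# Four members, each with its own subset of 3 out of 5 features.
subsets = [sorted(rng.sample(sorted(x), 3)) for _ in range(4)]
patches = [make_patch(x, subset, lam=6.0, rng=rng) for subset in subsets]
```

Each member would then train on its projected instance `weight` times (or with that sample weight), so different members see different rows and different columns of the stream.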

Parameters

model (base.Classifier) – defaults to None

The base estimator.
n_models (int) – defaults to 100

Number of members in the ensemble.
subspace_size (Union[int, float, str]) – defaults to 0.6

Number of features per subset for each classifier, where M is the total number of features.
A negative value means M - subspace_size.
Only applies when using random subspaces or random patches.
* If int, indicates the number of features to use. Valid range [2, M].
* If float, indicates the fraction of features to use. Valid range (0., 1.].
* 'sqrt' - sqrt(M)+1
* 'rmsqrt' - The remaining features, M-(sqrt(M)+1)
training_method (str) – defaults to patches

The training method to use.
* 'subspaces' - Random subspaces.
* 'resampling' - Resampling.
* 'patches' - Random patches.
lam (float) – defaults to 6.0

Lambda value for resampling.
drift_detector (base.DriftDetector) – defaults to None

Drift detector.
warning_detector (base.DriftDetector) – defaults to None

Warning detector.
disable_detector (str) – defaults to off

Option to disable drift detectors:
* If 'off', detectors are enabled.
* If 'drift', disables concept drift detection and the background learner.
* If 'warning', disables the background learner and ensemble members are reset if drift is detected.
disable_weighted_vote (bool) – defaults to False

If True, disables weighted voting.
nominal_attributes – defaults to None

List of nominal attributes. If empty, all attributes are assumed to be numerical. Note: Only applies if the base model allows nominal attributes to be defined.
seed – defaults to None

Random number generator seed for reproducibility.
metric (river.metrics.base.MultiClassMetric) – defaults to None

Metric used to track the performance of each member within the ensemble.
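The rules listed for subspace_size can be sketched as a small function. This is an illustration only, not River's code: `resolve_subspace_size` is a hypothetical helper, rounding to the nearest integer is an assumption, a negative int is interpreted here as M minus its absolute value, and the final clamp to M is also an assumption.

```python
import math


def resolve_subspace_size(subspace_size, n_features: int) -> int:
    """Hypothetical helper: map subspace_size to a number of features k."""
    M = n_features
    if subspace_size == "sqrt":
        k = round(math.sqrt(M)) + 1
    elif subspace_size == "rmsqrt":
        k = M - (round(math.sqrt(M)) + 1)
    elif isinstance(subspace_size, float):
        if not 0.0 < subspace_size <= 1.0:
            raise ValueError("float subspace_size must be in (0., 1.]")
        k = round(M * subspace_size)  # fraction of the features (rounding assumed)
    elif isinstance(subspace_size, int):
        k = subspace_size
        if k < 0:
            k = M + k  # assumed reading of "negative means M - subspace_size"
    else:
        raise ValueError(f"unsupported subspace_size: {subspace_size!r}")
    return min(k, M)  # never ask for more features than exist (assumption)


sizes = {
    "default": resolve_subspace_size(0.6, 10),
    "sqrt": resolve_subspace_size("sqrt", 9),
    "rmsqrt": resolve_subspace_size("rmsqrt", 9),
    "int": resolve_subspace_size(3, 10),
    "negative": resolve_subspace_size(-2, 10),
}
```

With 10 features, the default of 0.6 selects 6 features per member under these assumptions.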

Examples

>>> from river import synth
>>> from river import ensemble
>>> from river import tree
>>> from river import evaluate
>>> from river import metrics

>>> dataset = synth.ConceptDriftStream(seed=42, position=500,
...                                    width=50).take(1000)
>>> base_model = tree.HoeffdingTreeClassifier(
...     grace_period=50, split_confidence=0.01,
...     nominal_attributes=['age', 'car', 'zipcode']
... )
>>> model = ensemble.SRPClassifier(
...     model=base_model, n_models=3, seed=42,
... )
>>> metric = metrics.Accuracy()

>>> evaluate.progressive_val_score(dataset, model, metric)  # doctest: +SKIP
Accuracy: 70.97%

Methods

clone

Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
* recursively cloned if it is a River class.
* deep-copied via copy.deepcopy if not.
If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
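The cloning behaviour described above can be sketched as follows. The classes `Base`, `Counter` and `Wrapper` are hypothetical stand-ins, not River classes, and the sketch assumes that every `__init__` argument is stored as an attribute of the same name.

```python
import copy
import inspect


class Base:
    """Hypothetical stand-in for an estimator base class."""

    def _get_params(self) -> dict:
        # Read parameter names from the class signature (assumes each one
        # is stored on the instance under the same name).
        signature = inspect.signature(type(self).__init__)
        return {
            name: getattr(self, name)
            for name in signature.parameters
            if name != "self"
        }

    def clone(self):
        params = {}
        for name, value in self._get_params().items():
            if isinstance(value, Base):
                params[name] = value.clone()         # recursive clone
            else:
                params[name] = copy.deepcopy(value)  # plain deep copy
        return type(self)(**params)


class Counter(Base):
    def __init__(self, start=0, labels=None):
        self.start = start
        self.labels = labels
        self.n_seen = 0  # learned state, not a parameter


class Wrapper(Base):
    def __init__(self, inner):
        self.inner = inner


inner = Counter(start=3, labels=["a", "b"])
inner.n_seen = 10  # pretend the model has seen data
fresh = Wrapper(inner).clone()
```

The clone carries over `start` and `labels` but not `n_seen`, which mirrors the statement that the clone has not been updated with any data.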

learn_one

Update the model with a set of features x and a label y.

Parameters

x (dict)
y (Union[bool, str, int])
kwargs

Returns

self

predict_many

Predict the labels of a DataFrame X.

Parameters

X (pandas.core.frame.DataFrame)

Returns

Series: Series of predicted labels.

predict_one

Predict the label of a set of features x.

Parameters

x (dict)

Returns

typing.Union[bool, str, int]: The predicted label.
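predict_one and learn_one are typically used together in a test-then-train loop: predict first, then reveal the label. The sketch below shows that protocol with a hypothetical stand-in model, `MajorityClass`, so that it runs without any dependency; any object exposing the same two methods could take its place.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical stand-in with the same learn_one/predict_one interface."""

    def __init__(self):
        self.counts = Counter()

    def learn_one(self, x: dict, y):
        self.counts[y] += 1
        return self

    def predict_one(self, x: dict):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]


# A tiny made-up stream of (features, label) pairs.
stream = [
    ({"f": 0.1}, "a"),
    ({"f": 0.4}, "a"),
    ({"f": 0.9}, "b"),
    ({"f": 0.2}, "a"),
]

model = MajorityClass()
correct = 0
for x, y in stream:
    y_pred = model.predict_one(x)  # predict before the label is seen
    correct += int(y_pred == y)
    model.learn_one(x, y)          # then update with the true label

accuracy = correct / len(stream)
```

Because every instance is used for testing before training, the resulting accuracy is an estimate on unseen data without a separate hold-out set.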

predict_proba_many

Predict the probability of each label for each row of a DataFrame X.

Parameters

X (pandas.core.frame.DataFrame)

Returns

DataFrame: DataFrame that associates probabilities with each label, one label per column.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

x

Returns

A dictionary that associates a probability with each label.
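For an ensemble, such a dictionary has to be built from the members' individual outputs. The following is a minimal sketch of one way to do that, not necessarily how SRPClassifier does it: `combine_votes` is a hypothetical helper, and using one weight per member (for example its tracked metric value) is an assumption based on the disable_weighted_vote and metric parameters.

```python
from collections import defaultdict


def combine_votes(member_probas, weights=None):
    """Hypothetical helper: merge per-member {label: probability} dicts."""
    if weights is None:
        weights = [1.0] * len(member_probas)  # unweighted vote
    totals = defaultdict(float)
    for proba, weight in zip(member_probas, weights):
        for label, p in proba.items():
            totals[label] += weight * p
    norm = sum(totals.values())
    if norm == 0:
        return dict(totals)
    # Normalise so that the probabilities sum to 1.
    return {label: value / norm for label, value in totals.items()}


member_probas = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.4, "b": 0.6},
]

unweighted = combine_votes(member_probas)
weighted = combine_votes(member_probas, weights=[0.9, 0.5, 0.5])
```

In this made-up example the unweighted vote is a tie, while giving the first member a larger weight tips the result towards label "a".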

reset

References

Heitor Murilo Gomes, Jesse Read, Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.