SRPRegressor

Streaming Random Patches ensemble regressor.

The Streaming Random Patches ¹ ensemble method for regression trains each base learner on a random patch, that is, a subset of both the features and the instances of the original data. This way of enforcing diversity among the base models is similar to the one used in random forests, but it is not restricted to decision trees as base learners.

This method adapts the Streaming Random Patches classifier ² to regression.

Parameters

model (base.Regressor) – defaults to None

The base estimator.
n_models (int) – defaults to 10

Number of members in the ensemble.
subspace_size (Union[int, float, str]) – defaults to 0.6

Number of features per subset for each base learner, where M is the total number of features.
A negative value means M - subspace_size.
Only applies when using random subspaces or random patches.
* If int, the number of features to use. Valid range: [2, M].
* If float, the fraction of features to use. Valid range: (0., 1.].
* 'sqrt' - sqrt(M)+1
* 'rmsqrt' - the remainder, M-(sqrt(M)+1)
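
The rules for subspace_size can be read as the sketch below. It illustrates the documented behaviour and is not River's source code: the helper name `resolve_subspace_size` is hypothetical, rounding with `round` is an assumption, and a negative integer is interpreted here as "M minus its absolute value".

```python
import math


def resolve_subspace_size(subspace_size, n_features):
    """Return the number of features per subset (hypothetical helper)."""
    m = n_features
    if subspace_size == "sqrt":
        k = round(math.sqrt(m)) + 1
    elif subspace_size == "rmsqrt":
        # The features left over after taking sqrt(M)+1 of them.
        k = m - (round(math.sqrt(m)) + 1)
    elif isinstance(subspace_size, float):
        if not 0.0 < subspace_size <= 1.0:
            raise ValueError("a float subspace_size must lie in (0., 1.]")
        k = round(m * subspace_size)
    elif isinstance(subspace_size, int):
        # Assumption: a negative value -n selects M - n features.
        k = m + subspace_size if subspace_size < 0 else subspace_size
    else:
        raise ValueError(f"unsupported subspace_size: {subspace_size!r}")
    # Assumption: never ask for more features than exist.
    return min(k, m)


default_k = resolve_subspace_size(0.6, 10)    # default fraction, 10 features
sqrt_k = resolve_subspace_size("sqrt", 16)
rmsqrt_k = resolve_subspace_size("rmsqrt", 16)
```

With 16 features, 'sqrt' and 'rmsqrt' are complementary in this sketch: the two sizes add up to M.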
training_method (str) – defaults to patches

The training method to use.
* 'subspaces' - Random subspaces.
* 'resampling' - Resampling.
* 'patches' - Random patches.
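
As a rough illustration of how the three training methods differ, the sketch below builds the view of one instance that a single ensemble member would train on. The helper `build_training_view` and its arguments are hypothetical, and the instance weight is taken as given; the behaviour shown (feature subset only, instance weight only, or both) is an assumption drawn from the method names, not a copy of River's code.

```python
def build_training_view(x, method, feature_subset, weight):
    """Return (features, weight) for one ensemble member (hypothetical helper).

    x              -- dict of feature name -> value
    feature_subset -- feature names assigned to this member
    weight         -- how many times this member should see the instance
    """
    if method == "subspaces":
        # Feature subset only: every instance is used exactly once.
        return {name: x[name] for name in feature_subset if name in x}, 1
    if method == "resampling":
        # Instance resampling only: all features are kept.
        return dict(x), weight
    if method == "patches":
        # Both: a subset of features and a resampled instance.
        return {name: x[name] for name in feature_subset if name in x}, weight
    raise ValueError(f"unknown training method: {method!r}")


x = {"a": 1.0, "b": 2.0, "c": 3.0}
patch_view = build_training_view(x, "patches", ["a", "c"], 4)
```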
lam (int) – defaults to 6

Lambda value for bagging.
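
Assuming lam is the rate of the Poisson distribution used in online bagging, where each incoming instance is given a weight k drawn from Poisson(lam), the sketch below shows what the default of 6 implies. The `poisson` helper is a plain-Python stand-in (Knuth's method) written for this illustration only.

```python
import math
import random


def poisson(lam, rng):
    """Draw one sample from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(42)
weights = [poisson(6, rng) for _ in range(10_000)]

# On average each member sees an instance about lam times ...
mean_weight = sum(weights) / len(weights)
# ... and only rarely skips it altogether (weight 0).
share_skipped = sum(w == 0 for w in weights) / len(weights)
```

A larger lam therefore means more training effort per instance and fewer skipped instances per member.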
drift_detector (base.DriftDetector) – defaults to None

Drift detector.
warning_detector (base.DriftDetector) – defaults to None

Warning detector.
disable_detector (str) – defaults to off

Option to disable drift detectors:
* If 'off', detectors are enabled.
* If 'drift', disables concept drift detection and the background learner.
* If 'warning', disables the background learner and ensemble members are reset if drift is detected.
disable_weighted_vote (bool) – defaults to True

If True, disables weighted voting.
drift_detection_criteria (str) – defaults to error

The criterion used to track drift.
* 'error' - absolute error.
* 'prediction' - predicted target values.
aggregation_method (str) – defaults to mean

The method to use to aggregate predictions in the ensemble.
* 'mean'
* 'median'
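
The practical difference between the two aggregation methods shows up when one member is far off, for example right after a drift. A minimal sketch with made-up member predictions (unweighted, using only the standard library):

```python
import statistics

# Hypothetical predictions from five ensemble members; the last one is an outlier.
member_predictions = [10.2, 9.8, 10.1, 10.0, 42.0]

mean_prediction = statistics.mean(member_predictions)      # pulled towards the outlier
median_prediction = statistics.median(member_predictions)  # stays with the majority
```

Here the mean lands near 16.4 while the median stays at 10.1, so 'median' is the more robust choice when individual members can be badly wrong.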
seed – defaults to None

Random number generator seed for reproducibility.
metric (Union[river.metrics.base.RegressionMetric, NoneType]) – defaults to None

The metric used to track the members' performance within the ensemble.

Attributes

models

Examples

>>> from river import ensemble
>>> from river import evaluate
>>> from river import metrics
>>> from river import synth
>>> from river import tree

>>> dataset = synth.FriedmanDrift(
...     drift_type='gsg',
...     position=(350, 750),
...     transition_window=200,
...     seed=42
... ).take(1000)

>>> base_model = tree.HoeffdingTreeRegressor(grace_period=50)
>>> model = ensemble.SRPRegressor(
...     model=base_model,
...     training_method="patches",
...     n_models=3,
...     seed=42
... )

>>> metric = metrics.R2()

>>> evaluate.progressive_val_score(dataset, model, metric)
R2: 0.571263
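
The score above comes from progressive validation, a test-then-train protocol: each instance is first used to make a prediction and update the metric, and only then to update the model. The sketch below mimics that loop with the standard library only. `RunningMeanRegressor` and `progressive_mae` are hypothetical stand-ins, not River classes, and mean absolute error replaces R2 to keep the example short.

```python
class RunningMeanRegressor:
    """Toy model exposing learn_one / predict_one (illustration only)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        # Ignores the features and predicts the mean target seen so far.
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self


def progressive_mae(stream, model):
    """Predict first, score, then learn, one instance at a time."""
    total_error, n = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test on the still unseen instance
        total_error += abs(y - y_pred)
        n += 1
        model.learn_one(x, y)          # then train on it
    return total_error / n if n else 0.0


# Constant target: only the very first prediction (0.0) is wrong.
stream = [({"f": i}, 2.0) for i in range(5)]
mae = progressive_mae(stream, RunningMeanRegressor())
```

Because every prediction is made before the model has seen the corresponding target, the resulting score reflects performance on unseen data without a separate hold-out set.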

Methods

append

S.append(value) -- append value to the end of the sequence

Parameters

item

clear

S.clear() -> None -- remove all items from S

clone

Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
* recursively cloned if it is a River class.
* deep-copied via copy.deepcopy if not.

If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.

copy

count

S.count(value) -> integer -- return number of occurrences of value

Parameters

item

extend

S.extend(iterable) -- extend sequence by appending elements from the iterable

Parameters

other

index

S.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.

Supporting start and stop arguments is optional, but recommended.

Parameters

item
args

insert

S.insert(index, value) -- insert value before index

Parameters

i
item

learn_one

Fits to a set of features x and a real-valued target y.

Parameters

x (dict)
y (numbers.Number)
kwargs

Returns

self

pop

S.pop([index]) -> item -- remove and return item at index (default last). Raise IndexError if list is empty or index is out of range.

Parameters

i – defaults to -1

predict_one

Predicts the target value of a set of features x.

Parameters

x

Returns

The prediction.

remove

S.remove(value) -- remove first occurrence of value. Raise ValueError if the value is not present.

Parameters

item

reset

reverse

S.reverse() -- reverse IN PLACE

sort

Notes

This implementation defaults to n_models=10 because of the impact of ensemble size on processing time. The optimal number of models depends on the data and the resources available.

References

1. Heitor Gomes, Jacob Montiel, Saulo Martiello Mastelini, Bernhard Pfahringer, and Albert Bifet. On Ensemble Techniques for Data Stream Regression. International Joint Conference on Neural Networks (IJCNN), 2020.
2. Heitor Murilo Gomes, Jesse Read, and Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.