SRPClassifier¶
Streaming Random Patches ensemble classifier.
Streaming Random Patches (SRP) [1] is an ensemble method that simulates bagging or random subspaces. The default algorithm uses both bagging and random subspaces, namely Random Patches. The default base estimator is a Hoeffding Tree, but other base estimators can be used (unlike random forest variations).
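Any river classifier can be plugged in as the base estimator. A minimal sketch is shown below; the choice of GaussianNB is purely illustrative, not a recommendation.

>>> from river import ensemble, naive_bayes

>>> # Sketch: swap the default Hoeffding Tree for a Gaussian naive Bayes learner.
>>> model = ensemble.SRPClassifier(
...     model=naive_bayes.GaussianNB(),
...     n_models=5,
...     seed=42,
... )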
Parameters¶
- model (base.Estimator) – defaults to None
  The base estimator.
- n_models (int) – defaults to 10
  Number of members in the ensemble.
- subspace_size (Union[int, float, str]) – defaults to 0.6
  Number of features per subset for each classifier, where M is the total number of features. A negative value means M - subspace_size. Only applies when using random subspaces or random patches.
  * If int, indicates the number of features to use. Valid range [2, M].
  * If float, indicates the percentage of features to use. Valid range (0., 1.].
  * 'sqrt' - sqrt(M)+1
  * 'rmsqrt' - Residual from M - (sqrt(M)+1)
- training_method (str) – defaults to patches
  The training method to use (see the configuration sketch after this parameter list).
  * 'subspaces' - Random subspaces.
  * 'resampling' - Resampling.
  * 'patches' - Random patches.
- lam (int) – defaults to 6
  Lambda value for resampling.
- drift_detector (base.DriftDetector) – defaults to None
  Drift detector.
- warning_detector (base.DriftDetector) – defaults to None
  Warning detector.
- disable_detector (str) – defaults to off
  Option to disable drift detectors:
  * If 'off', detectors are enabled.
  * If 'drift', disables concept drift detection and the background learner.
  * If 'warning', disables the background learner; ensemble members are reset if drift is detected.
- disable_weighted_vote (bool) – defaults to False
  If True, disables weighted voting.
- seed (int) – defaults to None
  Random number generator seed for reproducibility.
- metric (Optional[river.metrics.base.ClassificationMetric]) – defaults to None
  The metric used to track each member's performance within the ensemble. This implementation assumes that larger values are better when using weighted votes.
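As a rough sketch of how these options combine, the snippet below configures random subspaces over half of the features with explicit drift and warning detectors. The ADWIN detectors and the specific values shown are illustrative choices, not the documented defaults.

>>> from river import drift, ensemble, tree

>>> # Illustrative configuration: random subspaces, 50% of the features per member,
>>> # explicit ADWIN drift/warning detectors, weighted voting kept enabled.
>>> model = ensemble.SRPClassifier(
...     model=tree.HoeffdingTreeClassifier(),
...     n_models=10,
...     training_method="subspaces",
...     subspace_size=0.5,
...     drift_detector=drift.ADWIN(),
...     warning_detector=drift.ADWIN(),
...     disable_weighted_vote=False,
...     seed=1,
... )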
Attributes¶
- models
Examples¶
>>> from river import ensemble
>>> from river import evaluate
>>> from river import metrics
>>> from river.datasets import synth
>>> from river import tree
>>> dataset = synth.ConceptDriftStream(
... seed=42,
... position=500,
... width=50
... ).take(1000)
>>> base_model = tree.HoeffdingTreeClassifier(
... grace_period=50, delta=0.01,
... nominal_attributes=['age', 'car', 'zipcode']
... )
>>> model = ensemble.SRPClassifier(
... model=base_model, n_models=3, seed=42,
... )
>>> metric = metrics.Accuracy()
>>> evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 72.77%
Methods¶
append
S.append(value) -- append value to the end of the sequence
Parameters
- item
clear
S.clear() -> None -- remove all items from S
copy
count
S.count(value) -> integer -- return number of occurrences of value
Parameters
- item
extend
S.extend(iterable) -- extend sequence by appending elements from the iterable
Parameters
- other
index
S.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
Parameters
- item
- args
insert
S.insert(index, value) -- insert value before index
Parameters
- i
- item
learn_one
pop
S.pop([index]) -> item -- remove and return item at index (default last). Raise IndexError if list is empty or index is out of range.
Parameters
- i – defaults to -1
predict_one
Predict the label of a set of features x.
Parameters
- x (dict)
- kwargs
Returns
typing.Union[bool, str, int, NoneType]: The predicted label.
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x
- kwargs
Returns
A dictionary that associates a probability with each label.
remove
S.remove(value) -- remove first occurrence of value. Raise ValueError if the value is not present.
Parameters
- item
reset
reverse
S.reverse() -- reverse IN PLACE
sort
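The core incremental-learning methods work on one sample at a time. A minimal usage sketch follows; the feature dictionary is made up for illustration.

>>> from river import ensemble, tree

>>> model = ensemble.SRPClassifier(
...     model=tree.HoeffdingTreeClassifier(), n_models=3, seed=7,
... )

>>> # Hypothetical sample: any dict of feature name -> value works.
>>> x, y = {'f1': 0.2, 'f2': 1.5}, True
>>> model.learn_one(x, y)
>>> proba = model.predict_proba_one(x)  # dict mapping each label to a probability
>>> label = model.predict_one(x)        # label with the highest probability

Because the ensemble behaves like a sequence, the list-style methods listed above (append, pop, index, and so on) operate directly on its members.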
Notes¶
This implementation uses n_models=10 as the default, given the impact on processing time. The optimal number of models depends on the data and the resources available.
References¶
1. Heitor Murilo Gomes, Jesse Read, Albert Bifet. Streaming Random Patches for Evolving Data Stream Classification. IEEE International Conference on Data Mining (ICDM), 2019.