RandomSampler¶
Random sampling by mixing under-sampling and over-sampling.
This is a wrapper for classifiers. It will train the provided classifier by both under-sampling and over-sampling the stream of given observations so that the class distribution seen by the classifier follows a given desired distribution.
See Working with imbalanced data for example usage.
Parameters¶
- classifier (base.Classifier)
- desired_dist (dict)
  The desired class distribution. The keys are the classes and the values are the desired class proportions, which must sum to 1. If set to None, then the observations will be sampled uniformly at random, which is strictly equivalent to using ensemble.BaggingClassifier.
- sampling_rate – defaults to 1.0
  The desired ratio of data to sample.
- seed (int) – defaults to None
  Random seed for reproducibility.
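To make the roles of desired_dist and sampling_rate more concrete, here is a minimal pure-Python sketch of one way a stream could be resampled towards a target class distribution. It is not River's implementation. The Poisson-based replication scheme and the helper names poisson and resample are assumptions made for illustration. Each observation is repeated a random number of times: more often when its class is rarer than desired (over-sampling), less often when it is more common than desired (under-sampling).

```python
import collections
import math
import random


def poisson(rate, rng):
    """Draw a Poisson-distributed integer with Knuth's algorithm.

    Only suitable for small rates, which is enough for this sketch.
    """
    limit = math.exp(-rate)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def resample(stream, desired_dist, sampling_rate=1.0, seed=None):
    """Yield (x, y) pairs so that labels roughly follow desired_dist.

    Hypothetical helper: the replication rate of an observation is
    sampling_rate * desired proportion / observed proportion of its class.
    """
    rng = random.Random(seed)
    seen = collections.Counter()  # class counts observed so far
    n = 0
    for x, y in stream:
        seen[y] += 1
        n += 1
        observed = seen[y] / n
        rate = sampling_rate * desired_dist[y] / observed
        # A rate above 1 over-samples the class, below 1 under-samples it.
        for _ in range(poisson(rate, rng)):
            yield x, y


# Imbalanced toy stream: one positive for every nine negatives.
stream = [({"i": i}, i % 10 == 0) for i in range(10_000)]
balanced = list(resample(stream, {False: 0.5, True: 0.5}, seed=42))
positive_share = sum(1 for _, y in balanced if y) / len(balanced)
```

In this sketch, positive_share ends up close to the requested 0.5 even though only 10% of the input is positive, and lowering sampling_rate shrinks the total number of emitted pairs proportionally.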
Examples¶
>>> from river import datasets
>>> from river import evaluate
>>> from river import imblearn
>>> from river import linear_model
>>> from river import metrics
>>> from river import preprocessing
>>> model = imblearn.RandomSampler(
... (
... preprocessing.StandardScaler() |
... linear_model.LogisticRegression()
... ),
... desired_dist={False: 0.4, True: 0.6},
... sampling_rate=0.8,
... seed=42
... )
>>> dataset = datasets.CreditCard().take(3000)
>>> metric = metrics.LogLoss()
>>> evaluate.progressive_val_score(dataset, model, metric)
LogLoss: 0.131988
Methods¶
learn_one
Update the model with a set of features x and a label y.
Parameters
- x (dict)
- y (Union[bool, str, int])
- kwargs
Returns
Classifier: self
predict_one
Predict the label of a set of features x.
Parameters
- x
- kwargs
Returns
The predicted label.
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x
- kwargs
Returns
A dictionary that associates a probability with each label.
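The three methods above can be illustrated with a small, self-contained sketch of the wrapper pattern. The classes PriorClassifier and SamplerWrapper are hypothetical stand-ins, not part of River, and the fixed repeats mapping is a deterministic substitute for random sampling. The sketch assumes that resampling only affects learn_one, while predict_one and predict_proba_one are forwarded to the wrapped classifier unchanged.

```python
import collections


class PriorClassifier:
    """Hypothetical toy classifier that predicts the class frequencies seen so far."""

    def __init__(self):
        self.counts = collections.Counter()

    def learn_one(self, x, y):
        self.counts[y] += 1
        return self

    def predict_proba_one(self, x):
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {label: count / total for label, count in self.counts.items()}

    def predict_one(self, x):
        # The predicted label is the one with the highest probability.
        probas = self.predict_proba_one(x)
        if not probas:
            return None
        return max(probas, key=probas.get)


class SamplerWrapper:
    """Hypothetical wrapper: resamples in learn_one, forwards predictions."""

    def __init__(self, classifier, repeats):
        self.classifier = classifier
        self.repeats = repeats  # copies per label, standing in for random sampling

    def learn_one(self, x, y):
        for _ in range(self.repeats.get(y, 1)):
            self.classifier.learn_one(x, y)
        return self

    def predict_proba_one(self, x):
        return self.classifier.predict_proba_one(x)

    def predict_one(self, x):
        return self.classifier.predict_one(x)


model = SamplerWrapper(PriorClassifier(), repeats={True: 3, False: 1})
model.learn_one({"amount": 10.0}, False).learn_one({"amount": 99.0}, True)
probas = model.predict_proba_one({"amount": 50.0})  # {False: 0.25, True: 0.75}
label = model.predict_one({"amount": 50.0})  # True
```

Because the positive observation is replayed three times, the wrapped classifier sees a 1:3 class ratio even though the stream contained one observation of each class.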