RandomOverSampler

Random over-sampling.

This is a wrapper for classifiers. It trains the wrapped classifier on an over-sampled version of the incoming stream of observations, so that the class distribution seen by the classifier follows a desired distribution. The implementation is a discrete version of reverse rejection sampling.

See Working with imbalanced data for example usage.
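
The over-sampling idea described above can be illustrated with a small standard-library sketch. It is not river's source code: the class `OverSamplingSketch`, the `poisson` helper, and the choice of replaying each observation a Poisson-distributed number of times are all assumptions made for illustration. The sketch only counts how many copies of each label a wrapped classifier would receive.

```python
import collections
import math
import random


def poisson(rate, rng):
    """Draw from a Poisson distribution (Knuth's method, fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class OverSamplingSketch:
    """Hypothetical helper: decides how many times to replay each observation."""

    def __init__(self, desired_dist, seed=None):
        self.desired_dist = desired_dist
        self.actual = collections.Counter()  # class counts observed in the stream
        self.rng = random.Random(seed)

    def n_copies(self, y):
        self.actual[y] += 1
        f, g = self.actual, self.desired_dist
        # Pivot: the class most over-represented relative to its target share.
        pivot = max(g, key=lambda c: f[c] / g[c])
        if y == pivot:
            return 1  # the pivot class is passed through exactly once
        # Other classes are replayed at a rate >= 1 so that they catch up on average.
        rate = (f[pivot] / g[pivot]) * g[y] / f[y]
        return poisson(rate, self.rng)


# Simulated stream with roughly 10% positives and a 50/50 target.
stream_rng = random.Random(0)
sampler = OverSamplingSketch({False: 0.5, True: 0.5}, seed=42)
trained_on = collections.Counter()
for _ in range(10_000):
    y = stream_rng.random() < 0.1
    trained_on[y] += sampler.n_copies(y)  # a real wrapper would call learn_one here

share_true = trained_on[True] / sum(trained_on.values())
print(f"share of positives seen by the classifier: {share_true:.2f}")
```

Because the replay rate is never below 1, no observation is discarded on average, which is what distinguishes this from under-sampling.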

Parameters

  • classifier (base.Classifier)

    The classifier to train on the over-sampled stream.

  • desired_dist (dict)

    The desired class distribution. The keys are the classes and the values are the desired class proportions. The values must sum to 1.

  • seed (int) – defaults to None

    Random seed for reproducibility.
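
Since the values of desired_dist must sum to 1, it can be worth checking a distribution before passing it in. The helper below is a minimal sketch; `check_desired_dist` is a hypothetical name and is not part of river.

```python
import math


def check_desired_dist(desired_dist):
    """Hypothetical helper: raise if the values are not a valid distribution."""
    if any(p < 0 for p in desired_dist.values()):
        raise ValueError("class proportions must be non-negative")
    total = sum(desired_dist.values())
    # Compare with a tolerance, since floating-point sums can carry rounding error.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"class proportions sum to {total}, expected 1")
    return desired_dist


check_desired_dist({False: 0.4, True: 0.6})  # valid: passes silently
```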

Examples

>>> from river import datasets
>>> from river import evaluate
>>> from river import imblearn
>>> from river import linear_model
>>> from river import metrics
>>> from river import preprocessing

>>> model = imblearn.RandomOverSampler(
...     (
...         preprocessing.StandardScaler() |
...         linear_model.LogisticRegression()
...     ),
...     desired_dist={False: 0.4, True: 0.6},
...     seed=42
... )

>>> dataset = datasets.CreditCard().take(3000)

>>> metric = metrics.LogLoss()

>>> evaluate.progressive_val_score(dataset, model, metric)
LogLoss: 0.054338

Methods

learn_one

Update the model with a set of features x and a label y.

Parameters

  • x (dict)
  • y (Union[bool, str, int])

Returns

Classifier: self

predict_one

Predict the label of a set of features x.

Parameters

  • x

Returns

The predicted label.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

  • x

Returns

A dictionary that associates a probability with each label.
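
The three methods above can be illustrated with a toy stand-in that mirrors their inputs and return values. `PriorClassifier` is hypothetical: it does not subclass river's base.Classifier, it ignores the features entirely, and the feature name `amount` is made up for the example.

```python
import collections


class PriorClassifier:
    """Hypothetical stand-in: predicts from the class frequencies seen so far."""

    def __init__(self):
        self.counts = collections.Counter()

    def learn_one(self, x, y):
        # x is a dict of features, y is the label; returns self.
        self.counts[y] += 1
        return self

    def predict_proba_one(self, x):
        # Returns a dict mapping each label seen so far to a probability.
        total = sum(self.counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in self.counts.items()}

    def predict_one(self, x):
        # Returns the most probable label, or None before any training.
        proba = self.predict_proba_one(x)
        if not proba:
            return None
        return max(proba, key=proba.get)


model = PriorClassifier()
observations = [
    ({"amount": 12.0}, False),
    ({"amount": 950.0}, True),
    ({"amount": 8.5}, False),
]
for x, y in observations:
    model.learn_one(x, y)

proba = model.predict_proba_one({"amount": 20.0})
label = model.predict_one({"amount": 20.0})
```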