RandomUnderSampler

Random under-sampling.

This is a wrapper for classifiers. It will train the provided classifier by under-sampling the stream of given observations so that the class distribution seen by the classifier follows a given desired distribution. The implementation is a discrete version of rejection sampling.

See Working with imbalanced data for example usage.
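
The discrete rejection sampling mentioned above can be illustrated with a short, library-independent sketch. It is not River's implementation. The function name `undersample` and its signature are hypothetical, and it assumes every class in the stream appears in `desired_dist`. Each observation of class `y` is accepted with probability `(f[y] / g[y]) / M`, where `f` is the desired distribution, `g` is the class frequency observed so far, and `M` is the largest `f / g` ratio among the classes seen so far.

```python
import collections
import random


def undersample(stream, desired_dist, seed=None):
    """Yield a subset of (x, y) pairs whose classes roughly follow desired_dist.

    Hypothetical helper for illustration only; not part of River.
    """
    rng = random.Random(seed)
    seen = collections.Counter()  # running class counts (the observed distribution g)

    for x, y in stream:
        seen[y] += 1

        # Largest desired/observed ratio among the classes seen so far.
        # Counts are used in place of frequencies because the total cancels out.
        m = max(desired_dist[c] / seen[c] for c in seen)

        # Acceptance probability: 1 for the most under-represented class,
        # below 1 for every other class.
        accept_prob = (desired_dist[y] / seen[y]) / m

        if rng.random() < accept_prob:
            yield x, y
```

On a stream where one class makes up about 10% of the observations, calling `undersample(stream, {False: 0.5, True: 0.5}, seed=42)` would keep nearly all of the rare class and only a fraction of the frequent one, so the retained observations are roughly balanced.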

Parameters

classifier

Type → base.Classifier
desired_dist

Type → dict

The desired class distribution. The keys are the classes and the values are the desired class proportions, expressed as fractions between 0 and 1. The values must sum to 1.
seed

Type → int | None

Default → None

Random seed for reproducibility.
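
Because the values of `desired_dist` must sum to 1, it can help to check a dictionary before passing it in. The helper below is a minimal sketch. `check_desired_dist` is a hypothetical name and not part of River, and whether the library performs a similar check itself is not stated here.

```python
import math


def check_desired_dist(desired_dist, tol=1e-9):
    """Return desired_dist unchanged if it is a valid distribution.

    Hypothetical helper for illustration only; not part of River.
    Raises ValueError if any value is negative or the values do not sum to 1.
    """
    if any(p < 0 for p in desired_dist.values()):
        raise ValueError("class proportions must be non-negative")

    total = sum(desired_dist.values())
    # Compare with a tolerance to allow for floating-point rounding.
    if not math.isclose(total, 1.0, abs_tol=tol):
        raise ValueError(f"class proportions sum to {total}, expected 1")

    return desired_dist
```

With this sketch, `check_desired_dist({False: 0.4, True: 0.6})` returns the dictionary unchanged, whereas `{False: 0.4, True: 0.5}` raises a `ValueError`.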

Examples

from river import datasets
from river import evaluate
from river import imblearn
from river import linear_model
from river import metrics
from river import preprocessing

model = imblearn.RandomUnderSampler(
    (
        preprocessing.StandardScaler() |
        linear_model.LogisticRegression()
    ),
    desired_dist={False: 0.4, True: 0.6},
    seed=42
)

dataset = datasets.CreditCard().take(3000)

metric = metrics.LogLoss()

evaluate.progressive_val_score(dataset, model, metric)

LogLoss: 0.0336...

Methods

learn_one

Update the model with a set of features x and a label y.

Parameters

x — 'dict[base.typing.FeatureName, Any]'
y — 'base.typing.ClfTarget'
kwargs

predict_one

Predict the label of a set of features x.

Parameters

x
kwargs

Returns

The predicted label.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

x
kwargs

Returns

A dictionary that associates a probability with each label.