
ChebyshevUnderSampler

Under-sampling for imbalanced regression using Chebyshev's inequality.

Chebyshev's inequality can be used to define the probability of target observations being frequent values (w.r.t. the distribution mean).

Let $Y$ be a random variable with finite expected value $\overline{y}$ and non-zero variance $\sigma^2$. For any real number $t > 0$, Chebyshev's inequality states that, for any such distribution: $\Pr(|y - \overline{y}| \ge t\sigma) \le \frac{1}{t^2}$.
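The bound holds for any distribution with finite variance. This includes the empirical distribution of a finite sample when $\sigma$ is the population standard deviation. The minimal Python check below compares the observed tail fraction with $1/t^2$. It uses made-up illustrative data, and the helper `chebyshev_check` is hypothetical.

```python
import statistics


def chebyshev_check(values, t):
    """Return (empirical fraction with |v - mean| >= t*sigma, Chebyshev bound 1/t**2)."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std of the empirical distribution
    tail = sum(abs(v - mean) >= t * sigma for v in values) / len(values)
    return tail, 1 / t**2


# Illustrative sample: mostly values near 1.0 plus one extreme value.
data = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.95, 6.0]
for t in (1.5, 2.0, 3.0):
    tail, bound = chebyshev_check(data, t)
    print(f"t={t}: empirical tail={tail:.2f}, bound={bound:.2f}")
```

The bound is distribution-free and therefore loose. Here only the extreme value ever falls in the tail.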

Taking $t = \frac{|y - \overline{y}|}{\sigma}$, and assuming $t > 1$, Chebyshev's inequality for an observation $y$ becomes: $P(|y - \overline{y}| = t) = \frac{\sigma^2}{|y - \overline{y}|^2} = \frac{1}{t^2}$. The probability of selecting an observation for training is inversely related to this value, which under-samples [1] the most frequent cases: extreme-valued or rare cases have higher probabilities of selection, whereas the most frequent cases are likely to be discarded. Still, frequent cases have a small chance of being selected (controlled via the sp parameter) when few rare instances have been observed.
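Substituting $t$ into the bound gives $1/t^2 = \sigma^2/(y - \overline{y})^2$. The sketch below evaluates this for a few targets. The function name `chebyshev_probabilities` is hypothetical. The text above does not give the exact selection rule, so the selection probability is *assumed* to be the complement $1 - 1/t^2$.

```python
def chebyshev_probabilities(y, mean, sigma):
    """Return (t, P_frequent, P_select) for a target value y.

    P_frequent = sigma**2 / (y - mean)**2 = 1 / t**2 when t > 1.
    P_select is ASSUMED to be the complement 1 - P_frequent.
    """
    dev = abs(y - mean)
    t = dev / sigma
    if t <= 1:
        # The bound 1/t**2 >= 1 is uninformative here, so it is capped at 1.
        return t, 1.0, 0.0
    p_frequent = sigma**2 / dev**2
    return t, p_frequent, 1.0 - p_frequent


for y in (0.5, 2.0, 4.0):
    t, p_freq, p_sel = chebyshev_probabilities(y, mean=0.0, sigma=1.0)
    print(f"y={y}: t={t:.1f}, P(frequent)={p_freq:.4f}, P(select)={p_sel:.4f}")
```

With mean 0 and $\sigma = 1$, a target at $y = 2$ gives $t = 2$ and $P = 1/4$. A target at $y = 4$ gives $P = 1/16$. The further a value lies from the mean, the more likely it is to be kept.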

Parameters

  • regressor (base.Regressor)

    The regression model that will receive the biased sample.

  • sp (float) – defaults to 0.15

    Second-chance probability. Even if an example is not initially selected for training, it still has a small chance of being selected when the number of rare cases observed so far is small.

  • seed (int) – defaults to None

    Random seed to support reproducibility.
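The role of sp can be illustrated with a minimal decision sketch. The function `keep_for_training` is hypothetical. So is the test for rare cases being "scarce", which here means fewer rare than frequent observations so far. The library's actual rule may differ.

```python
import random


def keep_for_training(p_select, sp, n_rare, n_frequent, rng):
    """Decide whether an observation is passed to the wrapped regressor.

    Hypothetical rule: a first draw against p_select; if it fails, a second
    chance with probability sp while rare cases are scarce (assumed here to
    mean n_rare < n_frequent).
    """
    if rng.random() < p_select:
        return True
    if n_rare < n_frequent and rng.random() < sp:
        return True  # second chance
    return False


rng = random.Random(42)  # fixed seed for reproducibility, as with the seed parameter
# A frequent case (p_select = 0) can only be kept through the second chance.
kept = sum(
    keep_for_training(0.0, sp=0.15, n_rare=1, n_frequent=10, rng=rng)
    for _ in range(10_000)
)
print(kept / 10_000)  # should be close to sp = 0.15
```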

Examples

>>> from river import datasets
>>> from river import evaluate
>>> from river import imblearn
>>> from river import metrics
>>> from river import preprocessing
>>> from river import rules

>>> model = (
...     preprocessing.StandardScaler() |
...     imblearn.ChebyshevUnderSampler(
...         regressor=rules.AMRules(
...             n_min=50, delta=0.01,
...         ),
...         seed=42
...     )
... )

>>> evaluate.progressive_val_score(
...     datasets.TrumpApproval(),
...     model,
...     metrics.MAE(),
...     print_every=500
... )
[500] MAE: 1.787162
[1,000] MAE: 1.515711
[1,001] MAE: 1.515236
MAE: 1.515236

Methods

learn_one

Fits to a set of features x and a real-valued target y.

Parameters

  • x
  • y
  • kwargs

Returns

self

predict_one

Predict the output of features x.

Parameters

  • x
  • kwargs

Returns

The prediction.
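The learn_one / predict_one contract can be illustrated with a self-contained wrapper sketch in plain Python, with no River dependency. `ChebyshevUnderSamplerSketch`, `MeanRegressor` and `RunningVariance` are hypothetical names. Several behaviours are simplifying assumptions rather than the library's implementation:

- the complement selection rule;
- keeping examples while no spread estimate exists;
- an unconditional second chance with probability sp.

```python
import random


class RunningVariance:
    """Welford's online estimate of the mean and (population) variance."""

    def __init__(self):
        self.n, self.mean, self._m2 = 0, 0.0, 0.0

    def update(self, y):
        self.n += 1
        delta = y - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (y - self.mean)

    @property
    def var(self):
        return self._m2 / self.n if self.n else 0.0


class MeanRegressor:
    """Hypothetical stand-in regressor: predicts the mean of the targets it learned from."""

    def __init__(self):
        self.stats = RunningVariance()

    def learn_one(self, x, y):
        self.stats.update(y)

    def predict_one(self, x):
        return self.stats.mean


class ChebyshevUnderSamplerSketch:
    """Illustrative under-sampling wrapper; NOT the library's implementation."""

    def __init__(self, regressor, sp=0.15, seed=None):
        self.regressor = regressor
        self.sp = sp
        self._rng = random.Random(seed)
        self._targets = RunningVariance()
        self.n_trained = 0

    def learn_one(self, x, y):
        var = self._targets.var
        if var == 0:
            keep = True  # assumption: no spread estimate yet, so keep the example
        else:
            dev = abs(y - self._targets.mean)
            t = dev / var**0.5
            # Assumed selection rule: complement of the Chebyshev bound when t > 1.
            p_select = 1 - var / dev**2 if t > 1 else 0.0
            # Simplified second chance, always available with probability sp (assumption).
            keep = self._rng.random() < p_select or self._rng.random() < self.sp
        if keep:
            self.regressor.learn_one(x, y)
            self.n_trained += 1
        self._targets.update(y)  # every target updates the statistics (assumption)
        return self

    def predict_one(self, x):
        # Predictions are delegated unchanged to the wrapped regressor.
        return self.regressor.predict_one(x)


model = ChebyshevUnderSamplerSketch(MeanRegressor(), sp=0.15, seed=42)
stream = random.Random(0)
for i in range(1000):
    model.learn_one({"i": i}, stream.gauss(0.0, 1.0))
print(model.n_trained, model.predict_one({}))
```

Only learn_one is filtered. predict_one always delegates to the wrapped regressor, so under-sampling changes which examples the model learns from, not how it predicts.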

References


  1. Aminian, Ehsan, Rita P. Ribeiro, and João Gama. "Chebyshev approaches for imbalanced data streams regression models." Data Mining and Knowledge Discovery 35.6 (2021): 2389-2466.