
EntropySampler

Active learning classifier based on entropy measures.

The entropy sampler selects samples for labeling based on the entropy of the prediction. The higher the entropy, the more likely the sample will be selected for labeling. The entropy measure is normalized to [0, 1] and then raised to the power of the discount factor.
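
The computation described above can be sketched in plain Python. This is a minimal illustration, not River's actual implementation: the helper names `normalized_entropy` and `selection_probability` are hypothetical, and the sketch assumes that the discounted entropy is used directly as the probability of asking for a label.

```python
import math


def normalized_entropy(y_pred: dict) -> float:
    """Shannon entropy of a predicted distribution, scaled to [0, 1]."""
    if len(y_pred) < 2:
        # A single-class (or empty) prediction carries no uncertainty.
        return 0.0
    entropy = -sum(p * math.log2(p) for p in y_pred.values() if p > 0)
    # Dividing by log2(number of classes) maps the maximum entropy to 1.
    return entropy / math.log2(len(y_pred))


def selection_probability(y_pred: dict, discount_factor: float = 3) -> float:
    """Assumed rule: normalized entropy raised to the discount factor."""
    return normalized_entropy(y_pred) ** discount_factor


uncertain = {"spam": 0.5, "ham": 0.5}
confident = {"spam": 0.99, "ham": 0.01}

print(selection_probability(uncertain))  # 1.0, since entropy is maximal
print(selection_probability(confident))  # close to 0
```

Under this assumption, a prediction split evenly between classes is always selected, while a confident prediction is almost never selected.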

Parameters

  • classifier (base.Classifier)

    The classifier to wrap.

  • discount_factor (float) – defaults to 3

    The discount factor to apply to the entropy measure. A value of 1 won't affect the entropy. The higher the discount factor, the more the entropy will be discounted, and the less likely samples will be selected for labeling. A value of 0 will select all samples for labeling. The discount factor is thus a way to control how many samples are selected for labeling.

  • seed – defaults to None

    Random number generator seed for reproducibility.
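
To give a feel for these two parameters, the sketch below applies several discount factors to a fixed entropy value and shows how a seed makes the selection decisions repeatable. The entropy value of 0.8 is hypothetical, and the sketch assumes the discounted entropy is compared against a uniform random draw, which may differ from what the library does internally.

```python
import random

entropy = 0.8  # hypothetical normalized entropy of one prediction

probs = {}
for discount_factor in (0, 1, 3, 10):
    # Assumed rule: the discounted entropy is the chance of asking for a label.
    probs[discount_factor] = entropy ** discount_factor
    print(f"discount_factor={discount_factor}: {probs[discount_factor]:.3f}")

# Two generators built from the same seed make identical selection decisions.
rng_a, rng_b = random.Random(42), random.Random(42)
asks_a = [rng_a.random() < probs[3] for _ in range(10)]
asks_b = [rng_b.random() < probs[3] for _ in range(10)]
print(asks_a == asks_b)  # True
```

With a discount factor of 0 the probability is 1, so every sample is selected; larger values shrink the probability and therefore the number of labels requested.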

Examples

>>> from river import active
>>> from river import datasets
>>> from river import feature_extraction
>>> from river import linear_model
>>> from river import metrics

>>> dataset = datasets.SMSSpam()
>>> metric = metrics.Accuracy()
>>> model = (
...     feature_extraction.TFIDF(on='body') |
...     linear_model.LogisticRegression()
... )
>>> model = active.EntropySampler(model, seed=42)

>>> n_samples_used = 0
>>> for x, y in dataset:
...     y_pred, ask = model.predict_one(x)
...     metric = metric.update(y, y_pred)
...     if ask:
...         n_samples_used += 1
...         model = model.learn_one(x, y)

>>> metric
Accuracy: 86.60%

>>> dataset.n_samples, n_samples_used
(5574, 1922)

>>> print(f"{n_samples_used / dataset.n_samples:.2%}")
34.48%

Methods

learn_one

Update the model with a set of features x and a label y.

Parameters

  • x
  • y
  • kwargs

Returns

self

predict_one

Predict the label of x and indicate whether a label is needed.

Parameters

  • x
  • kwargs

Returns

The predicted label, along with a boolean indicating whether a label should be requested for x.

predict_proba_one

Predict the probability of each label for x and indicate whether a label is needed.

Parameters

  • x
  • kwargs

Returns

A dictionary that associates a probability with each label, along with a boolean indicating whether a label is needed.