EntropySampler

Active learning classifier based on entropy measures.

The entropy sampler selects samples for labeling based on the entropy of the prediction. The higher the entropy, the more likely the sample will be selected for labeling. The entropy measure is normalized to [0, 1] and then raised to the power of the discount factor.
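The rule described above can be sketched in plain Python. This is an illustration under assumptions, not the library's implementation: the helper names `normalized_entropy`, `ask_probability`, and `should_ask` are hypothetical, and so is the idea of comparing the discounted entropy against a uniform random draw to decide whether to ask for a label.

```python
import math
import random


def normalized_entropy(probs):
    """Shannon entropy of a label -> probability dict, scaled to [0, 1]."""
    if len(probs) < 2:
        # Assumption: with fewer than two labels there is nothing to be unsure about.
        return 0.0
    entropy = sum(-p * math.log(p) for p in probs.values() if p > 0)
    # Dividing by log(number of labels) maps the maximum entropy to 1.
    return entropy / math.log(len(probs))


def ask_probability(probs, discount_factor=3.0):
    """Normalized entropy raised to the power of the discount factor."""
    return normalized_entropy(probs) ** discount_factor


def should_ask(probs, discount_factor=3.0, rng=random):
    """Assumed decision rule: ask when a uniform draw falls below the score."""
    return rng.random() < ask_probability(probs, discount_factor)


rng = random.Random(42)
uncertain = {"spam": 0.5, "ham": 0.5}
confident = {"spam": 0.99, "ham": 0.01}
print(ask_probability(uncertain), should_ask(uncertain, rng=rng))
print(ask_probability(confident), should_ask(confident, rng=rng))
```

Under this sketch, an evenly split prediction gets the highest score and a near-certain prediction gets a score close to zero, which matches the behaviour described in the text.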

Parameters

  • classifier

    Type: base.Classifier

    The classifier to wrap.

  • discount_factor

    Type: float

    Default: 3

    The exponent applied to the normalized entropy. A value of 1 leaves the entropy unchanged. Because the normalized entropy lies in [0, 1], a higher discount factor shrinks it further and makes samples less likely to be selected for labeling. A value of 0 selects every sample for labeling. The discount factor is thus a way to control how many samples are selected for labeling.

  • seed

    Default: None

    Random number generator seed for reproducibility.
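To make the effect of discount_factor concrete, the sketch below computes the average selection score over a handful of binary predictions for several discount factors. It assumes that the score is the normalized entropy raised to the discount factor and that this score acts as the probability of asking for a label. The `ask_probability` helper and the list of confidences are hypothetical and serve only as an illustration.

```python
import math


def ask_probability(p_positive, discount_factor):
    """Assumed selection score for a binary prediction."""
    probs = [p_positive, 1.0 - p_positive]
    # With two labels and log base 2, the entropy already lies in [0, 1].
    entropy = sum(-p * math.log2(p) for p in probs if p > 0)
    return entropy ** discount_factor


# Hypothetical predicted probabilities, from very confident to undecided.
confidences = [0.99, 0.95, 0.9, 0.8, 0.7, 0.6, 0.5]

expected_fraction = {}
for discount_factor in (0, 1, 3, 10):
    scores = [ask_probability(p, discount_factor) for p in confidences]
    expected_fraction[discount_factor] = sum(scores) / len(scores)

for discount_factor, fraction in expected_fraction.items():
    print(f"discount_factor={discount_factor}: {fraction:.1%} queried on average")
```

With a discount factor of 0 every score equals 1, so every sample is queried. Larger values push the scores of confident predictions towards 0, so fewer labels are requested.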

Examples

from river import active
from river import datasets
from river import feature_extraction
from river import linear_model
from river import metrics

dataset = datasets.SMSSpam()
metric = metrics.Accuracy()
model = (
    feature_extraction.TFIDF(on='body') |
    linear_model.LogisticRegression()
)
model = active.EntropySampler(model, seed=42)

n_samples_used = 0
for x, y in dataset:
    y_pred, ask = model.predict_one(x)
    metric.update(y, y_pred)
    if ask:
        n_samples_used += 1
        model.learn_one(x, y)

metric
Accuracy: 86.60%

dataset.n_samples, n_samples_used
(5574, 1921)

print(f"{n_samples_used / dataset.n_samples:.2%}")
34.46%

Methods

learn_one

Update the model with a set of features x and a label y.

Parameters

  • x
  • y
  • kwargs

predict_one

Predict the label of x and indicate whether a label is needed.

Parameters

  • x
  • kwargs

Returns

The predicted label, together with a boolean indicating whether a label is needed.

predict_proba_one

Predict the probability of each label for x and indicate whether a label is needed.

Parameters

  • x
  • kwargs

Returns

A dictionary that associates a probability with each label, together with a boolean indicating whether a label is needed.