EntropySampler
Active learning classifier based on entropy measures.
The entropy sampler selects samples for labeling based on the entropy of the prediction. The higher the entropy, the more likely the sample will be selected for labeling. The entropy measure is normalized to [0, 1] and then raised to the power of the discount factor.
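The selection rule described above can be sketched in plain Python. This is a minimal illustration, not River's actual source: the helper names `normalized_entropy` and `ask_for_label` are hypothetical, and it assumes the entropy is normalized by dividing by the logarithm of the number of classes.

```python
import math
import random


def normalized_entropy(probs):
    """Shannon entropy of a {label: probability} dict, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0  # a single class carries no uncertainty
    # Skip zero probabilities, since p * log2(p) tends to 0 as p tends to 0.
    entropy = sum(-p * math.log2(p) for p in probs.values() if p > 0)
    # Assumption: normalize by the maximum entropy for this number of classes.
    return entropy / math.log2(len(probs))


def ask_for_label(probs, discount_factor=3, rng=random):
    """Return True with probability normalized_entropy(probs) ** discount_factor."""
    p_select = normalized_entropy(probs) ** discount_factor
    return rng.random() < p_select


rng = random.Random(42)
print(ask_for_label({True: 0.5, False: 0.5}, rng=rng))  # maximally uncertain prediction
print(ask_for_label({True: 1.0, False: 0.0}, rng=rng))  # fully confident prediction
```

Under these assumptions, a maximally uncertain prediction is always selected, and a fully confident one never is (for a positive discount factor).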
Parameters
- classifier (base.Classifier)
  The classifier to wrap.
- discount_factor (float) – defaults to 3
  The discount factor to apply to the entropy measure. A value of 1 won't affect the entropy. The higher the discount factor, the more the entropy will be discounted, and the less likely samples will be selected for labeling. A value of 0 will select all samples for labeling. The discount factor is thus a way to control how many samples are selected for labeling.
- seed – defaults to None
  Random number generator seed for reproducibility.
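To get a feel for the discount factor, the following sketch tabulates the selection probability of a binary prediction for a few discount factors. The function `selection_probability` is a hypothetical helper written for this illustration, assuming the probability of asking for a label equals the normalized entropy raised to the discount factor.

```python
import math


def selection_probability(p_positive, discount_factor):
    """Binary case: entropy of (p, 1 - p) raised to the discount factor."""
    probs = (p_positive, 1.0 - p_positive)
    # With two classes the maximum entropy is log2(2) == 1,
    # so the entropy is already in [0, 1].
    entropy = sum(-p * math.log2(p) for p in probs if p > 0)
    return entropy ** discount_factor


# One row per discount factor, one column per predicted probability.
for discount_factor in (0, 1, 3, 10):
    row = [
        round(selection_probability(p, discount_factor), 3)
        for p in (0.5, 0.7, 0.9)
    ]
    print(discount_factor, row)
```

With a discount factor of 0 every row entry is 1, so every sample is selected. As the factor grows, only predictions close to 0.5 keep a high chance of being selected.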
Examples
>>> from river import active
>>> from river import datasets
>>> from river import feature_extraction
>>> from river import linear_model
>>> from river import metrics
>>> dataset = datasets.SMSSpam()
>>> metric = metrics.Accuracy()
>>> model = (
...     feature_extraction.TFIDF(on='body') |
...     linear_model.LogisticRegression()
... )
>>> model = active.EntropySampler(model, seed=42)
>>> n_samples_used = 0
>>> for x, y in dataset:
...     y_pred, ask = model.predict_one(x)
...     metric = metric.update(y, y_pred)
...     if ask:
...         n_samples_used += 1
...         model = model.learn_one(x, y)
>>> metric
Accuracy: 86.60%
>>> dataset.n_samples, n_samples_used
(5574, 1922)
>>> print(f"{n_samples_used / dataset.n_samples:.2%}")
34.48%
Methods
learn_one
Update the model with a set of features x and a label y.
Parameters
- x
- y
- kwargs
Returns
self
predict_one
Predict the label of x and indicate whether a label is needed.
Parameters
- x
- kwargs
Returns
The predicted label, along with a boolean indicating whether a label should be requested for x.
predict_proba_one
Predict the probability of each label for x and indicate whether a label is needed.
Parameters
- x
- kwargs
Returns
A dictionary that associates a probability with each label.