EntropySampler
Active learning classifier based on entropy measures.
The entropy sampler selects samples for labeling based on the entropy of the prediction. The higher the entropy, the more likely the sample will be selected for labeling. The entropy measure is normalized to [0, 1] and then raised to the power of the discount factor.
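As a concrete illustration of the selection rule described above, here is a minimal sketch, not River's internal code: the entropy of the prediction is divided by the maximum possible entropy (the log of the number of classes), and the result is raised to the discount factor to give the probability of asking for a label. The helper name and the comparison against a uniform random draw are illustrative assumptions.

import math
import random

rng = random.Random(42)

def should_ask_for_label(y_proba, discount_factor=3):
    # Entropy of the prediction, ignoring zero-probability classes.
    if len(y_proba) < 2:
        return False
    entropy = -sum(p * math.log(p) for p in y_proba.values() if p > 0)
    # Normalize by the entropy of a uniform prediction so the measure lies in [0, 1].
    max_entropy = math.log(len(y_proba))
    p_select = (entropy / max_entropy) ** discount_factor
    return rng.random() < p_select

# A confident prediction has low normalized entropy and is rarely selected;
# a near-uniform prediction is selected most of the time.
should_ask_for_label({'spam': 0.95, 'ham': 0.05})
should_ask_for_label({'spam': 0.55, 'ham': 0.45})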
Parameters

- classifier
  Type → base.Classifier
  The classifier to wrap.
- discount_factor
  Type → float
  Default → 3
  The discount factor to apply to the entropy measure. A value of 1 won't affect the entropy. The higher the discount factor, the more the entropy will be discounted, and the less likely samples will be selected for labeling. A value of 0 will select all samples for labeling. The discount factor is thus a way to control how many samples are selected for labeling; see the sketch after this list for a numeric illustration.
- seed
  Default → None
  Random number generator seed for reproducibility.
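To get a rough feel for the parameter, the arithmetic below (plain Python, not library code) turns a normalized entropy of 0.5 into a selection probability for a few discount factors.

# Selection probability for a normalized entropy of 0.5 at several discount factors.
for discount_factor in (0, 1, 3, 10):
    print(discount_factor, 0.5 ** discount_factor)
# 0  -> 1.0     every sample is selected
# 1  -> 0.5     the normalized entropy is used as-is
# 3  -> 0.125   the default: far fewer labels are requested
# 10 -> ~0.001  labels are requested almost only for near-uniform predictions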
Examples
from river import active
from river import datasets
from river import feature_extraction
from river import linear_model
from river import metrics
dataset = datasets.SMSSpam()
metric = metrics.Accuracy()
model = (
    feature_extraction.TFIDF(on='body') |
    linear_model.LogisticRegression()
)
model = active.EntropySampler(model, seed=42)
n_samples_used = 0
for x, y in dataset:
    y_pred, ask = model.predict_one(x)
    metric = metric.update(y, y_pred)
    if ask:
        n_samples_used += 1
        model = model.learn_one(x, y)
metric
Accuracy: 86.60%
dataset.n_samples, n_samples_used
(5574, 1921)
print(f"{n_samples_used / dataset.n_samples:.2%}")
34.46%
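A larger discount factor trades accuracy for fewer requested labels. The following sketch shows how the same pipeline could be wrapped with discount_factor=10; the resulting accuracy and label count depend on the run and are not reproduced here.

model = (
    feature_extraction.TFIDF(on='body') |
    linear_model.LogisticRegression()
)
model = active.EntropySampler(model, discount_factor=10, seed=42)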
Methods

learn_one

Update the model with a set of features x and a label y.

Parameters
- x
- y
- kwargs

Returns
self
predict_one

Predict the label of x and indicate whether a label is needed.

Parameters
- x
- kwargs

Returns
The predicted label, along with a boolean indicating whether a label should be requested.
predict_proba_one

Predict the probability of each label for x and indicate whether a label is needed.

Parameters
- x
- kwargs

Returns
A dictionary that associates a probability with each label, along with a boolean indicating whether a label should be requested.
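As in the example above, predictions come paired with the ask flag. Here is a minimal usage sketch for predict_proba_one, assuming it follows the same (prediction, ask) pattern that predict_one shows in the example; the loop variables mirror that example.

for x, y in dataset:
    # y_proba maps each label to its predicted probability.
    y_proba, ask = model.predict_proba_one(x)
    if ask:
        model = model.learn_one(x, y)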