AMFClassifier¶
Aggregated Mondrian Forest classifier for online learning.
This implementation is truly online, in the sense that a single pass is performed, and that predictions can be produced anytime.
Each node in a tree predicts according to the distribution of the labels it contains. This distribution is regularized using a "Jeffreys" prior with parameter dirichlet. For a class with count labels among the n_samples samples in the node, the prediction of the node is given by
\(\frac{count + dirichlet}{n_{samples} + dirichlet \times n_{classes}}\).
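As an illustrative sketch (not River's internal implementation), the regularized node prediction above can be computed directly from raw label counts:

```python
def node_prediction(counts: dict, dirichlet: float = 0.5) -> dict:
    """Class probabilities of a node, regularized with a Jeffreys prior.

    `counts` maps each class label to the number of samples of that class
    seen by the node. The smoothing guarantees every known class gets a
    non-zero probability, even when it has zero observed samples.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    denom = n_samples + dirichlet * n_classes
    return {label: (count + dirichlet) / denom for label, count in counts.items()}

# A node holding 3 positives and 1 negative, with dirichlet=0.5:
# (3 + 0.5) / (4 + 0.5 * 2) = 0.7 and (1 + 0.5) / 5 = 0.3
print(node_prediction({True: 3, False: 1}))  # {True: 0.7, False: 0.3}
```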
The prediction for a sample is computed as the aggregated predictions of all the subtrees along the path leading to the leaf node containing the sample. The aggregation weights are exponential weights with learning rate step and log-loss when use_aggregation is True.
This computation is performed exactly thanks to a context tree weighting algorithm. More details can be found in the paper cited in the references below.
The final predictions are the average class probabilities predicted by each of the n_estimators trees in the forest.
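The exact aggregation relies on context tree weighting, but the underlying exponential-weights idea can be sketched as follows. This is a toy illustration only, not the actual algorithm; the names path_predictions and losses are assumptions made for the sketch:

```python
import math

def aggregate_path(path_predictions, losses, step=1.0):
    """Toy exponentially weighted average of subtree predictions.

    `path_predictions[i]` is the probability a subtree along the
    root-to-leaf path assigns to the true class, and `losses[i]` is that
    subtree's cumulative log-loss so far. Subtrees with a smaller past
    loss receive exponentially larger weight, controlled by `step`.
    """
    weights = [math.exp(-step * loss) for loss in losses]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, path_predictions)) / total
```

With equal past losses this reduces to a plain average; as one subtree accumulates less loss, the aggregate moves toward its prediction.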
Parameters¶
- n_estimators (int) – defaults to 10
  The number of trees in the forest.
- step (float) – defaults to 1.0
  Step-size for the aggregation weights. Default is 1 for classification with the log-loss, which is usually the best choice.
- use_aggregation (bool) – defaults to True
  Controls whether aggregation is used in the trees. It is highly recommended to leave it as True.
- dirichlet (float) – defaults to 0.5
  Regularization level of the class frequencies used for predictions in each node. A rule of thumb is to set this to 1 / n_classes, where n_classes is the expected number of classes which might appear. The default dirichlet = 0.5 works well for binary classification problems.
- split_pure (bool) – defaults to False
  Controls whether nodes that contain only samples of the same class ("pure" nodes) should be split. The default False means pure nodes are not split, but True can sometimes perform better.
- seed (int) – defaults to None
  Random seed for reproducibility.
Attributes¶
- models
Examples¶
>>> from river import datasets
>>> from river import evaluate
>>> from river import forest
>>> from river import metrics
>>> dataset = datasets.Bananas().take(500)
>>> model = forest.AMFClassifier(
... n_estimators=10,
... use_aggregation=True,
... dirichlet=0.5,
... seed=1
... )
>>> metric = metrics.Accuracy()
>>> evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 84.97%
Methods¶
append
S.append(value) -- append value to the end of the sequence
Parameters
- item
clear
S.clear() -> None -- remove all items from S
copy
count
S.count(value) -> integer -- return number of occurrences of value
Parameters
- item
extend
S.extend(iterable) -- extend sequence by appending elements from the iterable
Parameters
- other
index
S.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
Parameters
- item
- args
insert
S.insert(index, value) -- insert value before index
Parameters
- i
- item
learn_one
Update the model with a set of features x and a label y.
Parameters
- x
- y
Returns
self
pop
S.pop([index]) -> item -- remove and return item at index (default last). Raise IndexError if list is empty or index is out of range.
Parameters
- i – defaults to -1
predict_one
Predict the label of a set of features x.
Parameters
- x (dict)
- kwargs
Returns
typing.Union[bool, str, int, NoneType]: The predicted label.
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x
Returns
A dictionary that associates a probability with each label.
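Per the forest-level averaging described earlier, the final probabilities are the mean of the per-tree distributions. A minimal sketch of that averaging step (an illustration, not River's internal code):

```python
from collections import defaultdict

def average_tree_probas(tree_probas):
    """Average the class-probability dicts predicted by each tree.

    `tree_probas` is a list with one {label: probability} dict per tree;
    labels missing from a tree's dict count as probability 0.
    """
    totals = defaultdict(float)
    for probas in tree_probas:
        for label, p in probas.items():
            totals[label] += p
    n_trees = len(tree_probas)
    return {label: total / n_trees for label, total in totals.items()}

# Two trees, two classes: (0.6 + 0.2) / 2 = 0.4 and (0.4 + 0.8) / 2 = 0.6
print(average_tree_probas([{0: 0.6, 1: 0.4}, {0: 0.2, 1: 0.8}]))
```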
remove
S.remove(value) -- remove first occurrence of value. Raise ValueError if the value is not present.
Parameters
- item
reverse
S.reverse() -- reverse IN PLACE
sort
Notes¶
Only the log-loss is supported for the computation of the aggregation weights for now, namely the log-loss for multi-class classification.
References¶
J. Mourtada, S. Gaiffas and E. Scornet, AMF: Aggregated Mondrian Forests for Online Learning, arXiv:1906.10529, 2019.