HoeffdingAdaptiveTreeClassifier
Hoeffding Adaptive Tree classifier.
Parameters
- grace_period (int), defaults to 200
  Number of instances a leaf should observe between split attempts.
- max_depth (int), defaults to None
  The maximum depth the tree can reach. If None, the tree will grow indefinitely.
- split_criterion (str), defaults to info_gain
  Split criterion to use.
  - 'gini' - Gini
  - 'info_gain' - Information Gain
  - 'hellinger' - Hellinger Distance
- delta (float), defaults to 1e-07
  Significance level used to calculate the Hoeffding bound; the corresponding confidence level is 1 - delta. Values closer to zero imply longer delays before a split decision is made.
- tau (float), defaults to 0.05
  Threshold below which a split will be forced to break ties.
- leaf_prediction (str), defaults to nba
  Prediction mechanism used at leaves.
  - 'mc' - Majority Class
  - 'nb' - Naive Bayes
  - 'nba' - Naive Bayes Adaptive
- nb_threshold (int), defaults to 0
  Number of instances a leaf should observe before allowing Naive Bayes.
- nominal_attributes (list), defaults to None
  List of nominal attributes. If empty, all numeric attributes are treated as continuous.
- splitter (river.tree.splitter.base.Splitter), defaults to None
  The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks; classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.GaussianSplitter is used if splitter is None.
- bootstrap_sampling (bool), defaults to True
  If True, perform bootstrap sampling in the leaf nodes.
- drift_window_threshold (int), defaults to 300
  Minimum number of examples an alternate tree must observe before being considered as a potential replacement for the current one.
- drift_detector (Optional[base.DriftDetector]), defaults to None
  The drift detector used to build the tree. If None, then drift.ADWIN is used.
- switch_significance (float), defaults to 0.05
  The significance level to assess whether alternate subtrees are significantly better than their main subtree counterparts.
- binary_split (bool), defaults to False
  If True, only allow binary splits.
- max_size (float), defaults to 100.0
  The maximum size of the tree, in megabytes (MB).
- memory_estimate_period (int), defaults to 1000000
  Interval (number of processed instances) between memory consumption checks.
- stop_mem_management (bool), defaults to False
  If True, stop growing as soon as the memory limit is reached.
- remove_poor_attrs (bool), defaults to False
  If True, disable poor attributes to reduce memory usage.
- merit_preprune (bool), defaults to True
  If True, enable merit-based tree pre-pruning.
- seed (int), defaults to None
  Random seed for reproducibility.
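In a Hoeffding tree, a leaf attempts a split only once every grace_period instances, and the decision relies on the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where n is the number of instances observed at the leaf and R is the range of the split merit (log2 of the number of classes for information gain). The sketch below illustrates how delta and tau enter that decision for the info_gain criterion. It is a simplified illustration under those assumptions, not River's internal code: the helper names are hypothetical, only the two best candidate splits are compared, and refinements such as merit pre-pruning are omitted.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a {label: count} dictionary."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def info_gain(parent, children):
    """Information gain of splitting `parent` class counts into `children`."""
    total = sum(parent.values())
    weighted = sum(sum(ch.values()) / total * entropy(ch) for ch in children)
    return entropy(parent) - weighted


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean lies within
    epsilon of the mean observed over n instances."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, n_classes, delta, tau, n):
    """Hypothetical split test: split if the best candidate beats the runner-up
    by more than epsilon, or if epsilon has shrunk below tau (tie-breaking)."""
    value_range = math.log2(n_classes)  # range of information gain
    eps = hoeffding_bound(value_range, delta, n)
    return (best_merit - second_merit > eps) or (eps < tau)


parent = {0: 100, 1: 100}
gain_a = info_gain(parent, [{0: 90, 1: 10}, {0: 10, 1: 90}])
gain_b = info_gain(parent, [{0: 55, 1: 45}, {0: 45, 1: 55}])
print(should_split(gain_a, gain_b, n_classes=2, delta=1e-7, tau=0.05, n=200))
```

With the default delta of 1e-07 and two classes, epsilon is about 0.2 after 200 instances, so a clearly dominant split is accepted, while two near-identical candidates are only split once epsilon drops below tau (after roughly 3,300 instances in this setting).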
Attributes
- height
- leaf_prediction
  Return the prediction strategy used by the tree at its leaves.
- max_size
  Maximum size the tree is allowed to reach (in MB).
- n_active_leaves
- n_alternate_trees
- n_branches
- n_inactive_leaves
- n_leaves
- n_nodes
- n_pruned_alternate_trees
- n_switch_alternate_trees
- split_criterion
  Return a string with the name of the split criterion being used by the tree.
- summary
  Collect metrics corresponding to the current status of the tree in a string buffer.
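The 'nba' (Naive Bayes Adaptive) leaf prediction strategy picks, at each leaf, whichever of majority class or naive Bayes would have been correct more often on the instances that leaf has seen. The class below is a minimal sketch of that idea under stated assumptions: AdaptiveLeaf is a hypothetical name, features are assumed to be binary (0/1) so Laplace smoothing can use count + 2 as a denominator, ties between the two strategies go to naive Bayes, and nb_threshold is honored by falling back to majority class. River's leaves use their own splitters and statistics, so details will differ.

```python
import math
from collections import defaultdict


class AdaptiveLeaf:
    """Hypothetical leaf choosing between majority class (MC) and naive Bayes (NB)."""

    def __init__(self, nb_threshold=0):
        self.nb_threshold = nb_threshold
        self.class_counts = defaultdict(float)
        self.feature_counts = defaultdict(float)  # (feature, value, label) -> weight
        self.mc_correct = 0.0
        self.nb_correct = 0.0
        self.n_seen = 0

    def _mc_proba(self):
        total = sum(self.class_counts.values())
        return {c: w / total for c, w in self.class_counts.items()}

    def _nb_proba(self, x):
        total = sum(self.class_counts.values())
        log_scores = {}
        for c, w in self.class_counts.items():
            score = math.log(w / total)
            for f, v in x.items():
                count = self.feature_counts.get((f, v, c), 0.0)
                # Laplace smoothing, assuming binary-valued features.
                score += math.log((count + 1.0) / (w + 2.0))
            log_scores[c] = score
        top = max(log_scores.values())
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def learn_one(self, x, y):
        if self.class_counts:
            # Score both strategies on this instance before learning from it.
            mc = self._mc_proba()
            nb = self._nb_proba(x)
            self.mc_correct += float(max(mc, key=mc.get) == y)
            self.nb_correct += float(max(nb, key=nb.get) == y)
        self.class_counts[y] += 1.0
        for f, v in x.items():
            self.feature_counts[(f, v, y)] += 1.0
        self.n_seen += 1

    def predict_proba_one(self, x):
        if not self.class_counts:
            return {}
        # Use MC until nb_threshold instances are seen, or while MC is ahead.
        if self.n_seen < self.nb_threshold or self.mc_correct > self.nb_correct:
            return self._mc_proba()
        return self._nb_proba(x)
```

When a feature separates the classes but one class dominates, naive Bayes accumulates more correct predictions than majority class, so the leaf switches to it.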
Examples
>>> from river.datasets import synth
>>> from river import evaluate
>>> from river import metrics
>>> from river import tree
>>> gen = synth.ConceptDriftStream(stream=synth.SEA(seed=42, variant=0),
... drift_stream=synth.SEA(seed=42, variant=1),
... seed=1, position=500, width=50)
>>> # Take 1000 instances from the infinite data generator
>>> dataset = iter(gen.take(1000))
>>> model = tree.HoeffdingAdaptiveTreeClassifier(
... grace_period=100,
... delta=1e-5,
... leaf_prediction='nb',
... nb_threshold=10,
... seed=0
... )
>>> metric = metrics.Accuracy()
>>> evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 91.49%
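progressive_val_score evaluates the model in a test-then-train fashion: each instance is first used to make a prediction, which updates the metric, and only then to train the model. The loop below is a minimal sketch of that idea under simplifying assumptions. It uses a hypothetical MajorityClass stand-in that exposes predict_one and learn_one so it runs without River, and progressive_accuracy is likewise a hypothetical helper; it does not reproduce the accuracy reported above.

```python
from collections import Counter


class MajorityClass:
    """Hypothetical stand-in model exposing predict_one / learn_one."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1
        return self


def progressive_accuracy(stream, model):
    """Test-then-train: predict on each instance before learning from it."""
    correct = 0
    n = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test first...
        correct += int(y_pred == y)
        n += 1
        model.learn_one(x, y)  # ...then train on the same instance
    return correct / n if n else 0.0
```

Because every prediction is made before the model sees the label, the resulting score estimates performance on unseen data without a separate hold-out set.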
Methods
debug_one
Print an explanation of how x is predicted.
Parameters
- x (dict)
Returns
typing.Optional[str]: A representation of the path followed by the tree to predict x; None if
draw
Draw the tree using the graphviz library.
Since the tree is drawn without passing incoming samples, classification trees will show the majority class in their leaves, whereas regression trees will use the target mean.
Parameters
- max_depth (int), defaults to None
The maximum depth to draw. If None, the whole tree is drawn.
learn_one
Train the model on instance x and corresponding target y.
Parameters
- x
- y
- sample_weight, defaults to 1.0
Returns
self
predict_one
Predict the label of a set of features x.
Parameters
- x (dict)
- kwargs
Returns
typing.Union[bool, str, int, NoneType]: The predicted label.
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x
Returns
A dictionary that associates a probability with each label.
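For River classifiers, predict_one typically returns the label with the highest probability in the dictionary produced by predict_proba_one, and None when that dictionary is empty (for example, before the tree has seen any data), which matches the NoneType in its return annotation. The helper below is a small sketch of that relationship; argmax_label is a hypothetical name, and the library's tie-breaking behavior is not specified here.

```python
from typing import Dict, Hashable, Optional


def argmax_label(proba: Dict[Hashable, float]) -> Optional[Hashable]:
    """Return the label with the highest probability, or None if `proba` is empty."""
    if not proba:
        return None
    return max(proba, key=proba.get)


print(argmax_label({"a": 0.2, "b": 0.7, "c": 0.1}))
```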
to_dataframe
Return a representation of the current tree structure organized in a pandas.DataFrame object.
In case the tree is empty or it only contains a single node (a leaf), None is returned.
Returns
df
Notes
The Hoeffding Adaptive Tree [1] uses a drift detector to monitor the performance of branches in the tree and to replace them with new branches when their accuracy decreases.
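When drift is detected in a branch, an alternate subtree is grown alongside it, and once the alternate has seen at least drift_window_threshold instances the two can be compared at the switch_significance level. The function below is a hedged sketch of one such comparison, loosely modeled on the error-rate test found in MOA's Hoeffding Adaptive Tree; River's implementation may differ in details, and compare_subtrees and its arguments are hypothetical names.

```python
import math


def compare_subtrees(main_error, n_main, alt_error, n_alt,
                     switch_significance=0.05, drift_window_threshold=300):
    """Return 'switch', 'prune' or 'wait' for a main/alternate subtree pair.

    main_error and alt_error are observed error rates in [0, 1]; n_main and
    n_alt are the numbers of instances each subtree has been evaluated on.
    """
    if n_alt < drift_window_threshold:
        return "wait"  # the alternate has not seen enough data yet
    # Bound on the difference between two error rates at this significance.
    fn = 1.0 / n_alt + 1.0 / n_main
    bound = math.sqrt(
        2.0 * main_error * (1.0 - main_error)
        * math.log(2.0 / switch_significance) * fn
    )
    if main_error - alt_error > bound:
        return "switch"  # alternate is significantly better: replace the branch
    if alt_error - main_error > bound:
        return "prune"  # alternate is significantly worse: discard it
    return "wait"  # no significant difference yet


print(compare_subtrees(0.40, 1000, 0.20, 500))
```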
The bootstrap sampling strategy is an improvement over the original Hoeffding Adaptive Tree algorithm. It is enabled by default since, in general, it results in better performance.
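In a streaming setting, bootstrap sampling is commonly approximated as in the online bagging scheme of Oza and Russell: each incoming instance receives a weight k drawn from a Poisson(1) distribution, mimicking how many times it would appear in a bootstrap resample. The sketch below assumes that scheme for leaf updates; poisson and bootstrap_weight are hypothetical helpers, and River's exact procedure may differ.

```python
import math
import random


def poisson(rate: float, rng: random.Random) -> int:
    """Draw from a Poisson distribution with Knuth's multiplication method
    (adequate for small rates such as 1.0)."""
    threshold = math.exp(-rate)
    k = 0
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def bootstrap_weight(sample_weight: float, rng: random.Random) -> float:
    """Scale an instance's weight by k ~ Poisson(1); a weight of 0 means the
    leaf effectively skips this instance."""
    return sample_weight * poisson(1.0, rng)


rng = random.Random(42)
weights = [bootstrap_weight(1.0, rng) for _ in range(10)]
```

Roughly 37% of instances (e^-1) receive a weight of zero and are skipped by a given leaf, while others count more than once, which injects the diversity that bootstrap resampling provides in batch learning.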
References
[1] Bifet, Albert, and Ricard Gavaldà. "Adaptive learning from evolving data streams." In International Symposium on Intelligent Data Analysis, pp. 249-260. Springer, Berlin, Heidelberg, 2009.