LabelCombinationHoeffdingTreeClassifier¶

Label Combination Hoeffding Tree for multi-label classification.

Label combination transforms a multi-label problem into a multi-class one. Each unique combination of labels is assigned its own class, and the Hoeffding tree is then trained as usual.

The transformation treats the label set as a binary number and converts it to an integer, which represents the class. After prediction, the integer is converted back to a binary number, which gives the predicted label set.
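
The round trip described above can be sketched in a few lines of plain Python. This is only an illustration, not River's internal code: the helper names `encode_labels` and `decode_labels`, the fixed `label_order`, the label names, and the choice of which label maps to which bit are all assumptions made for the example.

```python
def encode_labels(y, label_order):
    """Pack a label set (dict of label -> bool) into a single integer class id.

    Hypothetical helper: label i in `label_order` is stored in bit i.
    """
    code = 0
    for position, label in enumerate(label_order):
        if y.get(label, False):
            code |= 1 << position
    return code


def decode_labels(code, label_order):
    """Unpack an integer class id back into a dict of label -> bool."""
    return {
        label: bool((code >> position) & 1)
        for position, label in enumerate(label_order)
    }


# Illustrative label names, not taken from any particular dataset.
label_order = ["amazed", "happy", "sad"]
y = {"amazed": True, "happy": False, "sad": True}

code = encode_labels(y, label_order)         # bits 0 and 2 are set
restored = decode_labels(code, label_order)  # same label set as y
```

With `n` labels there are up to `2 ** n` possible classes, although only the combinations that actually occur in the stream need a class of their own.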

Parameters¶

grace_period (int) – defaults to 200

Number of instances a leaf should observe between split attempts.
max_depth (int) – defaults to None

The maximum depth a tree can reach. If None, the tree will grow indefinitely.
split_criterion (str) – defaults to info_gain

Split criterion to use.
- 'gini' - Gini
- 'info_gain' - Information Gain
- 'hellinger' - Hellinger Distance
delta (float) – defaults to 1e-07

Allowed error in the split decision, used to calculate the Hoeffding bound. The corresponding confidence level is 1 - delta. Values closer to zero imply longer split decision delays.
tau (float) – defaults to 0.05

Threshold below which a split will be forced to break ties.
leaf_prediction (str) – defaults to nba

Prediction mechanism used at leaves.
- 'mc' - Majority Class
- 'nb' - Naive Bayes
- 'nba' - Naive Bayes Adaptive
nb_threshold (int) – defaults to 0

Number of instances a leaf should observe before allowing Naive Bayes.
nominal_attributes (list) – defaults to None

List of nominal attribute identifiers. If empty, all numeric attributes are treated as continuous.
splitter (river.tree.splitter.base.Splitter) – defaults to None

The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks. Classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.GaussianSplitter is used if splitter is None.
binary_split (bool) – defaults to False

If True, only allow binary splits.
max_size (float) – defaults to 100.0

The max size of the tree, in Megabytes (MB).
memory_estimate_period (int) – defaults to 1000000

Interval (number of processed instances) between memory consumption checks.
stop_mem_management (bool) – defaults to False

If True, stop growing as soon as memory limit is hit.
remove_poor_attrs (bool) – defaults to False

If True, disable poor attributes to reduce memory usage.
merit_preprune (bool) – defaults to True

If True, enable merit-based tree pre-pruning.
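
To make the interplay between delta, tau, and grace_period more concrete, here is a minimal sketch of the textbook Hoeffding-bound split test. It is not River's implementation: the function names `hoeffding_bound` and `should_split`, the merit values, and the `value_range` of 1.0 are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Textbook Hoeffding bound: sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_merit, value_range, delta, n, tau):
    """Hypothetical split test, evaluated at each split attempt.

    Split when the best candidate beats the runner-up by more than the
    bound, or when the bound has shrunk below tau (tie-breaking).
    """
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_merit - second_merit > epsilon) or (epsilon < tau)


# The bound shrinks as a leaf observes more instances.
eps_200 = hoeffding_bound(1.0, 1e-7, 200)
eps_1000 = hoeffding_bound(1.0, 1e-7, 1000)

# A clear winner splits early; a near tie waits until the bound drops below tau.
clear_winner = should_split(0.30, 0.05, 1.0, 1e-7, 200, tau=0.05)
near_tie_early = should_split(0.30, 0.25, 1.0, 1e-7, 200, tau=0.05)
near_tie_late = should_split(0.30, 0.25, 1.0, 1e-7, 5000, tau=0.05)
```

A smaller delta enlarges the bound for the same number of observations, which is why values closer to zero delay split decisions.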

Attributes¶

height
leaf_prediction

Return the prediction strategy used by the tree at its leaves.
max_size

Maximum size the tree is allowed to reach (in MB).
n_active_leaves
n_branches
n_inactive_leaves
n_leaves
n_nodes
split_criterion

Return a string with the name of the split criterion being used by the tree.
summary

Collect metrics corresponding to the current status of the tree in a string buffer.

Examples¶

>>> from river import datasets
>>> from river import evaluate
>>> from river import metrics
>>> from river import tree

>>> dataset = iter(datasets.Music().take(200))
>>> model = tree.LabelCombinationHoeffdingTreeClassifier(
...     delta=1e-5,
...     grace_period=50
... )

>>> metric = metrics.multioutput.MicroAverage(metrics.Accuracy())

>>> evaluate.progressive_val_score(dataset, model, metric)
MicroAverage(Accuracy): 71.11%
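
For intuition about the metric reported above, the following sketch computes a micro-averaged accuracy by hand. It assumes that micro-averaging pools every (instance, label) pair into one accuracy score; the helper `micro_accuracy` and the toy data are hypothetical and are not part of River.

```python
def micro_accuracy(y_trues, y_preds):
    """Accuracy pooled over every (instance, label) pair.

    Hypothetical helper; each element of y_trues and y_preds is a dict
    mapping a label name to a bool.
    """
    correct = 0
    total = 0
    for y_true, y_pred in zip(y_trues, y_preds):
        for label, truth in y_true.items():
            correct += int(y_pred.get(label) == truth)
            total += 1
    return correct / total if total else 0.0


# Toy data: two instances, three labels each, one wrong label overall.
y_trues = [
    {"a": True, "b": False, "c": True},
    {"a": False, "b": False, "c": True},
]
y_preds = [
    {"a": True, "b": False, "c": False},
    {"a": False, "b": False, "c": True},
]

score = micro_accuracy(y_trues, y_preds)  # 5 of the 6 label decisions match
```

Under this reading, a prediction that gets most labels of an instance right still earns partial credit, unlike an exact-match score on the whole label set.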

Methods¶

debug_one

Print an explanation of how x is predicted.

Parameters

x (dict)

Returns

typing.Optional[str]: A representation of the path followed by the tree to predict x; None if

draw

Draw the tree using the graphviz library.

Since the tree is drawn without passing incoming samples, classification trees will show the majority class in their leaves, whereas regression trees will use the target mean.

Parameters

max_depth (int) – defaults to None
The maximum depth to draw. If None, the whole tree is drawn.

learn_one

Update the Multi-label Hoeffding Tree Classifier.

Parameters

x
y
sample_weight – defaults to 1.0

Returns

self

predict_one

Predict the labels of an instance.

Parameters

x (dict)

Returns

typing.Union[bool, str, int, NoneType]: Predicted labels.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

x

Returns

A dictionary that associates a probability with each label.

to_dataframe

Return a representation of the current tree structure organized in a pandas.DataFrame object.

In case the tree is empty or it only contains a single node (a leaf), None is returned.

Returns

df