Skip to content

ExtremelyFastDecisionTreeClassifier

Extremely Fast Decision Tree classifier.

Also referred to as Hoeffding AnyTime Tree (HATT) classifier.

Parameters

  • grace_period

    Typeint

    Default200

    Number of instances a leaf should observe between split attempts.

  • max_depth

    Typeint | None

    DefaultNone

    The maximum depth a tree can reach. If None, the tree will grow indefinitely.

  • min_samples_reevaluate

    Typeint

    Default20

    Number of instances a node should observe before reevaluating the best split.

  • split_criterion

    Typestr

    Defaultinfo_gain

    Split criterion to use.
    - 'gini' - Gini
    - 'info_gain' - Information Gain
    - 'hellinger' - Helinger Distance

  • delta

    Typefloat

    Default1e-07

    Significance level to calculate the Hoeffding bound. The significance level is given by 1 - delta. Values closer to zero imply longer split decision delays.

  • tau

    Typefloat

    Default0.05

    Threshold below which a split will be forced to break ties.

  • leaf_prediction

    Typestr

    Defaultnba

    Prediction mechanism used at leafs.
    - 'mc' - Majority Class
    - 'nb' - Naive Bayes
    - 'nba' - Naive Bayes Adaptive

  • nb_threshold

    Typeint

    Default0

    Number of instances a leaf should observe before allowing Naive Bayes.

  • nominal_attributes

    Typelist | None

    DefaultNone

    List of Nominal attributes identifiers. If empty, then assume that all numeric attributes should be treated as continuous.

  • splitter

    TypeSplitter | None

    DefaultNone

    The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks. Classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.GaussianSplitter is used if splitter is None.

  • binary_split

    Typebool

    DefaultFalse

    If True, only allow binary splits.

  • max_size

    Typefloat

    Default100.0

    The max size of the tree, in Megabytes (MB).

  • memory_estimate_period

    Typeint

    Default1000000

    Interval (number of processed instances) between memory consumption checks.

  • stop_mem_management

    Typebool

    DefaultFalse

    If True, stop growing as soon as memory limit is hit.

  • remove_poor_attrs

    Typebool

    DefaultFalse

    If True, disable poor attributes to reduce memory usage.

  • merit_preprune

    Typebool

    DefaultTrue

    If True, enable merit-based tree pre-pruning.

Attributes

  • height

  • leaf_prediction

    Return the prediction strategy used by the tree at its leaves.

  • max_size

    Max allowed size tree can reach (in MB).

  • n_active_leaves

  • n_branches

  • n_inactive_leaves

  • n_leaves

  • n_nodes

  • split_criterion

    Return a string with the name of the split criterion being used by the tree.

  • summary

    Collect metrics corresponding to the current status of the tree in a string buffer.

Examples

from river.datasets import synth
from river import evaluate
from river import metrics
from river import tree

gen = synth.Agrawal(classification_function=0, seed=42)
dataset = iter(gen.take(1000))

model = tree.ExtremelyFastDecisionTreeClassifier(
    grace_period=100,
    delta=1e-5,
    nominal_attributes=['elevel', 'car', 'zipcode'],
    min_samples_reevaluate=100
)

metric = metrics.Accuracy()

evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 87.29%

Methods

debug_one

Print an explanation of how x is predicted.

Parameters

  • x'dict'

Returns

str | None: A representation of the path followed by the tree to predict x; None if

draw

Draw the tree using the graphviz library.

Since the tree is drawn without passing incoming samples, classification trees will show the majority class in their leaves, whereas regression trees will use the target mean.

Parameters

  • max_depth'int | None' — defaults to None
    The maximum depth a tree can reach. If None, the tree will grow indefinitely.

learn_one

Incrementally train the model

Parameters

  • x
  • y
  • sample_weight — defaults to 1.0

Returns

self

predict_one

Predict the label of a set of features x.

Parameters

  • x'dict'
  • kwargs

Returns

base.typing.ClfTarget | None: The predicted label.

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

  • x

Returns

A dictionary that associates a probability which each label.

to_dataframe

Return a representation of the current tree structure organized in a pandas.DataFrame object.

In case the tree is empty or it only contains a single node (a leaf), None is returned.

Returns

df

Notes

The Extremely Fast Decision Tree (EFDT) 1 constructs a tree incrementally. The EFDT seeks to select and deploy a split as soon as it is confident the split is useful, and then revisits that decision, replacing the split if it subsequently becomes evident that a better split is available. The EFDT learns rapidly from a stationary distribution and eventually it learns the asymptotic batch tree if the distribution from which the data are drawn is stationary.


  1. C. Manapragada, G. Webb, and M. Salehi. Extremely Fast Decision Tree. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). ACM, New York, NY, USA, 1953-1962. DOI: https://doi.org/10.1145/3219819.3220005