Skip to content

RandomTree

Random Tree generator.

This generator is based on 1. The generator creates a random tree by splitting features at random and setting labels at its leaves.

The tree structure is composed of node objects, which can be either inner nodes or leaf nodes. The choice comes as a function of the parameters passed to its initializer.

Since the concepts are generated and classified according to a tree structure, in theory, it should favor decision tree learners.

Parameters

  • seed_tree

    Typeint | None

    DefaultNone

    Seed for random generation of tree.

  • seed_sample

    Typeint | None

    DefaultNone

    Seed for random generation of instances.

  • n_classes

    Typeint

    Default2

    The number of classes to generate.

  • n_num_features

    Typeint

    Default5

    The number of numerical features to generate.

  • n_cat_features

    Typeint

    Default5

    The number of categorical features to generate.

  • n_categories_per_feature

    Typeint

    Default5

    The number of values to generate per categorical feature.

  • max_tree_depth

    Typeint

    Default5

    The maximum depth of the tree concept.

  • first_leaf_level

    Typeint

    Default3

    The first level of the tree above max_tree_depth that can have leaves.

  • fraction_leaves_per_level

    Typefloat

    Default0.15

    The fraction of leaves per level from first_leaf_level onwards.

Attributes

  • desc

    Return the description from the docstring.

Examples

from river.datasets import synth

dataset = synth.RandomTree(seed_tree=42, seed_sample=42, n_classes=2,
                           n_num_features=2, n_cat_features=2,
                           n_categories_per_feature=2, max_tree_depth=6,
                           first_leaf_level=3, fraction_leaves_per_level=0.15)

for x, y in dataset.take(5):
    print(x, y)
{'x_num_0': 0.6394, 'x_num_1': 0.0250, 'x_cat_0': 1, 'x_cat_1': 0} 0
{'x_num_0': 0.2232, 'x_num_1': 0.7364, 'x_cat_0': 0, 'x_cat_1': 1} 1
{'x_num_0': 0.0317, 'x_num_1': 0.0936, 'x_cat_0': 0, 'x_cat_1': 0} 0
{'x_num_0': 0.5612, 'x_num_1': 0.7160, 'x_cat_0': 1, 'x_cat_1': 0} 0
{'x_num_0': 0.4492, 'x_num_1': 0.2781, 'x_cat_0': 0, 'x_cat_1': 0} 0

Methods

take

Iterate over the k samples.

Parameters

  • k'int'


  1. Domingos, Pedro, and Geoff Hulten. "Mining high-speed data streams." In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 71-80. 2000.