
iSOUPTreeRegressor

Incremental Structured Output Prediction Tree (iSOUP-Tree) for multi-target regression.

This is an implementation of the iSOUP-Tree proposed by A. Osojnik, P. Panov, and S. Džeroski [1].

Parameters

  • grace_period

    Type → int

    Default → 200

    Number of instances a leaf should observe between split attempts.

  • max_depth

    Type → int | None

    Default → None

    The maximum depth a tree can reach. If None, the tree will grow indefinitely.

  • delta

    Type → float

    Default → 1e-07

    Allowed error in the split decision. A value closer to 0 makes the tree take longer to decide.

  • tau

    Type → float

    Default → 0.05

    Threshold below which a split will be forced to break ties.

  • leaf_prediction

    Type → str

    Default → adaptive

    Prediction mechanism used at the leaves.
    - 'mean' - Target mean
    - 'model' - Uses the model defined in leaf_model
    - 'adaptive' - Chooses between 'mean' and 'model' dynamically

  • leaf_model

    Type → base.Regressor | dict | None

    Default → None

    The regression model(s) used to provide responses if leaf_prediction='model'. It can be either a regressor (in which case it is replicated to all the targets) or a dictionary whose keys are target identifiers and whose values are instances of base.Regressor. If not provided, instances of linear_model.LinearRegression with the default hyperparameters are used for all the targets. If a dictionary is passed and not all target models are specified, copies of the first model in the dictionary are used for the remaining targets.

  • model_selector_decay

    Type → float

    Default → 0.95

    The exponential decay factor applied to the squared errors of the learning models, which are monitored if leaf_prediction='adaptive'. Must be between 0 and 1. The closer it is to 1, the more importance is given to past observations. As the value approaches 0, the most recently observed errors have more influence on the final decision.

  • nominal_attributes

    Type → list | None

    Default → None

    List of nominal attribute identifiers. If empty, all numeric attributes are treated as continuous.

  • splitter

    Type → Splitter | None

    Default → None

    The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks. Classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.TEBSTSplitter is used if splitter is None.

  • min_samples_split

    Type → int

    Default → 5

    The minimum number of samples every branch resulting from a split candidate must have to be considered valid.

  • binary_split

    Type → bool

    Default → False

    If True, only allow binary splits.

  • max_size

    Type → float

    Default → 500.0

    The max size of the tree, in Megabytes (MB).

  • memory_estimate_period

    Type → int

    Default → 1000000

    Interval (number of processed instances) between memory consumption checks.

  • stop_mem_management

    Type → bool

    Default → False

    If True, stop growing as soon as memory limit is hit.

  • remove_poor_attrs

    Type → bool

    Default → False

    If True, disable poor attributes to reduce memory usage.

  • merit_preprune

    Type → bool

    Default → True

    If True, enable merit-based tree pre-pruning.
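
To make the roles of delta and tau more concrete, the sketch below applies a Hoeffding-bound test of the kind commonly used by Hoeffding-style regression trees. The helper names (hoeffding_bound, should_split), the merit-ratio formulation, and the unit value range are assumptions made for illustration. They are not taken from River's source code.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound for n observations of a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_merit, second_best_merit, n, delta=1e-7, tau=0.05):
    """Hypothetical split test on the ratio between the two best candidates."""
    if best_merit <= 0.0:
        return False  # no candidate improves on the current leaf
    eps = hoeffding_bound(1.0, delta, n)  # the merit ratio lies in [0, 1]
    ratio = second_best_merit / best_merit
    # Split when the best candidate is confidently better, or when the bound
    # has shrunk below tau and the candidates are treated as tied.
    return ratio + eps < 1.0 or eps < tau


print(should_split(1.0, 0.5, n=200))   # clear winner after 200 instances
print(should_split(1.0, 0.9, n=200))   # too close to call, keep observing
print(should_split(1.0, 0.9, n=4000))  # bound is below tau, the tie is broken
```

With these inputs the three calls print True, False, and True. A smaller delta enlarges the bound, so more instances are needed before the first condition can hold.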

Attributes

  • height

  • leaf_prediction

    Return the prediction strategy used by the tree at its leaves.

  • max_size

    Max allowed size tree can reach (in MB).

  • n_active_leaves

  • n_branches

  • n_inactive_leaves

  • n_leaves

  • n_nodes

  • split_criterion

    Return a string with the name of the split criterion being used by the tree.

  • summary

    Collect metrics corresponding to the current status of the tree in a string buffer.

Examples

import numbers
from river import compose
from river import datasets
from river import evaluate
from river import linear_model
from river import metrics
from river import preprocessing
from river import tree

dataset = datasets.SolarFlare()

num = compose.SelectType(numbers.Number) | preprocessing.MinMaxScaler()
cat = compose.SelectType(str) | preprocessing.OneHotEncoder()

model = tree.iSOUPTreeRegressor(
    grace_period=100,
    leaf_prediction='model',
    leaf_model={
        'c-class-flares': linear_model.LinearRegression(l2=0.02),
        'm-class-flares': linear_model.PARegressor(),
        'x-class-flares': linear_model.LinearRegression(l2=0.1)
    }
)

pipeline = (num + cat) | model
metric = metrics.multioutput.MicroAverage(metrics.MAE())

evaluate.progressive_val_score(dataset, pipeline, metric)
MicroAverage(MAE): 0.426177
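
The example above supplies one model per target. As a rough sketch of the fallback rules described for leaf_model (a single regressor is replicated, and a partial dictionary is completed with copies of its first model), the snippet below uses only the standard library. ToyRegressor and resolve_leaf_models are hypothetical names introduced for illustration and are not part of River.

```python
import copy


class ToyRegressor:
    """Stand-in for a base.Regressor; it only carries a label for the demo."""

    def __init__(self, name):
        self.name = name


def resolve_leaf_models(leaf_model, targets, default_factory):
    """Hypothetical helper: return one independent model per target."""
    if leaf_model is None or (isinstance(leaf_model, dict) and not leaf_model):
        # Nothing provided: build a default model for every target.
        return {t: default_factory() for t in targets}
    if not isinstance(leaf_model, dict):
        # A single regressor is replicated to every target.
        return {t: copy.deepcopy(leaf_model) for t in targets}
    # Partial dictionary: missing targets get copies of the first model listed.
    fallback = next(iter(leaf_model.values()))
    return {
        t: leaf_model[t] if t in leaf_model else copy.deepcopy(fallback)
        for t in targets
    }


targets = ["c-class-flares", "m-class-flares", "x-class-flares"]
models = resolve_leaf_models(
    {"m-class-flares": ToyRegressor("pa")},
    targets,
    default_factory=lambda: ToyRegressor("default"),
)
print({t: m.name for t, m in models.items()})
```

Each target ends up with its own object, so updating the model of one target does not affect the others.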

Methods

debug_one

Print an explanation of how x is predicted.

Parameters

  • x - 'dict'

Returns

str | None: A representation of the path followed by the tree to predict x; None if

draw

Draw the tree using the graphviz library.

Since the tree is drawn without passing incoming samples, classification trees will show the majority class in their leaves, whereas regression trees will use the target mean.

Parameters

  • max_depth - 'int | None' - defaults to None
    Only the root will be drawn when set to 0. Every node will be drawn when set to None.

learn_one

Incrementally train the model with one sample.

Training tasks:

  • If the tree is empty, create a leaf node as the root.
  • If the tree is already initialized, find the corresponding leaf for the instance and update the leaf node statistics.
  • If growth is allowed and the number of instances that the leaf has observed between split attempts exceeds the grace period, attempt to split.

Parameters

  • x
  • y
  • w - 'float' - defaults to 1.0
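
The training tasks listed for learn_one can be sketched with a toy, single-leaf tree. ToyLeaf and ToyTree are hypothetical classes written for this illustration. They only count split attempts and do not route instances or evaluate split candidates as a real tree would. The use of >= for the grace period check is also an assumption.

```python
class ToyLeaf:
    """Leaf that tracks the observed weight and the weighted target sums."""

    def __init__(self):
        self.total_weight = 0.0
        self.weight_at_last_attempt = 0.0
        self.sums = {}

    def update(self, y, w):
        self.total_weight += w
        for target, value in y.items():
            self.sums[target] = self.sums.get(target, 0.0) + w * value


class ToyTree:
    """Hypothetical single-leaf tree that only counts split attempts."""

    def __init__(self, grace_period=200):
        self.grace_period = grace_period
        self.root = None
        self.split_attempts = 0

    def learn_one(self, x, y, w=1.0):
        if self.root is None:
            self.root = ToyLeaf()  # task 1: create the root leaf
        leaf = self.root  # task 2: a real tree would route x to its leaf
        leaf.update(y, w)
        seen = leaf.total_weight - leaf.weight_at_last_attempt
        if seen >= self.grace_period:  # task 3: enough new instances observed
            self.split_attempts += 1  # a real tree would evaluate splits here
            leaf.weight_at_last_attempt = leaf.total_weight


toy = ToyTree(grace_period=100)
for i in range(250):
    toy.learn_one({"x": float(i)}, {"t1": 2.0 * i, "t2": -float(i)})
print(toy.split_attempts)
```

With 250 instances and a grace period of 100, the toy tree attempts a split after the 100th and 200th instances.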

predict_one

Predict the target value using one of the leaf prediction strategies.

Parameters

  • x

Returns

Predicted target value.
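
When leaf_prediction='adaptive', the leaf chooses between the target mean and the leaf model by monitoring their squared errors, faded by model_selector_decay. The following is a minimal sketch of that idea for a single target. The function names and the exact update rule are assumptions made for illustration, not River's implementation.

```python
def faded_error(previous, y_true, y_pred, decay=0.95):
    """Exponentially faded squared error: older errors are scaled by decay."""
    return decay * previous + (y_true - y_pred) ** 2


def pick_predictor(stream, decay=0.95):
    """Return 'mean' or 'model' after seeing (y, mean_pred, model_pred) tuples."""
    err_mean = 0.0
    err_model = 0.0
    for y, mean_pred, model_pred in stream:
        err_mean = faded_error(err_mean, y, mean_pred, decay)
        err_model = faded_error(err_model, y, model_pred, decay)
    return "mean" if err_mean <= err_model else "model"


# The model is poor on the first three samples and exact on the last three,
# while the mean is exact at first and off by one afterwards.
stream = [(1.0, 1.0, 4.0)] * 3 + [(1.0, 2.0, 1.0)] * 3
print(pick_predictor(stream, decay=0.95))  # past errors still dominate
print(pick_predictor(stream, decay=0.1))   # recent errors dominate
```

On this stream a decay of 0.95 keeps the early model errors influential and selects 'mean', whereas a decay of 0.1 forgets them quickly and selects 'model'.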

to_dataframe

Return a representation of the current tree structure organized in a pandas.DataFrame object.

In case the tree is empty or it only contains a single node (a leaf), None is returned.

Returns

df


  1. Aljaž Osojnik, Panče Panov, and Sašo Džeroski. "Tree-based methods for online multi-target regression." Journal of Intelligent Information Systems 50.2 (2018): 315-339.