iSOUPTreeRegressor
Incremental Structured Output Prediction Tree (iSOUP-Tree) for multi-target regression.
This is an implementation of the iSOUP-Tree proposed by A. Osojnik, P. Panov, and S. Džeroski [1].
Parameters
-
grace_period (int) – defaults to
200
Number of instances a leaf should observe between split attempts.
-
max_depth (int) – defaults to
None
The maximum depth a tree can reach. If None, the tree will grow indefinitely.
-
split_confidence (float) – defaults to
1e-07
Allowed error in split decision, a value closer to 0 takes longer to decide.
-
tie_threshold (float) – defaults to
0.05
Threshold below which a split will be forced to break ties.
-
leaf_prediction (str) – defaults to
model
Prediction mechanism used at leaves:
  - 'mean': the target mean.
  - 'model': uses the model defined in leaf_model.
  - 'adaptive': chooses between 'mean' and 'model' dynamically.
-
leaf_model (Union[base.Regressor, Dict]) – defaults to
None
The regression model(s) used to provide responses if leaf_prediction='model'. It can be either a single regressor (in which case it is replicated for all the targets) or a dictionary whose keys are target identifiers and whose values are instances of river.base.Regressor.
If not provided, instances of river.linear_model.LinearRegression with the default hyperparameters are used for all the targets. If a dictionary is passed and not all target models are specified, copies of the first model found in the dictionary are used for the remaining targets.
-
model_selector_decay (float) – defaults to
0.95
The exponential decay factor applied to the squared errors of the learning models, which are monitored if leaf_prediction='adaptive'. Must be between 0 and 1. The closer it is to 1, the more importance is given to past observations. Conversely, as its value approaches 0, the recently observed errors have more influence on the final decision.
-
nominal_attributes (list) – defaults to
None
List of nominal attribute identifiers. If empty, all numeric attributes are treated as continuous.
-
splitter (river.tree.splitter.base.Splitter) – defaults to
None
The Splitter or Attribute Observer (AO) used to monitor the class statistics of numeric features and perform splits. Splitters are available in the tree.splitter module. Different splitters are available for classification and regression tasks. Classification and regression splitters can be distinguished by their property is_target_class. This is an advanced option. Special care must be taken when choosing different splitters. By default, tree.splitter.EBSTSplitter is used if splitter is None.
-
min_samples_split (int) – defaults to
5
The minimum number of samples every branch resulting from a split candidate must have to be considered valid.
-
binary_split (bool) – defaults to
False
If True, only allow binary splits.
-
max_size (int) – defaults to
500
The max size of the tree, in Megabytes (MB).
-
memory_estimate_period (int) – defaults to
1000000
Interval (number of processed instances) between memory consumption checks.
-
stop_mem_management (bool) – defaults to
False
If True, stop growing as soon as memory limit is hit.
-
remove_poor_attrs (bool) – defaults to
False
If True, disable poor attributes to reduce memory usage.
-
merit_preprune (bool) – defaults to
True
If True, enable merit-based tree pre-pruning.
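The interplay between grace_period, split_confidence, and tie_threshold can be illustrated with the Hoeffding bound that Hoeffding-tree learners typically rely on. The following is a minimal sketch, not River's internal code: the helper names hoeffding_bound and should_split are hypothetical, and the sketch assumes that the split test compares the ratio of the two best candidate merits, with a merit range of 1.

```python
import math


def hoeffding_bound(value_range: float, confidence: float, n: float) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1 / delta) / (2 * n))."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / confidence) / (2.0 * n))


def should_split(
    best_merit: float,
    second_best_merit: float,
    n: float,
    split_confidence: float = 1e-7,
    tie_threshold: float = 0.05,
) -> bool:
    """Decide whether a leaf that has observed n instances should split.

    Assumption: merits are compared through their ratio, whose range is 1.
    """
    if best_merit <= 0:
        return False  # no candidate improves on not splitting
    epsilon = hoeffding_bound(1.0, split_confidence, n)
    ratio = second_best_merit / best_merit
    # Split when the best candidate is confidently better than the runner-up,
    # or when the bound is so small that the two candidates are effectively tied.
    return ratio < 1.0 - epsilon or epsilon < tie_threshold


# With the default split_confidence, 200 observations give a bound of about 0.2.
epsilon_at_200 = hoeffding_bound(1.0, 1e-7, 200)
clear_winner = should_split(best_merit=1.0, second_best_merit=0.5, n=200)
close_call = should_split(best_merit=1.0, second_best_merit=0.9, n=200)
tie_break = should_split(best_merit=1.0, second_best_merit=0.99, n=4000)
```

Under these assumptions, a smaller split_confidence widens the bound and delays the decision, while tie_threshold forces a split once enough instances have been observed that the bound itself drops below the threshold.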
Attributes
-
height
-
leaf_prediction
Return the prediction strategy used by the tree at its leaves.
-
max_size
Max allowed size tree can reach (in MB).
-
n_active_leaves
-
n_branches
-
n_inactive_leaves
-
n_leaves
-
n_nodes
-
split_criterion
Return a string with the name of the split criterion being used by the tree.
-
summary
Collect metrics corresponding to the current status of the tree in a string buffer.
Examples
>>> import numbers
>>> from river import compose
>>> from river import datasets
>>> from river import evaluate
>>> from river import linear_model
>>> from river import metrics
>>> from river import preprocessing
>>> from river import tree
>>> dataset = datasets.SolarFlare()
>>> num = compose.SelectType(numbers.Number) | preprocessing.MinMaxScaler()
>>> cat = compose.SelectType(str) | preprocessing.OneHotEncoder(sparse=False)
>>> model = tree.iSOUPTreeRegressor(
... grace_period=100,
... leaf_prediction='model',
... leaf_model={
... 'c-class-flares': linear_model.LinearRegression(l2=0.02),
... 'm-class-flares': linear_model.PARegressor(),
... 'x-class-flares': linear_model.LinearRegression(l2=0.1)
... }
... )
>>> pipeline = (num + cat) | model
>>> metric = metrics.RegressionMultiOutput(metrics.MAE())
>>> evaluate.progressive_val_score(dataset, pipeline, metric)
MAE: 0.426177
Methods
clone
Return a fresh estimator with the same parameters.
The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
  - recursively cloned if it is a River class;
  - deep-copied via copy.deepcopy if not.
If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
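The signature-based cloning described above can be sketched in a few lines of plain Python. This is an illustrative approximation under stated assumptions, not River's implementation: ToyEstimator, ToyLinearModel, and ToyTreeModel are hypothetical classes, and the sketch assumes every constructor argument is stored as an attribute of the same name.

```python
import copy
import inspect


class ToyEstimator:
    """Hypothetical base class mimicking signature-based cloning."""

    def clone(self):
        params = {}
        # Read the constructor parameter names from the class signature.
        for name in inspect.signature(self.__class__).parameters:
            value = getattr(self, name)
            if isinstance(value, ToyEstimator):
                params[name] = value.clone()  # nested estimators are cloned recursively
            else:
                params[name] = copy.deepcopy(value)  # everything else is deep-copied
        return self.__class__(**params)


class ToyLinearModel(ToyEstimator):
    def __init__(self, l2=0.0):
        self.l2 = l2
        self.n_seen = 0  # learned state, not a constructor parameter

    def learn_one(self):
        self.n_seen += 1


class ToyTreeModel(ToyEstimator):
    def __init__(self, grace_period=200, leaf_model=None):
        self.grace_period = grace_period
        self.leaf_model = leaf_model


original = ToyTreeModel(grace_period=100, leaf_model=ToyLinearModel(l2=0.1))
original.leaf_model.learn_one()  # the original accumulates state
fresh = original.clone()  # same parameters, no learned state
```

Because only constructor parameters are copied, anything learned after construction (here, n_seen) is absent from the clone.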
debug_one
Print an explanation of how x is predicted.
Parameters
- x (dict)
Returns
typing.Union[str, NoneType]: A representation of the path followed by the tree to predict x
; None
if
draw
Draw the tree using the graphviz library.
Since the tree is drawn without passing incoming samples, classification trees will show the majority class in their leaves, whereas regression trees will use the target mean.
Parameters
- max_depth (int) – defaults to
None
The maximum depth a tree can reach. If None, the tree will grow indefinitely.
learn_one
Incrementally train the model with one sample.
Training tasks:
  - If the tree is empty, create a leaf node as the root.
  - If the tree is already initialized, find the corresponding leaf for the instance and update the leaf node statistics.
  - If growth is allowed and the number of instances that the leaf has observed between split attempts exceeds the grace period, then attempt to split.
Parameters
- x (dict)
- y (Dict[Hashable, numbers.Number])
- sample_weight (float) – defaults to
1.0
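The training tasks listed for learn_one can be pictured with a toy, single-leaf model. This is a simplified sketch under stated assumptions rather than the library's code: CountingLeaf and CountingTree are hypothetical names, the tree never actually splits, and a split attempt is only counted instead of evaluated.

```python
class CountingLeaf:
    """Hypothetical leaf that keeps per-target weighted sums."""

    def __init__(self):
        self.total_weight = 0.0
        self.weight_at_last_attempt = 0.0
        self.target_sums = {}

    def update(self, y, sample_weight):
        self.total_weight += sample_weight
        for target, value in y.items():
            self.target_sums[target] = (
                self.target_sums.get(target, 0.0) + sample_weight * value
            )


class CountingTree:
    """Hypothetical tree that only tracks when split attempts would occur."""

    def __init__(self, grace_period=200):
        self.grace_period = grace_period
        self.root = None
        self.split_attempts = 0

    def learn_one(self, x, y, sample_weight=1.0):
        # Task 1: create the root leaf if the tree is empty.
        if self.root is None:
            self.root = CountingLeaf()
        # Task 2: find the leaf for x (a real tree would route x through
        # its branches) and update the leaf statistics.
        leaf = self.root
        leaf.update(y, sample_weight)
        # Task 3: attempt a split once enough weight has been observed
        # since the previous attempt.
        if leaf.total_weight - leaf.weight_at_last_attempt >= self.grace_period:
            leaf.weight_at_last_attempt = leaf.total_weight
            self.split_attempts += 1


toy = CountingTree(grace_period=200)
for i in range(450):
    toy.learn_one({"feature": float(i)}, {"t1": 1.0, "t2": 2.0})
```

In this sketch, 450 unit-weight samples trigger split attempts after the 200th and 400th instances, which is the role grace_period plays between attempts.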
predict_one
Predict the target values for a given instance.
Parameters
- x (dict)
Returns
typing.Dict[typing.Hashable, numbers.Number]: dict
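When leaf_prediction='adaptive', the choice between the target mean and the leaf model depends on exponentially faded squared errors controlled by model_selector_decay. The sketch below illustrates that idea for a single target. It is an assumption-laden illustration, not River's code: the class FadedErrorSelector and its methods are hypothetical, and the update rule (decay times the previous error plus the new squared error) is one common way to implement fading.

```python
class FadedErrorSelector:
    """Hypothetical per-target selector between 'mean' and 'model' predictions."""

    def __init__(self, decay=0.95):
        if not 0.0 <= decay <= 1.0:
            raise ValueError("decay must be between 0 and 1")
        self.decay = decay
        self.faded_error_mean = 0.0
        self.faded_error_model = 0.0

    def update(self, y_true, pred_mean, pred_model):
        # Older errors are shrunk by the decay factor before adding the new one.
        self.faded_error_mean = (
            self.decay * self.faded_error_mean + (y_true - pred_mean) ** 2
        )
        self.faded_error_model = (
            self.decay * self.faded_error_model + (y_true - pred_model) ** 2
        )

    def choose(self):
        # Prefer the predictor with the smaller faded squared error.
        if self.faded_error_mean <= self.faded_error_model:
            return "mean"
        return "model"


selector = FadedErrorSelector(decay=0.5)
selector.update(y_true=1.0, pred_mean=0.0, pred_model=1.0)
first_choice = selector.choose()  # the model was exact, the mean was off by 1
selector.update(y_true=1.0, pred_mean=1.0, pred_model=3.0)
second_choice = selector.choose()  # the model's recent error now dominates
```

A decay close to 1 keeps old errors influential for longer, whereas a decay close to 0 makes the selection react almost entirely to the most recent error.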
to_dataframe
Return a representation of the current tree structure organized in a pandas.DataFrame object.
In case the tree is empty or it only contains a single node (a leaf), None is returned.
Returns
df
References
-
Aljaž Osojnik, Panče Panov, and Sašo Džeroski. "Tree-based methods for online multi-target regression." Journal of Intelligent Information Systems 50.2 (2018): 315-339.