

Stochastic Gradient Tree [1] for binary classification.

Binary decision tree classifier that minimizes the binary cross-entropy to guide its growth.

Stochastic Gradient Trees (SGT) directly minimize a loss function to guide tree growth and update their predictions. Thus, they differ from other incremental tree learners, which optimize data impurity-related heuristics rather than the loss itself.
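To make "directly minimizing the loss" concrete, the snippet below sketches the binary cross-entropy together with its first and second derivatives with respect to a raw (pre-sigmoid) prediction. It is an illustrative, standalone sketch and not River's internal code: the function names `sigmoid`, `bce_loss` and `bce_grad_hess` are hypothetical, and it assumes the tree outputs a raw score that is squashed through a sigmoid.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(y: bool, raw_pred: float) -> float:
    """Binary cross-entropy of a raw (pre-sigmoid) prediction."""
    p = sigmoid(raw_pred)
    return -math.log(p) if y else -math.log(1.0 - p)


def bce_grad_hess(y: bool, raw_pred: float) -> Tuple[float, float]:
    """First and second derivatives of the loss w.r.t. the raw prediction."""
    p = sigmoid(raw_pred)
    return p - float(y), p * (1.0 - p)


# For a positive example and a raw prediction of 0.0 (probability 0.5),
# the gradient is negative: the loss decreases if the raw score goes up.
grad, hess = bce_grad_hess(True, 0.0)
loss = bce_loss(True, 0.0)
```

Statistics of this kind (gradients and Hessians accumulated over incoming examples) are what a loss-driven tree can use in place of impurity measures.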


  • delta (float) – defaults to 1e-07

    Significance level of the F-tests performed to decide whether to create a split or update a prediction.

  • grace_period (int) – defaults to 200

    Interval between split attempts or prediction updates.

  • init_pred (float) – defaults to 0.0

    Initial value predicted by the tree.

  • max_depth (Union[int, NoneType]) – defaults to None

    The maximum depth the tree may reach. If set to None, the tree can grow without a depth limit.

  • lambda_value (float) – defaults to 0.1

    Positive float value used to impose a penalty on the tree's predictions and force them to become smaller. The greater the lambda value, the more constrained the predictions are.

  • gamma (float) – defaults to 1.0

    Positive float value used to impose a penalty on the tree's splits so that they are avoided when possible. The greater the gamma value, the smaller the chance of a split occurring.

  • nominal_attributes (Union[List, NoneType]) – defaults to None

    List with identifiers of the nominal attributes. If None, all features containing numbers are assumed to be numeric.

  • feature_quantizer (river.tree.splitter.base.Quantizer) – defaults to None

    The algorithm used to quantize numeric features. Either a static quantizer (as in the original implementation) or a dynamic quantizer can be used. Choosing and configuring the feature quantizer correctly is crucial to the performance of SGTs. Feature quantizers are akin to the attribute observers used in Hoeffding Trees. If this parameter is not set, an instance of tree.splitter.StaticQuantizer with default parameters is used.
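The roles of lambda_value and gamma described above can be illustrated with a regularized Newton step in the style used by gradient-boosted trees. This is a sketch under assumptions, not River's exact computation: the helper names `leaf_update` and `penalized_gain` are hypothetical, and the formulas are the common second-order approximations in which lambda is added to the Hessian sum and gamma is subtracted from the gain of a candidate split.

```python
def leaf_update(grad_sum: float, hess_sum: float, lambda_value: float) -> float:
    """Newton-style prediction update, shrunk by lambda_value (assumed form)."""
    return -grad_sum / (hess_sum + lambda_value)


def penalized_gain(
    grad_left: float,
    hess_left: float,
    grad_right: float,
    hess_right: float,
    lambda_value: float,
    gamma: float,
) -> float:
    """Approximate loss reduction of a split, minus the gamma penalty (assumed form)."""

    def score(g: float, h: float) -> float:
        return g * g / (h + lambda_value)

    parent = score(grad_left + grad_right, hess_left + hess_right)
    children = score(grad_left, hess_left) + score(grad_right, hess_right)
    return 0.5 * (children - parent) - gamma


# A larger lambda_value yields a smaller update for the same statistics.
small_lambda = leaf_update(-4.0, 2.0, lambda_value=0.1)
large_lambda = leaf_update(-4.0, 2.0, lambda_value=2.0)

# A larger gamma lowers the gain, so fewer candidate splits look worthwhile.
gain_no_penalty = penalized_gain(-3.0, 1.0, 1.0, 1.0, lambda_value=0.0, gamma=0.0)
gain_penalized = penalized_gain(-3.0, 1.0, 1.0, 1.0, lambda_value=0.0, gamma=1.0)
```

In this sketch a split would only be considered when its penalized gain is positive, which is one way to read "the greater the gamma value, the smaller the chance of a split occurring".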


  • height

  • n_branches

  • n_leaves

  • n_node_updates

  • n_nodes

  • n_observations

  • n_splits


>>> from river import datasets
>>> from river import evaluate
>>> from river import metrics
>>> from river import tree

>>> dataset = datasets.Phishing()
>>> model = tree.SGTClassifier(
...     feature_quantizer=tree.splitter.StaticQuantizer(
...         n_bins=32, warm_start=10
...     )
... )
>>> metric = metrics.Accuracy()

>>> evaluate.progressive_val_score(dataset, model, metric)
Accuracy: 82.24%




Return a fresh estimator with the same parameters.

The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:

  • recursively cloned if it is a River class;
  • deep-copied via copy.deepcopy if not.

If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
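The cloning rule described above can be sketched in plain Python. This is a simplified illustration and not River's implementation: the classes `ToyEstimator`, `ToyQuantizer` and `ToyModel` are hypothetical, and the sketch assumes every constructor parameter is stored on the instance under the same name.

```python
import copy
import inspect


class ToyEstimator:
    """Hypothetical base class showing signature-driven cloning."""

    def clone(self):
        params = {}
        for name in inspect.signature(type(self).__init__).parameters:
            if name == "self":
                continue
            value = getattr(self, name)
            if isinstance(value, ToyEstimator):
                # Nested estimators are cloned recursively.
                params[name] = value.clone()
            else:
                # Everything else is deep-copied.
                params[name] = copy.deepcopy(value)
        return type(self)(**params)


class ToyQuantizer(ToyEstimator):
    def __init__(self, n_bins=64):
        self.n_bins = n_bins


class ToyModel(ToyEstimator):
    def __init__(self, grace_period=200, quantizer=None):
        self.grace_period = grace_period
        self.quantizer = quantizer
        self.n_observations = 0  # learned state, not a constructor parameter

    def learn_one(self, x, y):
        self.n_observations += 1


model = ToyModel(grace_period=50, quantizer=ToyQuantizer(n_bins=32))
model.learn_one({"a": 1.0}, True)

# The clone keeps the parameters but none of the learned state.
fresh = model.clone()
```

Because only constructor parameters are carried over, anything learned after construction (here, `n_observations`) is reset in the clone.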


Update the model with a set of features x and a label y.


  • x (dict)
  • y (Union[bool, str, int])
  • w – defaults to 1.0


Classifier: self


Predict the label of a set of features x.


  • x (dict)


typing.Union[bool, str, int]: The predicted label.


Predict the probability of each label for a dictionary of features x.


  • x (dict)


typing.Dict[typing.Union[bool, str, int], float]: A dictionary that associates a probability with each label.
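The shape of the returned dictionary, and how a label can be derived from it, can be sketched as follows. This is an assumption-laden illustration rather than River's code: the helpers `proba_from_raw` and `label_from_proba` are hypothetical, and the sketch assumes the tree holds a raw score that is converted to a probability with a sigmoid and that the two labels are True and False.

```python
import math
from typing import Dict


def proba_from_raw(raw_pred: float) -> Dict[bool, float]:
    """Turn a raw score into a probability for each binary label."""
    p_true = 1.0 / (1.0 + math.exp(-raw_pred))
    return {True: p_true, False: 1.0 - p_true}


def label_from_proba(proba: Dict[bool, float]) -> bool:
    """Pick the label with the highest probability."""
    return max(proba, key=proba.get)


# A raw score of 0.0 (the default init_pred) gives equal probabilities.
initial = proba_from_raw(0.0)

# Positive raw scores favour True, negative ones favour False.
positive_label = label_from_proba(proba_from_raw(2.0))
negative_label = label_from_proba(proba_from_raw(-2.0))
```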

  1. Gouk, H., Pfahringer, B., & Frank, E. (2019, October). Stochastic Gradient Trees. In Asian Conference on Machine Learning (pp. 1094-1109).