RandomTreesEmbedding¶
Embed samples according to the leaves of random trees.
This transformer builds an ensemble of totally random trees and encodes each sample with the leaf reached in every tree. The output is sparse: exactly one binary feature is active per tree.
This is the online counterpart of feeding a linear model with random-tree leaf indicators. By default, each feature is assumed to take values between 0 and 1. If that is not the case, you can specify the limits manually via the limits argument. If the limits are not known in advance, you can use a preprocessing.MinMaxScaler as an initial preprocessing step.
The trees are built lazily from the first sample that is observed, either via transform_one or learn_one. If new features appear later in the stream, then the forest is rebuilt so that future splits may use them. Unless specified in limits, newly observed features are also assumed to lie in the range [0, 1].
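The encoding described above can be sketched in plain Python. This is a hypothetical illustration, not river's implementation: each "totally random" tree picks a random feature and a random threshold in [0, 1] at every internal node, and a sample activates exactly one leaf indicator per tree.

```python
import random

def build_tree(features, height, rng):
    """Recursively build random split nodes; leaves are None."""
    if height == 0:
        return None
    return {
        "feature": rng.choice(features),
        "threshold": rng.random(),  # features assumed to lie in [0, 1]
        "left": build_tree(features, height - 1, rng),
        "right": build_tree(features, height - 1, rng),
    }

def leaf_index(tree, x):
    """Route sample x to a leaf and return its index in [0, 2 ** height)."""
    index, node = 0, tree
    while node is not None:
        go_right = x[node["feature"]] > node["threshold"]
        index = 2 * index + int(go_right)
        node = node["right"] if go_right else node["left"]
    return index

rng = random.Random(42)
trees = [build_tree(["x", "y"], height=2, rng=rng) for _ in range(3)]
x = {"x": 0.3, "y": 0.7}
# Exactly one active (tree, leaf) indicator per tree: a sparse binary embedding.
embedding = {(i, leaf_index(tree, x)): 1 for i, tree in enumerate(trees)}
```

With height 2 each tree has 4 leaves, so the sketch produces 3 active indicators out of 12 possible features, mirroring the sparsity guarantee stated above.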
Parameters¶

- n_trees

  Default → 10

  Number of trees in the ensemble.

- height

  Default → 8

  Height of each tree. A tree of height h contains 2 ** h leaves.

- limits

  Type → dict[base.typing.FeatureName, tuple[float, float]] | None

  Default → None

  Specifies the range of each feature. By default each feature is assumed to be in the range [0, 1]. If the feature ranges are unknown beforehand, then use a preprocessing.MinMaxScaler upstream.

- seed

  Type → int | None

  Default → None

  Random seed for reproducibility.
Examples¶
>>> from river import feature_extraction as fx
>>> from river import linear_model as lm
>>> from river import optim
>>> from river import preprocessing

>>> embedding = fx.RandomTreesEmbedding(n_trees=3, height=2, seed=42)
>>> len(embedding.transform_one({'x': 0.3, 'y': 0.7}))
3

>>> model = (
...     preprocessing.MinMaxScaler() |
...     fx.RandomTreesEmbedding(n_trees=5, height=3, seed=42) |
...     lm.LogisticRegression(optimizer=optim.SGD(0.1))
... )
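Why pair the embedding with a linear model? Each (tree, leaf) indicator gets its own weight, so logistic regression over leaf indicators learns a piecewise-constant score per leaf. A minimal, self-contained sketch of this idea (hypothetical code, not river's pipeline; the toy "trees" here are just single-feature routers):

```python
import math
import random

def embed(trees, x):
    # Stand-in for the leaf routing: each toy "tree" maps x to a leaf id,
    # and exactly one (tree, leaf) indicator is active per tree.
    return {(i, tree(x)): 1.0 for i, tree in enumerate(trees)}

class OnlineLogReg:
    """Online logistic regression trained with plain SGD on the log loss."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.weights = {}  # lazily created, one weight per seen feature

    def predict_proba_one(self, features):
        score = sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        return 1.0 / (1.0 + math.exp(-score))

    def learn_one(self, features, y):
        error = self.predict_proba_one(features) - y  # log loss gradient
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v

# Toy trees: stumps splitting the single feature 'x' at fixed thresholds.
trees = [lambda x, t=t: int(x["x"] > t) for t in (0.25, 0.5, 0.75)]
model = OnlineLogReg(lr=0.1)

rng = random.Random(42)
for _ in range(500):
    x = {"x": rng.random()}
    y = int(x["x"] > 0.5)  # the label is determined by which leaves fire
    model.learn_one(embed(trees, x), y)
```

After this stream, samples routed to the "right" leaves score well above 0.5 and the others well below, because the model has simply learned one weight per leaf.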
Methods¶
learn_one
Update with a set of features x.
Many transformers are stateless and therefore have nothing to do during the learn_one step, so by default this method does nothing. Transformers that do maintain state during learn_one can override it.
Parameters
- x — dict[base.typing.FeatureName, Any]
transform_one
Transform a set of features x.
Parameters
- x — dict[base.typing.FeatureName, Any]
Returns
dict[base.typing.FeatureName, Any]: The transformed values.