PoissonInclusion

Randomly selects features with an inclusion trial.

When a new feature is encountered, it is selected with probability p. The number of times a feature needs to be seen before it is added to the model follows a geometric distribution with expected value 1 / p. This feature selection method is meant to be used when you have a very large number of sparse features.
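
The geometric claim above can be checked with a small simulation. This is a minimal sketch that uses only the standard library and does not call River; the helper name sightings_until_included is hypothetical, and it assumes an independent trial with success probability p is run each time a not-yet-included feature is seen.

```python
import random


def sightings_until_included(p: float, rng: random.Random) -> int:
    """Count how many times a feature is seen until a trial with probability p succeeds."""
    count = 1
    # rng.random() is uniform on [0, 1), so the trial succeeds with probability p.
    while rng.random() >= p:
        count += 1
    return count


rng = random.Random(42)
p = 0.1

# Simulate many independent features and average the number of sightings.
samples = [sightings_until_included(p, rng) for _ in range(20_000)]
mean_sightings = sum(samples) / len(samples)

print(f"empirical mean: {mean_sightings:.2f}, expected 1 / p: {1 / p:.2f}")
```

With p = 0.1 the empirical mean should land close to 10, which is why a small p filters out features that appear only a handful of times.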

Parameters

  • p (float)

    Probability of including a feature the first time it is encountered.

  • seed (int) – defaults to None

    Random seed value used for reproducibility.

Examples

>>> from river import datasets
>>> from river import feature_selection
>>> from river import stream

>>> selector = feature_selection.PoissonInclusion(p=0.1, seed=42)

>>> dataset = iter(datasets.TrumpApproval())

>>> feature_names = next(dataset)[0].keys()
>>> n = 0

>>> while True:
...     x, y = next(dataset)
...     xt = selector.transform_one(x)
...     if xt.keys() == feature_names:
...         break
...     n += 1

>>> n
12

Methods

learn_one

Update with a set of features x.

Many transformers are stateless and have nothing to do during the learn_one step, so the default behavior of this method is to do nothing. Transformers that do need to update state during learn_one can override it.

Parameters

  • x (dict)

Returns

Transformer: self

transform_one

Transform a set of features x.

Parameters

  • x (dict)

Returns

dict: The transformed values.
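
To illustrate what a transform_one of this kind might look like, here is a minimal sketch under stated assumptions. ToyInclusionSelector is a hypothetical stand-in written for this example, not River's class, and its internals (an included set and a per-feature trial on each call) are an assumption based on the description above rather than the library's actual implementation.

```python
import random


class ToyInclusionSelector:
    """Hypothetical stand-in, not River's class: keeps features that passed a trial."""

    def __init__(self, p: float, seed=None):
        self.p = p
        self.rng = random.Random(seed)
        self.included = set()  # names of features that have already been selected

    def transform_one(self, x: dict) -> dict:
        xt = {}
        for name, value in x.items():
            if name in self.included:
                # Already selected on an earlier call: always kept.
                xt[name] = value
            elif self.rng.random() < self.p:
                # Trial succeeded: remember the feature and keep it from now on.
                self.included.add(name)
                xt[name] = value
        return xt


keep_all = ToyInclusionSelector(p=1.0, seed=42)
keep_none = ToyInclusionSelector(p=0.0, seed=42)
x = {"a": 1.0, "b": 2.0}

print(keep_all.transform_one(x))   # every trial succeeds when p is 1
print(keep_none.transform_one(x))  # no trial succeeds when p is 0
```

In this sketch, the selection state is updated inside transform_one, so a feature that has passed its trial keeps appearing in every later output.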

References