FeatureHasher

Implements the hashing trick.

Each (name, value) feature pair is hashed into an integer. A modulo operator is then applied to map the hash into a fixed range. We use the MurmurHash implementation from scikit-learn.
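To illustrate the idea, here is a minimal sketch of the hashing trick using Python's standard library. This is not River's actual implementation (which uses scikit-learn's MurmurHash and a seed); the choice of `hashlib.md5` and the function name `hash_features` are illustrative assumptions.

```python
import hashlib
from collections import Counter

def hash_features(x: dict, n_features: int = 10) -> Counter:
    """Map each (name, value) pair to a bucket index via hash modulo n_features.

    Illustrative only: River uses MurmurHash, not MD5.
    """
    counts = Counter()
    for name, value in x.items():
        # Hash the feature name deterministically, take the first 8 bytes
        # as an integer, then fold it into the [0, n_features) range.
        h = int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "little")
        counts[h % n_features] += value
    return counts

print(hash_features({'dog': 1, 'cat': 2, 'elephant': 4}))
```

Because the hash only depends on the feature name, the same name always lands in the same bucket, and distinct names may collide, which is the trade-off the hashing trick accepts in exchange for a fixed-size output.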

Parameters

  • n_features – defaults to 1048576

    The number of output features; each hash is taken modulo this value.

  • seed (int) – defaults to None

    Set the seed to produce identical results.

Examples

>>> import river

>>> hasher = river.preprocessing.FeatureHasher(n_features=10, seed=42)

>>> X = [
...     {'dog': 1, 'cat': 2, 'elephant': 4},
...     {'dog': 2, 'run': 5}
... ]
>>> for x in X:
...     print(hasher.transform_one(x))
Counter({1: 4, 9: 2, 8: 1})
Counter({4: 5, 8: 2})

Methods

learn_one

Update with a set of features x.

Many transformers don't need to do anything during the learn_one step because they are stateless. For this reason, the default behavior of this method is to do nothing. Transformers that do need to update their state during learn_one can override this method.

Parameters

  • x (dict)

Returns

Transformer: self

transform_one

Transform a set of features x.

Parameters

  • x (dict)

Returns

dict: The transformed values.
