FeatureHasher

Implements the hashing trick.

Each (name, value) feature pair is hashed to a pseudo-random integer. A modulo operator is then applied so that the hash falls within a fixed range. We use the MurmurHash implementation from scikit-learn.
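The description above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `hash_features` is a hypothetical helper, `zlib.crc32` stands in for MurmurHash so the snippet only needs the standard library, and summing the values that land in the same bucket is an assumption based on the `Counter` output shown in the Examples section.

```python
import zlib
from collections import Counter


def hash_features(x, n_features=10, seed=0):
    """Hypothetical helper: hash each feature name into a bucket and sum values per bucket."""
    hashed = Counter()
    for name, value in x.items():
        # zlib.crc32 is a stand-in hash (assumption); the seed is its starting value.
        bucket = zlib.crc32(name.encode("utf-8"), seed) % n_features
        # Features that collide in the same bucket have their values added together.
        hashed[bucket] += value
    return hashed


print(hash_features({'dog': 1, 'cat': 2, 'elephant': 4}))
```

Whatever the input, every key of the result lies between 0 and n_features - 1, and the same feature name always lands in the same bucket for a given seed.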

Parameters

  • n_features

    Default: 1048576

    The modulus applied to each hash. Output keys fall in the range 0 to n_features - 1.

  • seed

    Type: int | None

    Default: None

    Seed for the hash function. Set it to produce identical results across runs.
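The effect of n_features on collisions can be illustrated with a small experiment. This is a sketch under stated assumptions: `distinct_buckets` is a hypothetical helper, the feature names are made up, and `zlib.crc32` again stands in for MurmurHash.

```python
import zlib


def distinct_buckets(names, n_features):
    """Hypothetical helper: count how many different buckets a list of names occupies."""
    return len({zlib.crc32(name.encode("utf-8")) % n_features for name in names})


# 100 made-up feature names.
names = [f"feature_{i}" for i in range(100)]

# With only 8 buckets, at most 8 distinct keys can exist, so collisions are unavoidable.
small = distinct_buckets(names, n_features=8)

# With the default-sized range, collisions become far less likely.
large = distinct_buckets(names, n_features=1048576)

print(small, large)
```

A small n_features keeps the output compact at the cost of merging unrelated features; a large one keeps features apart but widens the key space.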

Examples

import river

hasher = river.preprocessing.FeatureHasher(n_features=10, seed=42)

X = [
    {'dog': 1, 'cat': 2, 'elephant': 4},
    {'dog': 2, 'run': 5}
]
for x in X:
    print(hasher.transform_one(x))
# Counter({1: 4, 9: 2, 8: 1})
# Counter({4: 5, 8: 2})

Methods

learn_one

Update with a set of features x.

Many transformers are stateless and have nothing to do during the learn_one step, so the default behavior of this method is to do nothing. Transformers that do need to update internal state during learn_one can override it.

Parameters

  • x: dict
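The no-op default described for learn_one can be pictured as follows. The classes below are hypothetical and only illustrate the pattern; they are not the library's base classes.

```python
class Transformer:
    """Hypothetical base class: learn_one does nothing unless overridden."""

    def learn_one(self, x: dict) -> None:
        # Stateless by default, so there is nothing to update.
        return None

    def transform_one(self, x: dict) -> dict:
        raise NotImplementedError


class Doubler(Transformer):
    """Stateless transformer: inherits the no-op learn_one."""

    def transform_one(self, x):
        return {name: 2 * value for name, value in x.items()}


class RunningMaxScaler(Transformer):
    """Stateful transformer: overrides learn_one to track the largest absolute value seen."""

    def __init__(self):
        self.max_abs = {}

    def learn_one(self, x):
        for name, value in x.items():
            self.max_abs[name] = max(self.max_abs.get(name, 0.0), abs(value))

    def transform_one(self, x):
        # Features never seen (or only seen as zero) are mapped to 0.0.
        return {
            name: value / self.max_abs[name] if self.max_abs.get(name) else 0.0
            for name, value in x.items()
        }
```

A hasher fits the first pattern: it needs no statistics from the data, so calling learn_one on it leaves its output unchanged.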

transform_one

Transform a set of features x.

Parameters

  • x: dict

Returns

dict: The transformed values, keyed by hashed feature index.