FeatureHasher¶
Implements the hashing trick.
Each (name, value) feature pair is hashed into an integer. A modulo operator is then applied to map the hash into a fixed range. We use the MurmurHash implementation from scikit-learn.
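The general idea can be sketched in a few lines. This is a simplified illustration, not River's actual implementation: it uses Python's `zlib.crc32` in place of MurmurHash, and hashes only the feature name while adding the value to the resulting bucket.

```python
import zlib
from collections import Counter

def hash_features(x, n_features=10):
    # Minimal sketch of the hashing trick: each feature name is hashed,
    # the hash is reduced modulo n_features to pick a bucket, and the
    # feature's value is accumulated into that bucket. Colliding features
    # end up sharing a bucket, which is the trade-off of the trick.
    counter = Counter()
    for name, value in x.items():
        bucket = zlib.crc32(name.encode()) % n_features
        counter[bucket] += value
    return counter

print(hash_features({'dog': 1, 'cat': 2, 'elephant': 4}))
```

Because only `n_features` buckets exist, memory stays bounded no matter how many distinct feature names appear over time.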
Parameters¶
-
n_features (int) – defaults to
1048576
The modulus applied to each hash, i.e. the number of output buckets.
-
seed (int) – defaults to
None
Set the seed to produce reproducible results across runs.
Examples¶
>>> import river
>>> hasher = river.preprocessing.FeatureHasher(n_features=10, seed=42)
>>> X = [
... {'dog': 1, 'cat': 2, 'elephant': 4},
... {'dog': 2, 'run': 5}
... ]
>>> for x in X:
... print(hasher.transform_one(x))
Counter({1: 4, 9: 2, 8: 1})
Counter({4: 5, 8: 2})
Methods¶
learn_one
Update with a set of features x.
Many transformers don't need to do anything during the learn_one step because they are stateless. For this reason, the default behavior of this method is to do nothing. Transformers that do need to update their state during learn_one can override this method.
Parameters
- x (dict)
Returns
Transformer: self
transform_one
Transform a set of features x.
Parameters
- x (dict)
Returns
dict: The transformed values.