FeatureHasher¶
Implements the hashing trick.
Each (name, value) feature pair is hashed into an integer. A modulo operator is then applied to map the hash into a fixed range. We use the MurmurHash implementation from scikit-learn.
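The general idea can be sketched in a few lines. This is a simplified illustration, not River's actual implementation: it uses Python's `zlib.crc32` in place of MurmurHash, and hashes only the feature name while adding the value to the resulting bucket.

```python
import zlib
from collections import Counter

def hash_features(x, n_features=10):
    # Minimal sketch of the hashing trick: each feature name is hashed,
    # the hash is reduced modulo n_features to pick a bucket, and the
    # feature's value is accumulated into that bucket. Colliding features
    # end up sharing a bucket, which is the trade-off of the trick.
    counter = Counter()
    for name, value in x.items():
        bucket = zlib.crc32(name.encode()) % n_features
        counter[bucket] += value
    return counter

print(hash_features({'dog': 1, 'cat': 2, 'elephant': 4}))
```

Because only `n_features` buckets exist, memory stays bounded no matter how many distinct feature names appear over time.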
Parameters¶
-
n_features (int) – defaults to
1048576
The modulus applied to each hash, i.e. the number of output buckets.
-
seed (int) – defaults to
None
Set the seed to produce reproducible results across runs.
Examples¶
>>> import river
>>> hasher = river.preprocessing.FeatureHasher(n_features=10, seed=42)
>>> X = [
... {'dog': 1, 'cat': 2, 'elephant': 4},
... {'dog': 2, 'run': 5}
... ]
>>> for x in X:
... print(hasher.transform_one(x))
Counter({1: 4, 9: 2, 8: 1})
Counter({4: 5, 8: 2})
Methods¶
learn_one
Update with a set of features x.
Many transformers don't need to do anything during the learn_one step because they are stateless. For this reason, the default behavior of this method is to do nothing. Transformers that do need to update their state during learn_one can override this method.
Parameters
- x (dict)
Returns
Transformer: self
transform_one
Transform a set of features x.
Parameters
- x (dict)
Returns
dict: The transformed values.