FeatureHasher
Implements the hashing trick.
Each (name, value) feature is hashed to an integer. A modulo operation then maps that integer into the range [0, n_features), and the result serves as the feature's index in the output. The hash function is the MurmurHash implementation from scikit-learn.
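The two steps (hash, then reduce modulo n_features) can be sketched in plain Python. This is a minimal sketch, not river's implementation. `hash_features` is a hypothetical helper, and `zlib.crc32` stands in for MurmurHash. Summing the values of features whose names collide is also an assumption of the sketch. Because of the different hash function, the indices will not match the ones in the Examples section.

```python
import zlib
from collections import Counter


def hash_features(x: dict, n_features: int = 2**20) -> Counter:
    """Hypothetical sketch of the hashing trick.

    zlib.crc32 stands in for MurmurHash, so indices differ from river's.
    """
    hashed = Counter()
    for name, value in x.items():
        # Hash the feature name to an unsigned 32-bit integer.
        h = zlib.crc32(name.encode("utf-8"))
        # The modulo operation folds the hash into [0, n_features).
        index = h % n_features
        # Assumption of this sketch: colliding features have their values summed.
        hashed[index] += value
    return hashed


print(hash_features({"dog": 1, "cat": 2, "elephant": 4}, n_features=10))
```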
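Parameters
- n_features
  Default → 1048576
  The modulus applied to each hash, i.e. the number of distinct indices the features can be mapped to. Smaller values use less memory but cause more collisions between features.
- seed
  Type → int | None
  Default → None
  Seed for the hash function. Set it to an integer to get identical results across runs.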
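A small sketch can illustrate the effect of both parameters. It relies on a hypothetical `hashed_index` helper and uses keyed BLAKE2b from `hashlib` as a stand-in for a seeded MurmurHash, so its indices will not match river's. It shows two things. First, `n_features` bounds the index range, so having fewer buckets than feature names forces collisions. Second, a fixed seed reproduces the same mapping.

```python
import hashlib


def hashed_index(name: str, n_features: int, seed: int) -> int:
    # Stand-in for a seeded MurmurHash: BLAKE2b keyed with the seed.
    # Assumes 0 <= seed < 2**64 so it fits in 8 unsigned bytes.
    digest = hashlib.blake2b(
        name.encode("utf-8"), digest_size=4, key=seed.to_bytes(8, "little")
    ).digest()
    # n_features is the modulus: every index lands in [0, n_features).
    return int.from_bytes(digest, "little") % n_features


names = [f"feature_{i}" for i in range(1000)]

# With fewer buckets than names, collisions are unavoidable (pigeonhole principle).
for n_features in (10, 1000, 2**20):
    used = {hashed_index(name, n_features, seed=42) for name in names}
    print(n_features, "buckets ->", len(used), "distinct indices used")

# The same seed always reproduces the same index; another seed generally does not.
print(hashed_index("dog", 2**20, seed=42), hashed_index("dog", 2**20, seed=7))
```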
Examples
>>> import river
>>> hasher = river.preprocessing.FeatureHasher(n_features=10, seed=42)
>>> X = [
...     {'dog': 1, 'cat': 2, 'elephant': 4},
...     {'dog': 2, 'run': 5}
... ]
>>> for x in X:
...     print(hasher.transform_one(x))
Counter({1: 4, 9: 2, 8: 1})
Counter({4: 5, 8: 2})
Methods
learn_one
Update with a set of features x.
Many transformers are stateless and so have nothing to do during the learn_one step. For this reason, the default implementation does nothing. Transformers that do need to update some state during learn_one can override this method. FeatureHasher is stateless, so its learn_one is a no-op.
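The default-then-override pattern can be sketched with plain classes. `Transformer`, `Lowercase` and `RunningCenter` are hypothetical names for illustration only. They are not river's base classes. `Lowercase` is stateless and keeps the inherited no-op `learn_one`. `RunningCenter` is stateful and overrides `learn_one` to maintain a running mean per feature.

```python
class Transformer:
    """Hypothetical base class (not river's base.Transformer)."""

    def learn_one(self, x: dict) -> None:
        # Stateless transformers have nothing to learn, so the default is a no-op.
        return None

    def transform_one(self, x: dict) -> dict:
        raise NotImplementedError


class Lowercase(Transformer):
    """Stateless: relies on the inherited no-op learn_one."""

    def transform_one(self, x: dict) -> dict:
        return {name.lower(): value for name, value in x.items()}


class RunningCenter(Transformer):
    """Stateful: overrides learn_one to track a running mean per feature."""

    def __init__(self) -> None:
        self.counts: dict = {}
        self.means: dict = {}

    def learn_one(self, x: dict) -> None:
        for name, value in x.items():
            n = self.counts.get(name, 0) + 1
            mean = self.means.get(name, 0.0)
            self.counts[name] = n
            # Incremental mean update: mean += (value - mean) / n
            self.means[name] = mean + (value - mean) / n

    def transform_one(self, x: dict) -> dict:
        # Subtract the mean seen so far (0.0 for features never learned).
        return {name: value - self.means.get(name, 0.0) for name, value in x.items()}


center = RunningCenter()
for x in [{"a": 1.0}, {"a": 3.0}]:
    center.learn_one(x)
print(center.transform_one({"a": 4.0}))  # {'a': 2.0}
```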
Parameters
- x — 'dict'
transform_one
Transform a set of features x.
Parameters
- x — 'dict'
Returns
dict: The transformed values.
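The returned dict is sparse: only the indices that received a feature are present. If a downstream consumer expects a fixed-length vector, the output can be densified. `to_dense` below is a hypothetical helper, and it is applied to the first output from the Examples section.

```python
def to_dense(hashed: dict, n_features: int) -> list:
    # Start from all zeros; indices absent from the sparse dict stay at 0.
    dense = [0] * n_features
    for index, value in hashed.items():
        dense[index] = value
    return dense


# First output of the Examples section (n_features=10).
print(to_dense({1: 4, 9: 2, 8: 1}, n_features=10))
# [0, 4, 0, 0, 0, 0, 0, 0, 1, 2]
```

With the default n_features of 1048576, a dense list would hold about a million entries per sample. Keeping the sparse dict is usually preferable for that reason.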