SelectKBest¶
Removes all but the \(k\) highest-scoring features.
Parameters¶
- similarity (river.stats.base.Bivariate)
  The bivariate similarity statistic used to score each feature against the target.
- k – defaults to 10
  The number of features to keep.
Attributes¶
- similarities (dict)
  The similarity instances used for each feature.
- leaderboard (dict)
  The current similarity score of each feature (see the sketch below).
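Together, these two attributes drive the selection: similarities keeps one running bivariate statistic per feature, leaderboard stores each statistic's current value, and transform_one keeps the \(k\) features with the highest scores. Below is a simplified, standalone sketch of that idea, not the library's actual implementation; it assumes the similarity object exposes the usual update(x, y) / get() interface of river's bivariate statistics.

import copy
import heapq

from river import stats


class KBestSketch:
    """Simplified illustration of the idea behind SelectKBest (not the real code)."""

    def __init__(self, similarity, k=10):
        self.similarity = similarity
        self.k = k
        self.similarities = {}  # one bivariate statistic per feature
        self.leaderboard = {}   # feature -> latest similarity score

    def learn_one(self, x, y):
        for i, xi in x.items():
            # Each feature gets its own copy of the similarity statistic.
            stat = self.similarities.setdefault(i, copy.deepcopy(self.similarity))
            stat.update(xi, y)
            self.leaderboard[i] = stat.get()
        return self

    def transform_one(self, x):
        # Keep the k features with the highest similarity scores.
        best = heapq.nlargest(self.k, self.leaderboard, key=self.leaderboard.get)
        return {i: x[i] for i in best if i in x}


sketch = KBestSketch(similarity=stats.PearsonCorr(), k=2)

Ranking on the raw scores with heapq.nlargest mirrors the leaderboard shown in the example below, where features 9 and 7 come out on top.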
Examples¶
>>> from pprint import pprint
>>> from river import feature_selection
>>> from river import stats
>>> from river import stream
>>> from sklearn import datasets
>>> X, y = datasets.make_regression(
... n_samples=100,
... n_features=10,
... n_informative=2,
... random_state=42
... )
>>> selector = feature_selection.SelectKBest(
... similarity=stats.PearsonCorr(),
... k=2
... )
>>> for xi, yi in stream.iter_array(X, y):
... selector = selector.learn_one(xi, yi)
>>> pprint(selector.leaderboard)
Counter({9: 0.7898,
7: 0.5444,
8: 0.1062,
2: 0.0638,
4: 0.0538,
5: 0.0271,
1: -0.0312,
6: -0.0657,
3: -0.1501,
0: -0.1895})
>>> selector.transform_one(xi)
{7: -1.2795, 9: -1.8408}
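As a supervised transformer, the selector can also be placed in front of a model so that only the selected features reach it. The snippet below is a minimal sketch of that pattern, reusing the X and y from above; it assumes the usual river building blocks (compose.Pipeline, linear_model.LinearRegression, metrics.MAE), which are not part of the example itself.

from river import compose, feature_selection, linear_model, metrics, stats, stream

model = compose.Pipeline(
    feature_selection.SelectKBest(similarity=stats.PearsonCorr(), k=2),
    linear_model.LinearRegression(),
)

metric = metrics.MAE()

for xi, yi in stream.iter_array(X, y):
    y_pred = model.predict_one(xi)  # predicts from the currently selected features
    metric.update(yi, y_pred)
    model.learn_one(xi, yi)         # updates both the selector and the model

Updating the metric before learning keeps the evaluation prequential: each sample is predicted before it is used for training.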
Methods¶
learn_one
Update with a set of features x and a target y.
Parameters
- x (dict)
- y (Union[bool, str, int, numbers.Number])
Returns
SupervisedTransformer: self
transform_one
Transform a set of features x.
Parameters
- x (dict)
Returns
dict: The transformed values.