SelectKBest¶
Removes all but the \(k\) highest scoring features.
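Conceptually, the transformer keeps one running similarity score per feature and, at transform time, retains only the k features whose scores are currently highest. The snippet below is a minimal sketch of that selection rule, not River's implementation; select and running_scores are illustrative names.

import heapq

def select(x: dict, running_scores: dict, k: int) -> dict:
    # Keep only the k features of x with the highest running scores.
    best = heapq.nlargest(k, running_scores, key=running_scores.get)
    return {i: x[i] for i in best if i in x}

select({0: 1.5, 1: -0.3, 2: 0.8}, running_scores={0: 0.9, 1: 0.1, 2: 0.7}, k=2)
# {0: 1.5, 2: 0.8}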
Parameters¶
- similarity
  Type → stats.base.Bivariate
  The bivariate statistic used to measure the similarity between each feature and the target.
- k
  Default → 10
  The number of features to keep.
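Any bivariate statistic from river.stats can be plugged in as the similarity. As a sketch, assuming stats.PearsonCorr and stats.Cov as provided in recent River releases, two possible configurations:

from river import feature_selection, stats

# Score features by their Pearson correlation with the target, keep the top 5.
selector_corr = feature_selection.SelectKBest(similarity=stats.PearsonCorr(), k=5)

# Score features by their covariance with the target instead.
selector_cov = feature_selection.SelectKBest(similarity=stats.Cov(), k=5)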
Attributes¶
- similarities (dict)
  The similarity instance maintained for each feature.
- leaderboard (dict)
  The current similarity score of each feature with the target.
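A small sketch of how these attributes can be inspected once the selector has seen data; the toy feature names and values are made up for illustration, and the snippet assumes the attributes behave as described above.

from river import feature_selection, stats

selector = feature_selection.SelectKBest(similarity=stats.PearsonCorr(), k=1)
for x, y in [({'a': 1.0, 'b': 5.0}, 1.0),
             ({'a': 2.0, 'b': 3.0}, 2.0),
             ({'a': 3.0, 'b': 4.0}, 3.0)]:
    selector.learn_one(x, y)

selector.similarities['a']   # the PearsonCorr instance tracking feature 'a'
selector.leaderboard['a']    # its current similarity score with the target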
Examples¶
from pprint import pprint
from river import feature_selection
from river import stats
from river import stream
from sklearn import datasets

X, y = datasets.make_regression(
    n_samples=100,
    n_features=10,
    n_informative=2,
    random_state=42
)

selector = feature_selection.SelectKBest(
    similarity=stats.PearsonCorr(),
    k=2
)

for xi, yi in stream.iter_array(X, y):
    selector.learn_one(xi, yi)

pprint(selector.leaderboard)
Counter({9: 0.7898,
         7: 0.5444,
         8: 0.1062,
         2: 0.0638,
         4: 0.0538,
         5: 0.0271,
         1: -0.0312,
         6: -0.0657,
         3: -0.1501,
         0: -0.1895})
selector.transform_one(xi)
{7: -1.2795, 9: -1.8408}
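Because SelectKBest is a transformer, it can also be composed with a downstream estimator. The sketch below continues from the example above (reusing X, y and stream) and assumes River's compose.Pipeline and linear_model.LinearRegression; the selection step is updated online, before the regressor, on every sample.

from river import compose, feature_selection, linear_model, stats

model = compose.Pipeline(
    feature_selection.SelectKBest(similarity=stats.PearsonCorr(), k=2),
    linear_model.LinearRegression()
)

for xi, yi in stream.iter_array(X, y):
    y_pred = model.predict_one(xi)  # predict before learning (progressive validation)
    model.learn_one(xi, yi)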
Methods¶
learn_one
Update with a set of features x and a target y.
Parameters
- x — 'dict'
- y — 'base.typing.Target'
transform_one
Transform a set of features x.
Parameters
- x — 'dict'
Returns
dict: The transformed values.