KNNClassifier¶
K-Nearest Neighbors (KNN) for classification.
This works by storing a buffer with the window_size
most recent observations. A brute-force search is used to find the n_neighbors
nearest observations in the buffer to make a prediction. See the NearestNeighbors parent class for model details.
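The buffer-plus-brute-force idea described above can be sketched in plain Python. This is only an illustration under stated assumptions, not River's implementation: `WindowedNeighbors` and `minkowski` are hypothetical names, features are assumed to be dictionaries of floats, and missing keys are treated as 0.

```python
from collections import deque


def minkowski(a, b, p=2):
    """Minkowski distance between two feature dicts (missing keys count as 0)."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


class WindowedNeighbors:
    """Hypothetical helper, not River's NearestNeighbors class."""

    def __init__(self, window_size=1000):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.window = deque(maxlen=window_size)

    def append(self, x, y):
        self.window.append((x, y))

    def find_nearest(self, x, n_neighbors=5):
        # Brute force: compute the distance to every stored observation,
        # then keep the n_neighbors smallest.
        scored = [(xi, yi, minkowski(x, xi)) for xi, yi in self.window]
        scored.sort(key=lambda item: item[2])
        return scored[:n_neighbors]


buffer = WindowedNeighbors(window_size=3)
for i in range(5):
    buffer.append({"f": float(i)}, i % 2 == 0)

# Only the 3 most recent observations (f = 2, 3, 4) are still in the buffer.
nearest = buffer.find_nearest({"f": 3.9}, n_neighbors=2)
```

The cost of each query grows linearly with the window size, which is why the window is bounded.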
Parameters¶
-
n_neighbors (int) – defaults to
5
The number of nearest neighbors to search for.
-
window_size (int) – defaults to
1000
The maximum size of the window storing the last observed samples.
-
min_distance_keep (float) – defaults to
0.0
The minimum distance (similarity) to consider adding a point to the window. E.g., a value of 0.0 will add even exact duplicates, while a small positive value such as 0.05 will add similar but not exactly identical points.
-
weighted (bool) – defaults to
True
Weight the contribution of each neighbor by its inverse distance.
-
cleanup_every (int) – defaults to
0
This determines at which rate old classes are cleaned up. Classes that have been seen in the past but that are not present in the current window are dropped. Classes are never dropped when this is set to 0.
-
distance_func (Callable[[Any, Any], float]) – defaults to
None
An optional distance function that should accept an a=, b=, and any custom set of kwargs (defined in distance_func_kwargs). If not defined, the default Minkowski distance is used.
-
softmax (bool) – defaults to
False
Whether or not to use softmax normalization to normalize the neighbors' contributions. Votes are divided by the total number of votes if this is False.
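To make the effect of the weighted and softmax parameters concrete, here is a minimal sketch of how neighbor votes could be combined. It is an assumption-laden illustration rather than the library's code: `vote` is a hypothetical helper, neighbors are given as (label, distance) pairs, and a tiny floor on the distance is used here to avoid dividing by zero.

```python
import math
from collections import defaultdict


def vote(neighbors, weighted=True, softmax=False):
    """Turn (label, distance) pairs into a label -> probability dict.

    Hypothetical helper written for illustration only.
    """
    votes = defaultdict(float)
    for label, distance in neighbors:
        if weighted:
            # Closer neighbors count more. The floor is a choice made
            # for this sketch to guard against a distance of exactly 0.
            votes[label] += 1.0 / max(distance, 1e-12)
        else:
            votes[label] += 1.0
    if not votes:
        return {}
    if softmax:
        # Subtract the largest vote before exponentiating, for stability.
        peak = max(votes.values())
        exps = {label: math.exp(v - peak) for label, v in votes.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}
    # Without softmax, votes are divided by the total number of votes.
    total = sum(votes.values())
    return {label: v / total for label, v in votes.items()}


neighbors = [(True, 0.5), (False, 1.0), (False, 2.0)]
uniform = vote(neighbors, weighted=False)  # False wins: 2 votes against 1
inverse = vote(neighbors, weighted=True)   # True wins: 2.0 against 1.5
soft = vote(neighbors, weighted=True, softmax=True)
```

With these three neighbors, inverse-distance weighting flips the majority: the single close neighbor outweighs the two farther ones.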
Examples¶
>>> from river import datasets, neighbors, preprocessing
>>> from river import evaluate, metrics
>>> dataset = datasets.Phishing()
>>> model = (
...     preprocessing.StandardScaler() |
...     neighbors.KNNClassifier()
... )
>>> for x, y in dataset.take(100):
...     model = model.learn_one(x, y)
>>> for x, y in dataset.take(1):
...     model.predict_one(x)
True
Methods¶
clean_up_classes
Clean up classes added to the window.
Classes that are added to (and removed from) the window may no longer be valid. This method cleans up the window and ensures only known classes are added, and that "None" is not considered a class. It is called every cleanup_every steps, or can be called manually.
clone
Return a fresh estimator with the same parameters.
The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
- recursively cloned if it is a River class.
- deep-copied via copy.deepcopy if not.
If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
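The signature-based cloning described above can be sketched as follows. This is a simplified illustration, not River's actual clone method: `Counter` and the standalone `clone` function are hypothetical, every constructor parameter is assumed to be stored under an attribute of the same name, and the recursive handling of nested estimators is left out.

```python
import copy
import inspect


class Counter:
    """Toy estimator (hypothetical) with one parameter and some learned state."""

    def __init__(self, step=1):
        self.step = step  # parameter, part of the class signature
        self.total = 0    # learned state, not a parameter

    def learn_one(self, x):
        self.total += self.step * x
        return self


def clone(estimator):
    # Read the parameter names from the class signature.
    signature = inspect.signature(type(estimator).__init__)
    params = {}
    for name in signature.parameters:
        if name == "self":
            continue
        # Deep-copy each parameter value so the clone shares nothing.
        params[name] = copy.deepcopy(getattr(estimator, name))
    # Build a new instance: same parameters, no learned state.
    return type(estimator)(**params)


original = Counter(step=3).learn_one(2)
fresh = clone(original)
```

Here `fresh` keeps `step=3` but starts again with `total == 0`, while `original` is left untouched.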
learn_one
Update the model with a set of features x and a label y.
Parameters
- x
- y
Returns
self
predict_one
Predict the label of a set of features x.
Parameters
- x (dict)
Returns
typing.Union[bool, str, int]: The predicted label.
predict_proba_one
Predict the probability of each label for a dictionary of features x.
Parameters
- x
Returns
A dictionary that associates a probability with each label.
Notes¶
See the NearestNeighbors documentation for details about the base model,
along with KNNBase for an example of providing your own distance function.
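As a rough idea of what a custom distance function might look like, the sketch below defines a Manhattan distance that accepts a and b as keyword arguments plus optional extra kwargs. Several things here are assumptions: a and b are taken to be plain feature dictionaries (the exact objects the library passes should be checked against the KNNBase documentation), and the `scale` keyword is a hypothetical stand-in for whatever would be supplied through distance_func_kwargs.

```python
def manhattan(a, b, **kwargs):
    """Manhattan (L1) distance between two feature dicts.

    Assumes a and b are dicts of floats; missing keys count as 0.
    """
    scale = kwargs.get("scale", 1.0)  # hypothetical extra keyword argument
    keys = set(a) | set(b)
    return scale * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


# Called by keyword, as the a= / b= convention suggests.
d = manhattan(a={"x": 1.0, "y": 2.0}, b={"x": 4.0, "y": 0.0})
scaled = manhattan(a={"x": 1.0, "y": 2.0}, b={"x": 4.0, "y": 0.0}, scale=0.5)
```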
Note that since the window is moving and we keep track of all classes that
are added at some point, a class might be returned in a result (with a
value of 0) if it is no longer in the window. You can call
model.clean_up_classes(), or set cleanup_every
to a non-zero value.
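The behavior described in this note can be reproduced with a small stand-alone sketch. It is not River's code: the module-level `window` and `classes` objects and the three functions are hypothetical, and for brevity every item in the window is treated as an equally weighted neighbor.

```python
from collections import deque

window = deque(maxlen=3)  # (x, label) pairs; the oldest pair is dropped first
classes = set()           # every label seen so far, even if it left the window


def learn_one(x, y):
    classes.add(y)
    window.append((x, y))


def predict_proba_one():
    # Start every known class at 0 so that stale classes still show up.
    counts = {label: 0.0 for label in classes}
    for _, label in window:
        counts[label] += 1.0
    total = sum(counts.values())
    return {label: (c / total if total else 0.0) for label, c in counts.items()}


def clean_up_classes():
    # Keep only the classes that are still present in the window.
    classes.intersection_update({label for _, label in window})


learn_one({"f": 0.0}, "a")
for i in range(3):
    learn_one({"f": float(i)}, "b")  # pushes the only "a" out of the window

before = predict_proba_one()  # "a" is still reported, with a value of 0
clean_up_classes()
after = predict_proba_one()   # "a" is gone
```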