K-Nearest Neighbors (KNN) for classification.
This works by storing a buffer with the window_size most recent observations. A brute-force search is used to find the n_neighbors nearest observations in the buffer to make a prediction. See the NearestNeighbors parent class for more details.
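As a rough illustration of the sliding-window idea, the sketch below keeps the most recent observations in a bounded collections.deque and scans all of them on every query. It is a minimal sketch, not river's implementation: the WindowKNN class and the euclidean helper are hypothetical, labels are decided by an unweighted majority vote, and the min_distance_keep check assumes a point is skipped when its distance to the closest stored point is at most that threshold.

```python
import math
from collections import Counter, deque


def euclidean(a: dict, b: dict) -> float:
    # Hypothetical convention: a feature missing from one dict counts as 0
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class WindowKNN:
    """Toy sliding-window KNN classifier (hypothetical, not river's code)."""

    def __init__(self, n_neighbors=5, window_size=1000, min_distance_keep=0.0):
        self.n_neighbors = n_neighbors
        self.min_distance_keep = min_distance_keep
        # deque(maxlen=...) silently evicts the oldest (x, y) pair when full
        self.window = deque(maxlen=window_size)

    def nearest(self, x, k):
        # Brute force: distance to every stored point, keep the k closest
        scored = [(euclidean(x, xi), yi) for xi, yi in self.window]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]

    def learn_one(self, x, y):
        # Assumed rule: skip points too close to an existing one (0.0 keeps duplicates)
        if self.min_distance_keep > 0 and self.window:
            d, _ = self.nearest(x, 1)[0]
            if d <= self.min_distance_keep:
                return False
        self.window.append((x, y))
        return True

    def predict_one(self, x):
        neighbors = self.nearest(x, self.n_neighbors)
        if not neighbors:
            return None
        # Unweighted majority vote among the k nearest neighbors
        votes = Counter(y for _, y in neighbors)
        return votes.most_common(1)[0][0]


knn = WindowKNN(n_neighbors=3, window_size=4)
stream = [({"a": 0.0}, "low"), ({"a": 0.1}, "low"), ({"a": 5.0}, "high"),
          ({"a": 5.2}, "high"), ({"a": 5.1}, "high")]
for x, y in stream:
    knn.learn_one(x, y)
print(len(knn.window))              # 4: the oldest point was evicted
print(knn.predict_one({"a": 4.9}))  # high
```

Each prediction costs one distance computation per stored point, which is why window_size bounds both memory and prediction time in this sketch.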
n_neighbors (int) – defaults to
The number of nearest neighbors to search for.
window_size (int) – defaults to
The maximum size of the window storing the last observed samples.
min_distance_keep (float) – defaults to
The minimum distance (dissimilarity) between a new point and its nearest neighbor in the window for the point to be added to the window. E.g., a value of 0.0 will add even exact duplicates. Default is 0.05 to add similar but not exactly identical points.
weighted (bool) – defaults to
Weight the contribution of each neighbor by its inverse distance.
cleanup_every (int) – defaults to
How often, in steps, old classes are cleaned up. Classes that have been seen in the past but are not present in the current window are dropped. Classes are never dropped when this is set to 0.
distance_func (river.neighbors.base.DistanceFunc) – defaults to
An optional distance function that should accept the keyword arguments a= and b=, plus any custom set of kwargs. If not defined, the Minkowski distance is used with p=2 (Euclidean distance). See the example section for more details.
softmax (bool) – defaults to
Whether to use softmax normalization to normalize the neighbors' contributions. If this is False, votes are instead divided by the total number of votes.
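The effect of weighted and softmax on the final vote can be sketched as a small aggregation step over (distance, label) pairs. This assumes inverse-distance weights of the form 1 / (distance + eps) and a plain softmax over the summed votes; the aggregate_votes function and its eps parameter are hypothetical, and river's exact formula may differ.

```python
import math


def aggregate_votes(neighbors, weighted=True, softmax=False, eps=1e-10):
    """Turn (distance, label) pairs into class probabilities (hypothetical helper)."""
    votes = {}
    for dist, label in neighbors:
        # Inverse-distance weight; eps avoids division by zero for exact matches
        w = 1.0 / (dist + eps) if weighted else 1.0
        votes[label] = votes.get(label, 0.0) + w
    if not votes:
        return {}
    if softmax:
        # Subtract the max vote before exponentiating for numerical stability
        m = max(votes.values())
        exp = {c: math.exp(v - m) for c, v in votes.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}
    # Default: divide each class's votes by the total number of votes
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}


neighbors = [(1.0, "a"), (2.0, "a"), (4.0, "b")]
print(aggregate_votes(neighbors, weighted=False))
print(aggregate_votes(neighbors, weighted=True))
print(aggregate_votes(neighbors, weighted=False, softmax=True))
```

With inverse-distance weights the two close "a" neighbors dominate the distant "b" neighbor more strongly than in the unweighted vote.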
>>> from river import datasets
>>> from river import evaluate
>>> from river import metrics
>>> from river import neighbors
>>> from river import preprocessing

>>> dataset = datasets.Phishing()

>>> model = (
...     preprocessing.StandardScaler() |
...     neighbors.KNNClassifier(window_size=50)
... )

>>> evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
Accuracy: 84.55%
When defining a custom distance function, you can rely on functools.partial to set default parameter values. For instance, let's use the Manhattan distance instead of the default Euclidean distance:
>>> import functools
>>> from river import utils

>>> model = (
...     preprocessing.StandardScaler() |
...     neighbors.KNNClassifier(
...         window_size=50,
...         distance_func=functools.partial(utils.math.minkowski_distance, p=1)
...     )
... )

>>> evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
Accuracy: 86.87%
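To make the distance_func contract concrete, the sketch below defines a pure-Python Minkowski distance over feature dicts and fixes p=1 with functools.partial, which yields the Manhattan distance. The minkowski function is a hypothetical stand-in for utils.math.minkowski_distance; it assumes that a feature missing from one dict counts as 0.

```python
import functools


def minkowski(a: dict, b: dict, p: int = 2) -> float:
    """Minkowski distance between two feature dicts (hypothetical stand-in)."""
    keys = a.keys() | b.keys()
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


# Fix p=1 ahead of time; the resulting callable only needs a= and b=
manhattan = functools.partial(minkowski, p=1)

x1 = {"f1": 0.0, "f2": 0.0}
x2 = {"f1": 3.0, "f2": 4.0}
print(manhattan(a=x1, b=x2))  # 7.0
print(minkowski(a=x1, b=x2))  # 5.0 (Euclidean, p=2)
```

The same pattern works for any extra keyword argument: bind it with functools.partial so that only a and b remain to be supplied at search time.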
clean_up_classes()
Clean up classes added to the window.
Classes that are added to (and later removed from) the window may no longer be valid. This method cleans up the set of known classes so that only classes still present in the window are kept, and "None" is never considered a class. It is called every cleanup_every steps (when cleanup_every is non-zero), or can be called manually.
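The clean-up schedule can be sketched as a step counter that periodically rebuilds the class set from the labels still in the window. ClassTracker is a hypothetical helper that assumes the known classes are simply replaced by the non-None labels currently stored; river's own bookkeeping may differ.

```python
from collections import deque


class ClassTracker:
    """Hypothetical helper: sliding window plus periodic class clean-up."""

    def __init__(self, window_size=3, cleanup_every=0):
        self.window = deque(maxlen=window_size)  # stores (x, y) pairs
        self.classes = set()
        self.cleanup_every = cleanup_every
        self._steps = 0

    def clean_up_classes(self):
        # Keep only labels still present in the window; None is never a class
        self.classes = {y for _, y in self.window if y is not None}

    def learn_one(self, x, y):
        self.window.append((x, y))
        if y is not None:
            self.classes.add(y)
        self._steps += 1
        # cleanup_every=0 disables automatic clean-up
        if self.cleanup_every and self._steps % self.cleanup_every == 0:
            self.clean_up_classes()


tracker = ClassTracker(window_size=3)
for i, y in enumerate(["a", "b", "b", "b"]):
    tracker.learn_one({"i": i}, y)
print(sorted(tracker.classes))  # ['a', 'b']: "a" was evicted but is still tracked
tracker.clean_up_classes()
print(sorted(tracker.classes))  # ['b']
```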
learn_one(x, y)
Update the model with a set of features x and a label y.
- x (dict)
- y (Union[bool, str, int])
predict_one(x)
Predict the label of a set of features x.
- x (dict)
typing.Union[bool, str, int, NoneType]: The predicted label.
predict_proba_one(x)
Predict the probability of each label for a dictionary of features x.
- x (dict)
typing.Dict[typing.Union[bool, str, int], float]: A dictionary that associates a probability with each label.
Note that since the window is moving and we keep track of all classes that are added at some point, a class that is no longer in the window might still be returned in the result, with a probability of 0. To avoid this, call model.clean_up_classes() or set cleanup_every to a non-zero value.
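The zero-probability behavior follows from reporting every class that was ever recorded, even when no neighbor votes for it. The helpers below are hypothetical and assume plain sum normalization; predict_one_from_proba also assumes that predicting a single label amounts to taking the most probable one, returning None when there is nothing to vote on.

```python
def predict_proba_with_known(votes: dict, known_classes: set) -> dict:
    """Normalize neighbor votes and report every known class (hypothetical helper)."""
    total = sum(votes.values())
    proba = {c: 0.0 for c in known_classes}  # stale classes keep a probability of 0
    for c, v in votes.items():
        proba[c] = v / total if total else 0.0
    return proba


def predict_one_from_proba(proba: dict):
    # Most probable label, or None when no class received any vote
    if not proba or max(proba.values()) == 0.0:
        return None
    return max(proba, key=proba.get)


known = {"cat", "dog", "bird"}  # "bird" left the window but was never cleaned up
proba = predict_proba_with_known({"cat": 2.0, "dog": 1.0}, known)
print(proba["bird"])                  # 0.0
print(predict_one_from_proba(proba))  # cat
```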