
KNNClassifier

K-Nearest Neighbors (KNN) for classification.

This works by storing a buffer with the window_size most recent observations. A brute-force search is used to find the n_neighbors nearest observations in the buffer to make a prediction. See the NearestNeighbors parent class for more details.
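To make the description concrete, here is a minimal pure-Python sketch of a sliding-window, brute-force KNN classifier. It illustrates the idea under stated assumptions and is not river's implementation. The class `WindowKNN` and the helper `euclidean` are hypothetical, features are dicts with missing keys treated as 0.0, and the vote is an unweighted majority.

```python
import math
from collections import deque


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between feature dicts; missing keys count as 0.0."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class WindowKNN:
    """Hypothetical sliding-window KNN sketch (not river's implementation)."""

    def __init__(self, n_neighbors: int = 5, window_size: int = 1000):
        self.n_neighbors = n_neighbors
        # A deque with maxlen silently drops the oldest pair once it is full
        self.window = deque(maxlen=window_size)

    def learn_one(self, x: dict, y) -> None:
        self.window.append((x, y))

    def nearest(self, x: dict) -> list:
        # Brute force: score every stored point, then keep the k closest
        scored = [(euclidean(x, xi), yi) for xi, yi in self.window]
        scored.sort(key=lambda pair: pair[0])
        return scored[: self.n_neighbors]

    def predict_one(self, x: dict):
        # Unweighted majority vote among the k nearest neighbors
        votes = {}
        for _, label in self.nearest(x):
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get) if votes else None
```

With `window_size=3`, for instance, a fourth call to `learn_one` evicts the first stored pair, so later queries are answered only from the three most recent observations.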

Parameters

  • n_neighbors

    Type: int

    Default: 5

    The number of nearest neighbors to search for.

  • window_size

    Type: int

    Default: 1000

    The maximum size of the window storing the last observed samples.

  • min_distance_keep

    Type: float

    Default: 0.0

    The minimum distance a new point must have from the points already in the window for it to be added. A value of 0.0 (the default) adds every point, even exact duplicates; a small positive value such as 0.05 skips points that are very similar, but not identical, to points already stored.

  • weighted

    Type: bool

    Default: True

    Weight the contribution of each neighbor by its inverse distance.

  • cleanup_every

    Type: int

    Default: 0

    How often, in learning steps, old classes are cleaned up. Classes that were seen in the past but are no longer present in the current window are dropped. When this is set to 0, classes are never dropped.

  • distance_func

    Type: DistanceFunc | None

    Default: None

    An optional distance function that accepts the keyword arguments a= and b=, plus any custom set of keyword arguments. If it is not defined, the Minkowski distance with p=2 (the Euclidean distance) is used. See the Examples section for more details.

  • softmax

    Type: bool

    Default: False

    Whether to use softmax normalization to normalize the neighbors' contributions. If this is False, each vote is divided by the total number of votes.
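The weighted and softmax parameters both govern how neighbor votes turn into class scores. The sketch below shows one plausible way to combine them. It assumes inverse-distance weights of the form 1 / (d + eps), where the small eps guard against division by zero is a hypothetical choice, and the function name `vote` is also hypothetical; river's internals may differ.

```python
import math


def vote(neighbors, weighted=True, softmax=False, eps=1e-10):
    """Turn (distance, label) pairs into class probabilities (hypothetical sketch)."""
    scores = {}
    for dist, label in neighbors:
        # Inverse-distance weight, or a plain count of 1 per neighbor
        w = 1.0 / (dist + eps) if weighted else 1.0
        scores[label] = scores.get(label, 0.0) + w
    if not scores:
        return {}
    if softmax:
        # Subtract the largest score before exponentiating for numerical stability
        m = max(scores.values())
        exps = {c: math.exp(s - m) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}
    # Without softmax, divide each score by the total
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}
```

With two neighbors of class "a" at distance 1.0 and one of class "b" at distance 2.0, the unweighted vote gives roughly 0.67 / 0.33, while inverse-distance weighting shifts it to about 0.8 / 0.2 because the closer neighbors count more.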

Examples

from river import datasets
from river import evaluate
from river import metrics
from river import neighbors
from river import preprocessing

dataset = datasets.Phishing()

model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(window_size=50)
)

evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
Accuracy: 84.55%

When defining a custom distance function, you can rely on functools.partial to set default parameter values. For instance, here is how to use the Manhattan distance instead of the default Euclidean distance:

import functools
from river import utils
model = (
    preprocessing.StandardScaler() |
    neighbors.KNNClassifier(
        window_size=50,
        distance_func=functools.partial(utils.math.minkowski_distance, p=1)
    )
)
evaluate.progressive_val_score(dataset, model, metrics.Accuracy())
Accuracy: 86.87%
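The required signature can be illustrated with a standalone function. The sketch below is a hypothetical stand-in for utils.math.minkowski_distance, written under the assumption that features are dicts and missing keys count as 0.0. It accepts a and b as keyword-compatible parameters, and functools.partial fixes p, just as in the example above.

```python
import functools


def minkowski(a: dict, b: dict, p: float) -> float:
    """Minkowski distance between feature dicts; missing keys count as 0.0."""
    keys = a.keys() | b.keys()
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) ** p for k in keys) ** (1 / p)


# p=1 gives the Manhattan distance, p=2 the Euclidean distance
manhattan = functools.partial(minkowski, p=1)
euclidean = functools.partial(minkowski, p=2)
```

Both partials can then be called as `manhattan(a=x1, b=x2)`, which is the calling convention the distance_func parameter describes.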

Methods

clean_up_classes

Clean up classes added to the window.

Classes that were added to (and later removed from) the window may no longer be valid. This method cleans up the classes tracked by the model, keeping only those still present in the window and never treating None as a class. It is called every cleanup_every steps, or it can be called manually.
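The clean-up amounts to intersecting the set of known classes with the labels still stored in the window. A minimal sketch, assuming the window holds (x, y) pairs; the function and argument names are hypothetical and this is not river's code.

```python
def clean_up_classes(window, classes):
    """Hypothetical sketch: keep only classes still present in the window.

    window holds (x, y) pairs; classes is the set of every label seen so far.
    Returns the pruned set; None is never treated as a class.
    """
    present = {y for _, y in window}
    return {c for c in classes if c in present and c is not None}
```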

learn_one

Update the model with a set of features x and a label y.

Parameters

  • x: dict
  • y: base.typing.ClfTarget

Returns

Classifier: self
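Here is a sketch of how a learn_one step might honor min_distance_keep. It assumes a new point is skipped when it lies closer than min_distance_keep to any stored point and that the method returns self, as documented above. The class `FilteredWindow` and the `euclidean` helper are hypothetical, and river's actual update logic may differ.

```python
import math
from collections import deque


def euclidean(a: dict, b: dict) -> float:
    """Euclidean distance between feature dicts; missing keys count as 0.0."""
    keys = a.keys() | b.keys()
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


class FilteredWindow:
    """Hypothetical learn_one sketch with a min_distance_keep filter."""

    def __init__(self, window_size: int = 1000, min_distance_keep: float = 0.0):
        self.window = deque(maxlen=window_size)
        self.min_distance_keep = min_distance_keep
        self.classes = set()

    def learn_one(self, x: dict, y):
        # With a threshold of 0.0 no distance check is needed: every point is kept
        too_close = self.min_distance_keep > 0 and any(
            euclidean(x, xi) < self.min_distance_keep for xi, _ in self.window
        )
        if not too_close:
            self.window.append((x, y))
            # Only record the class when the point was actually stored
            self.classes.add(y)
        return self
```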

predict_one

Predict the label of a set of features x.

Parameters

  • x: dict
  • kwargs

Returns

base.typing.ClfTarget | None: The predicted label.
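A common way to derive a single label is to take the most probable class from the probability dictionary and return None when there is nothing to choose from, which matches the `| None` in the return type. The sketch below assumes that pattern; the function name is hypothetical.

```python
def predict_from_proba(proba: dict):
    """Hypothetical sketch: pick the label with the highest probability.

    Returns None when there is nothing to vote on (e.g. an empty window).
    """
    if not proba:
        return None
    return max(proba, key=proba.get)
```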

predict_proba_one

Predict the probability of each label for a dictionary of features x.

Parameters

  • x: dict

Returns

dict[base.typing.ClfTarget, float]: A dictionary that associates a probability with each label.

Notes

Because the window slides and the model keeps track of every class that has ever been added, a class that is no longer in the window can still appear in the result (with a probability of 0). To avoid this, call model.clean_up_classes() manually, or set cleanup_every to a non-zero value.
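The zero-probability effect is easy to reproduce: if probabilities are reported for every known class, a class with no neighbor in the window simply gets 0. A minimal sketch with unweighted vote fractions; the function name and its inputs are hypothetical.

```python
def proba_over_known_classes(neighbor_labels, known_classes):
    """Hypothetical sketch: unweighted vote fractions reported for every known class.

    Classes seen in the past that have no neighbor get probability 0.0,
    which is why stale classes can show up until they are cleaned up.
    """
    counts = {c: 0 for c in known_classes}
    for label in neighbor_labels:
        counts[label] = counts.get(label, 0) + 1
    total = sum(counts.values())
    return {c: (n / total if total else 0.0) for c, n in counts.items()}
```

For neighbors labeled "cat", "cat" and "bird", with "dog" still in the set of known classes, the result contains "dog" with a probability of 0.0.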