STREAMKMeans

STREAMKMeans is a variant of the STREAMLSEARCH algorithm proposed by O'Callaghan et al. [1], in which the k-medians step solved with LSEARCH is replaced by the k-means algorithm.

However, instead of the traditional k-means, which would require a full reclustering each time the temporary chunk of data points fills up, this implementation uses an incremental k-means.

At first, the cluster centers are initialized with a KMeans instance. For a new point p:

• If the chunk holds fewer points than the maximum size allowed, the new point is added to the temporary chunk.

• When the chunk reaches the maximum size allowed:

    • A new incremental KMeans instance is created and processes all points in the temporary chunk. The centers of this new instance then become the new centers.

    • All points are deleted from the temporary chunk so that new points can be added.

• When a prediction request arrives, the centers of the algorithm are exactly those of the original KMeans at the time of retrieval.
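
The buffering steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not River's implementation: `IncrementalKMeansSketch` and `ChunkedKMeansSketch` are hypothetical names, the starting centers are supplied by hand instead of coming from a KMeans instance, the per-chunk instance is assumed to start from the current centers, and each point simply pulls its nearest center toward itself by a fraction `halflife`.

```python
class IncrementalKMeansSketch:
    """Toy incremental k-means (hypothetical helper, not river's cluster.KMeans)."""

    def __init__(self, centers, halflife=0.5):
        # Copy the starting centers so the caller's lists are not mutated.
        self.centers = [list(c) for c in centers]
        self.halflife = halflife

    def closest(self, x):
        # Index of the center with the smallest squared Euclidean distance to x.
        dists = [sum((a - b) ** 2 for a, b in zip(c, x)) for c in self.centers]
        return dists.index(min(dists))

    def learn_one(self, x):
        # Move the nearest center a fraction `halflife` of the way toward x.
        i = self.closest(x)
        self.centers[i] = [
            c + self.halflife * (a - c) for c, a in zip(self.centers[i], x)
        ]


class ChunkedKMeansSketch:
    """Buffers points in a chunk and updates the centers once the chunk is full."""

    def __init__(self, initial_centers, chunk_size=10, halflife=0.5):
        self.centers = [list(c) for c in initial_centers]
        self.chunk_size = chunk_size
        self.halflife = halflife
        self.chunk = []  # temporary data chunk

    def learn_one(self, x):
        # Buffer the new point.
        self.chunk.append(list(x))
        if len(self.chunk) >= self.chunk_size:
            # Assumption: the new instance is seeded with the current centers.
            km = IncrementalKMeansSketch(self.centers, self.halflife)
            for p in self.chunk:
                km.learn_one(p)
            # The centers of the new instance become the new centers.
            self.centers = km.centers
            # Empty the chunk so that new points can be added.
            self.chunk.clear()
        return self

    def predict_one(self, x):
        # Assign x to the nearest of the current centers.
        return IncrementalKMeansSketch(self.centers).closest(x)


X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125], [1, 1.5], [1, 1.75],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3], [4, 3.25], [4, 3.5],
]

model = ChunkedKMeansSketch(initial_centers=[[1, 1], [4, 3]], chunk_size=3)
for x in X:
    model.learn_one(x)

print(model.predict_one([1, 0]), model.predict_one([5, 2]))
```

With these hand-picked starting centers, the two queries land in clusters 0 and 1 respectively. Note that the centers only change once every `chunk_size` points, so predictions made while a chunk is still filling up use the centers from the last completed chunk.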

Parameters

• chunk_size

Default: 10

Maximum size allowed for the temporary data chunk.

• n_clusters

Default: 2

Number of clusters generated by the algorithm.

• kwargs

Other parameters passed to the incremental k-means, cluster.KMeans.

Attributes

• centers

Cluster centers obtained by running the incremental KMeans algorithm over the centers of each chunk.

Examples

from river import cluster
from river import stream

X = [
    [1, 0.5], [1, 0.625], [1, 0.75], [1, 1.125], [1, 1.5], [1, 1.75],
    [4, 1.5], [4, 2.25], [4, 2.5], [4, 3], [4, 3.25], [4, 3.5]
]

streamkmeans = cluster.STREAMKMeans(chunk_size=3, n_clusters=2, halflife=0.5, sigma=1.5, seed=0)

for x, _ in stream.iter_array(X):
    streamkmeans = streamkmeans.learn_one(x)

streamkmeans.predict_one({0: 1, 1: 0})

0


streamkmeans.predict_one({0: 5, 1: 2})

1


Methods

learn_one

Update the model with a set of features x.

Parameters

• x ('dict')
• sample_weight — defaults to None

Returns

Clusterer: self

predict_one

Predicts the cluster number for a set of features x.

Parameters

• x ('dict')
• sample_weight — defaults to None

Returns

int: A cluster number.

1. O'Callaghan et al. (2002). Streaming-data algorithms for high-quality clustering. In Proceedings of the 18th International Conference on Data Engineering, Feb 26 - March 1, San Jose, CA, USA. DOI: 10.1109/ICDE.2002.994785.