Skip to content

KolmogorovSmirnov

Incremental Kolmogorov-Smirnov statistics.

The two-sample Kolmogorov-Smirnov test quantifies the distance between the empirical functions of two samples, with the null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution. The formula can be described as

\[ D_{n, m} = \sup_x \| F_{1, n}(x) - F_{2, m}(x) \|. \]

This implementation is the incremental version of the previously mentioned statistics, with the change being in the ability to insert and remove an observation thorugh time. This can be done using a randomized tree called Treap (or Cartesian Tree) 2 with bulk operation and lazy propagation.

The implemented algorithm is able to perform the insertion and removal operations in O(logN) with high probability and calculate the Kolmogorov-Smirnov test in O(1), where N is the number of sample observations. This is a significant improvement compared to the O(N logN) cost of non-incremental implementation.

This implementation also supports the calculation of the Kuiper statistics. Different from the orginial Kolmogorov-Smirnov statistics, Kuiper's test 3 calculates the sum of the absolute sizes of the most positive and most negative differences between the two cumulative distribution functions taken into account. As such, Kuiper's test is very sensitive in the tails as at the median.

Last but not least, this implementation is also based on the original implementation within the supplementary material of the authors of paper 1, at the following Github repository.

Parameters

  • statistic

    Default โ†’ ks

    The method used to calculate the statistic, can be either "ks" or "kuiper". The default value is set as "ks".

Examples

import numpy as np
from river import stats

stream_a = [1, 1, 2, 2, 3, 3, 4, 4]
stream_b = [1, 1, 1, 1, 2, 2, 2, 2]

incremental_ks = stats.KolmogorovSmirnov(statistic="ks")
for a, b in zip(stream_a, stream_b):
    incremental_ks.update(a, b)

incremental_ks
KolmogorovSmirnov: 0.5

incremental_ks.n_samples
8

Methods

get

Return the current value of the statistic.

revert
update

Update and return the called instance.

Parameters

  • x
  • y


  1. dos Reis, D.M. et al. (2016) โ€˜Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov testโ€™, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. doi:10.1145/2939672.2939836. 

  2. C. R. Aragon and R. G. Seidel. Randomized search trees. In FOCS, pages 540โ€“545. IEEE, 1989. 

  3. Kuiper, N. H. (1960). "Tests concerning random points on a circle". Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen, Series A. 63: 38โ€“47.