
NUnique

Approximate number of unique values counter.

An implementation of the HyperLogLog algorithm, adapted from hypy. The code is terse, but it is sufficient for now.
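To make the idea behind HyperLogLog concrete, here is a minimal sketch of the textbook algorithm. It is not the code used by NUnique: the class name `TinyHLL`, the choice of `blake2b` as the hash, the 64-bit hash width, and the default `n_bits=10` are all assumptions made for illustration.

```python
import hashlib
import math


class TinyHLL:
    """Illustrative HyperLogLog counter (hypothetical, not river's code)."""

    def __init__(self, n_bits=10):
        self.n_bits = n_bits                  # bits used to select a bucket
        self.n_buckets = 1 << n_bits          # m = 2 ** n_bits
        self.buckets = [0] * self.n_buckets   # largest rank seen per bucket

    def update(self, x):
        # Hash the value to a 64-bit integer (hash choice is an assumption).
        digest = hashlib.blake2b(str(x).encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        # The low n_bits bits select the bucket.
        idx = h & (self.n_buckets - 1)
        # The remaining bits give the rank: leading zeros plus one.
        rest = h >> self.n_bits
        width = 64 - self.n_bits
        rank = width - rest.bit_length() + 1
        self.buckets[idx] = max(self.buckets[idx], rank)
        return self

    def get(self):
        m = self.n_buckets
        alpha = 0.7213 / (1 + 1.079 / m)      # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -b for b in self.buckets)
        zeros = self.buckets.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(m * math.log(m / zeros))
        return round(raw)
```

Each bucket only remembers the largest rank it has seen, so repeated values never change the state. That is why the structure counts distinct values rather than total observations.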

Parameters

  • error_rate – defaults to 0.01

    Desired error rate. Memory usage grows as this value shrinks; in standard HyperLogLog the number of buckets scales roughly with the inverse square of the error rate.

  • seed (int) – defaults to None

    Random seed. Set it to obtain reproducible results.

Attributes

  • n_bits (int)

  • n_buckets (int)

  • buckets (list)
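The attributes above can be related to error_rate with the textbook HyperLogLog bound, where the standard error is about 1.04 / sqrt(m) for m buckets. The helper below is a sketch under that assumption; `hll_size` is a hypothetical name, and whether NUnique computes exactly these values is not confirmed by this page.

```python
import math


def hll_size(error_rate):
    """Return (n_bits, n_buckets) for a target error rate.

    Assumes the textbook relation error ~ 1.04 / sqrt(n_buckets),
    with n_buckets rounded up to the next power of two.
    """
    n_bits = math.ceil(math.log2((1.04 / error_rate) ** 2))
    n_buckets = 1 << n_bits
    return n_bits, n_buckets


print(hll_size(0.2))   # (5, 32)
print(hll_size(0.01))  # (14, 16384)
```

Under this assumption, going from an error rate of 0.2 to 0.01 multiplies the number of buckets by 512, which is the memory cost of the extra precision.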

Examples

>>> import string
>>> from river import stats

>>> alphabet = string.ascii_lowercase
>>> n_unique = stats.NUnique(error_rate=0.2, seed=42)

>>> n_unique.update('a').get()
1

>>> n_unique.update('b').get()
2

>>> for letter in alphabet:
...     n_unique = n_unique.update(letter)
>>> n_unique.get()
31

Lowering the error_rate parameter increases precision, at the cost of more memory.

>>> n_unique = stats.NUnique(error_rate=0.01, seed=42)
>>> for letter in alphabet:
...     n_unique = n_unique.update(letter)
>>> n_unique.get()
26
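The two outputs above can be checked against the requested error rates with a little arithmetic. The snippet below only reuses the numbers shown in the examples (31 and 26 for a true count of 26); the helper name `relative_error` is hypothetical.

```python
def relative_error(estimate, true_count):
    """Absolute relative deviation of an estimate from the true count."""
    return abs(estimate - true_count) / true_count


true_count = 26  # letters in string.ascii_lowercase

# error_rate=0.2 produced 31 in the example above.
coarse = relative_error(31, true_count)
# error_rate=0.01 produced 26 in the example above.
fine = relative_error(26, true_count)

print(round(coarse, 3))  # 0.192
print(fine)              # 0.0
```

Both results fall within their requested error rates here. Note that in HyperLogLog the error rate is usually a typical deviation rather than a hard bound, so individual estimates may occasionally fall outside it.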

Methods

get

Return the current value of the statistic.

update

Update and return the called instance.

Parameters

  • x (numbers.Number)

References