Skip to content

NUnique

Approximate number of unique values counter.

This is basically an implementation of the HyperLogLog algorithm. Adapted from hypy. The code is a bit too terse but it will do for now.

Parameters

  • error_rate

    Default0.01

    Desired error rate. Memory usage is inversely proportional to this value.

  • seed

    Typeint | None

    DefaultNone

    Set the seed to produce identical results.

Attributes

  • n_bits (int)

  • n_buckets (int)

  • buckets (list)

Examples

import string
from river import stats

alphabet = string.ascii_lowercase
n_unique = stats.NUnique(error_rate=0.2, seed=42)

n_unique.update('a').get()
1

n_unique.update('b').get()
2

for letter in alphabet:
    n_unique = n_unique.update(letter)
n_unique.get()
31

Lowering the error_rate parameter will increase the precision.

n_unique = stats.NUnique(error_rate=0.01, seed=42)
for letter in alphabet:
    n_unique = n_unique.update(letter)
n_unique.get()
26

Methods

get

Return the current value of the statistic.

update

Update and return the called instance.

Parameters

  • x'numbers.Number'