NUnique
Approximate counter for the number of unique values.
This is an implementation of the HyperLogLog algorithm, adapted from hypy. The code is rather terse, but it will do for now.
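To make the idea behind HyperLogLog concrete, here is a minimal sketch in plain Python. It is not the code used by NUnique: the class name `TinyHyperLogLog`, the SHA-1 based hash, and the default of 10 index bits are all assumptions chosen for illustration. Each value is hashed, the low bits select a bucket, and the bucket remembers the largest "rank" (position of the first 1-bit) seen in the remaining bits.

```python
import hashlib
import math


class TinyHyperLogLog:
    """Illustrative HyperLogLog sketch (hypothetical, not river's implementation)."""

    HASH_BITS = 64

    def __init__(self, n_bits=10):
        self.n_bits = n_bits                 # bits used to pick a bucket
        self.n_buckets = 1 << n_bits         # m = 2 ** n_bits
        self.buckets = [0] * self.n_buckets  # largest rank seen per bucket

    def _hash(self, value):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def update(self, value):
        h = self._hash(value)
        index = h & (self.n_buckets - 1)     # low n_bits select the bucket
        rest = h >> self.n_bits              # remaining bits
        width = self.HASH_BITS - self.n_bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        self.buckets[index] = max(self.buckets[index], rank)

    def estimate(self):
        m = self.n_buckets
        alpha = 0.7213 / (1 + 1.079 / m)     # bias correction, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -b for b in self.buckets)
        zeros = self.buckets.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


hll = TinyHyperLogLog()
for letter in "abcdefghijklmnopqrstuvwxyz" * 3:  # duplicates do not change the sketch
    hll.update(letter)
print(round(hll.estimate()))
```

The memory footprint is fixed by the number of buckets, regardless of how many values are seen, which is what makes the approach suitable for streams.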
Parameters
- error_rate
  Type → float
  Default → 0.01
  Desired error rate. Memory usage increases as this value decreases.
- seed
  Type → int | None
  Default → None
  Set the seed to produce identical results.
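As a rough guide to how error_rate translates into memory, the following sketch uses the textbook HyperLogLog bound, where the standard error is about 1.04 / sqrt(m) for m buckets. The helper name `buckets_for_error` is hypothetical, and the assumption that NUnique sizes its buckets exactly this way is not confirmed by this page.

```python
import math


def buckets_for_error(error_rate):
    """Return (n_bits, n_buckets) under the standard HyperLogLog error bound.

    Assumption: standard error ~= 1.04 / sqrt(n_buckets), with the bucket
    count rounded up to a power of two.
    """
    n_bits = math.ceil(math.log2((1.04 / error_rate) ** 2))
    return n_bits, 1 << n_bits


for rate in (0.2, 0.05, 0.01):
    n_bits, n_buckets = buckets_for_error(rate)
    print(f"error_rate={rate}: n_bits={n_bits}, n_buckets={n_buckets}")
```

Under this bound the bucket count scales with the inverse square of the error rate, so halving the error rate roughly quadruples the memory needed.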
Attributes
- n_bits (int)
- n_buckets (int)
- buckets (list)
Examples
>>> import string
>>> from river import sketch
>>> alphabet = string.ascii_lowercase
>>> n_unique = sketch.NUnique(error_rate=0.2, seed=42)
>>> n_unique.update('a')
>>> n_unique.get()
1
>>> n_unique.update('b')
>>> n_unique.get()
2
>>> for letter in alphabet:
...     n_unique.update(letter)
>>> n_unique.get()
31
Lowering the error_rate parameter will increase the precision.
>>> n_unique = sketch.NUnique(error_rate=0.01, seed=42)
>>> for letter in alphabet:
...     n_unique.update(letter)
>>> n_unique.get()
26
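For comparison, an exact count can be kept with a plain Python set. This baseline is only an illustration and is not part of the library: it always returns the true answer, but its memory grows with the number of distinct values, whereas the sketch above uses a fixed number of buckets.

```python
import string

# Exact baseline: remember every distinct value that has been seen.
seen = set()
for letter in string.ascii_lowercase * 3:  # each letter arrives three times
    seen.add(letter)

exact_count = len(seen)
print(exact_count)  # 26
```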