NUnique¶
Approximate number of unique values counter.
This is basically an implementation of the HyperLogLog algorithm. Adapted from hypy
. The code is a bit too terse but it will do for now.
Parameters¶
-
error_rate
Default →
0.01
Desired error rate. Memory usage is inversely proportional to this value.
-
seed
Type → int | None
Default →
None
Set the seed to produce identical results.
Attributes¶
-
n_bits (int)
-
n_buckets (int)
-
buckets (list)
Examples¶
import string
from river import stats
alphabet = string.ascii_lowercase
n_unique = stats.NUnique(error_rate=0.2, seed=42)
n_unique.update('a')
n_unique.get()
1
n_unique.update('b')
n_unique.get()
2
for letter in alphabet:
n_unique.update(letter)
n_unique.get()
31
Lowering the error_rate
parameter will increase the precision.
n_unique = stats.NUnique(error_rate=0.01, seed=42)
for letter in alphabet:
n_unique.update(letter)
n_unique.get()
26
Methods¶
get
Return the current value of the statistic.
update
Update the called instance.
Parameters
- x — 'numbers.Number'