NUnique¶
Approximate number of unique values counter.
This is basically an implementation of the HyperLogLog algorithm. Adapted from hypy
. The code is a bit too terse but it will do for now.
Parameters¶
-
error_rate – defaults to
0.01
Desired error rate. Memory usage is inversely proportional to this value.
-
seed (int) – defaults to
None
Set the seed to produce identical results.
Attributes¶
-
n_bits (int)
-
n_buckets (int)
-
buckets (list)
Examples¶
>>> import string
>>> from river import stats
>>> alphabet = string.ascii_lowercase
>>> n_unique = stats.NUnique(error_rate=0.2, seed=42)
>>> n_unique.update('a').get()
1
>>> n_unique.update('b').get()
2
>>> for letter in alphabet:
... n_unique = n_unique.update(letter)
>>> n_unique.get()
31
Lowering the error_rate
parameter will increase the precision.
>>> n_unique = stats.NUnique(error_rate=0.01, seed=42)
>>> for letter in alphabet:
... n_unique = n_unique.update(letter)
>>> n_unique.get()
26
Methods¶
get
Return the current value of the statistic.
update
Update and return the called instance.
Parameters
- x (numbers.Number)