Histogram¶

Streaming histogram.

Parameters¶

max_bins

Default → 256

Maximum number of bins.
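To make the role of max_bins concrete, here is a minimal pure-Python sketch of a bounded streaming histogram. It is an illustration only, not river's implementation: the class name TinyHistogram, the [left, right, count] bin layout, and the rule of merging the two adjacent bins separated by the smallest gap are all assumptions made for this example.

```python
import bisect
import random


class TinyHistogram:
    """Illustrative streaming histogram (hypothetical, not river's code)."""

    def __init__(self, max_bins=256):
        self.max_bins = max_bins
        self.bins = []  # sorted, non-overlapping [left, right, count] entries
        self.n = 0      # total number of values seen

    def update(self, x):
        self.n += 1
        # Count how many bins start at or before x.
        lefts = [b[0] for b in self.bins]
        i = bisect.bisect_right(lefts, x)
        # If the preceding bin covers x, increment its count and stop.
        if i > 0 and self.bins[i - 1][1] >= x:
            self.bins[i - 1][2] += 1
            return
        # Otherwise create a single-point bin at the sorted position.
        self.bins.insert(i, [x, x, 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Assumed rule: merge the adjacent pair with the smallest gap.
        gaps = [
            self.bins[j + 1][0] - self.bins[j][1]
            for j in range(len(self.bins) - 1)
        ]
        j = gaps.index(min(gaps))
        a, b = self.bins[j], self.bins[j + 1]
        self.bins[j:j + 2] = [[a[0], b[1], a[2] + b[2]]]


random.seed(0)
hist = TinyHistogram(max_bins=15)
for _ in range(2000):
    hist.update(random.gauss(0, 1))

print(len(hist.bins), hist.n)
```

However many values are fed in, the number of bins never exceeds max_bins, so memory use stays bounded at the cost of coarser bins.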

Attributes¶

n

Total number of values seen so far.

Examples¶

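from river import sketch
import numpy as np

# Fix the seed so the sampled values are reproducible.
np.random.seed(42)

# Bimodal sample: 1000 draws around -3 and 1000 draws around 3.
values = np.hstack((
    np.random.normal(-3, 1, 1000),
    np.random.normal(3, 1, 1000),
))

hist = sketch.Histogram(max_bins=15)

# update modifies the histogram in place, so its return value is not needed.
for x in values:
    hist.update(x)

# Iterating over the histogram yields its bins in order.
for b in hist:
    print(b)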

[-6.24127, -6.24127]: 1
[-5.69689, -5.19881]: 8
[-5.12390, -4.43014]: 57
[-4.42475, -3.72574]: 158
[-3.71984, -3.01642]: 262
[-3.01350, -2.50668]: 206
[-2.50329, -0.81020]: 294
[-0.80954, 0.29677]: 19
[0.40896, 0.82733]: 7
[0.84661, 1.25147]: 24
[1.26029, 2.30758]: 178
[2.31081, 3.05701]: 284
[3.05963, 3.69695]: 242
[3.69822, 5.64434]: 258
[6.13775, 6.19311]: 2

Methods¶

append

Append item to the end of the sequence.

Parameters

item

cdf

Cumulative distribution function.

Parameters

x
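As a rough illustration of how a CDF value can be read off a binned summary, the sketch below sums the counts of the bins lying entirely at or below x and adds an interpolated share of the bin that contains x. The function name approx_cdf, the (left, right, count) tuple layout, and the assumption that values are spread uniformly inside each bin are hypothetical choices for this example and may differ from what river does internally.

```python
def approx_cdf(bins, x):
    """Approximate P(X <= x) from sorted, non-overlapping (left, right, count) bins."""
    total = sum(count for _, _, count in bins)
    if total == 0:
        return 0.0
    below = 0.0
    for left, right, count in bins:
        if x >= right:
            # The whole bin lies at or below x.
            below += count
        elif x > left:
            # x falls strictly inside the bin: take a proportional share.
            below += count * (x - left) / (right - left)
            break
        else:
            # This bin and all later ones lie above x.
            break
    return below / total


example_bins = [(0.0, 1.0, 2), (2.0, 4.0, 6), (5.0, 5.0, 2)]
cdf_values = [approx_cdf(example_bins, x) for x in (-1.0, 0.5, 3.0, 4.5, 5.0)]
print(cdf_values)
```

Single-point bins (left equal to right) never reach the interpolation branch, so the division is always by a positive width.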

clear

Remove all items from the sequence.

copy

count

Return the number of occurrences of item.

Parameters

item

extend

Extend the sequence by appending the elements of other.

Parameters

other

index

Return the first index of item. Raises ValueError if item is not present.

Optional start and stop positions can be passed through args to restrict the search.

Parameters

item
args

insert

Insert item before index i.

Parameters

i
item

iter_cdf

Yields CDF values for a sorted iterable of values.

This is faster than calling cdf separately for each value.

Parameters

X
verbose — defaults to False
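The reason a sorted input helps can be sketched as follows: when the query values arrive in ascending order, the bins can be walked once from left to right, carrying a running count forward instead of restarting for every query. The generator below is a hypothetical pure-Python illustration of that idea, assuming (left, right, count) bins and uniform spread within a bin; it is not river's code, and it omits the verbose option.

```python
def iter_approx_cdf(bins, xs):
    """Yield an approximate CDF value for each x in xs (xs sorted ascending).

    bins: sorted, non-overlapping (left, right, count) tuples.
    """
    total = sum(count for _, _, count in bins)
    below = 0  # total count of bins lying entirely at or below the current x
    i = 0      # index of the first bin not yet fully passed
    for x in xs:
        # Move forward only; earlier bins are never revisited.
        while i < len(bins) and bins[i][1] <= x:
            below += bins[i][2]
            i += 1
        partial = 0.0
        if i < len(bins):
            left, right, count = bins[i]
            if x > left:
                # x lies strictly inside bin i: add a proportional share.
                partial = count * (x - left) / (right - left)
        yield (below + partial) / total if total else 0.0


example_bins = [(0.0, 1.0, 2), (2.0, 4.0, 6), (5.0, 5.0, 2)]
queries = [-1.0, 0.5, 3.0, 4.5, 5.0]
streamed = list(iter_approx_cdf(example_bins, queries))
print(streamed)
```

With B bins and Q sorted queries, this single pass does work proportional to B + Q, whereas restarting the scan for every query would cost up to B steps each time.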

pop

Remove and return the item at index i (the last item by default). Raises IndexError if the sequence is empty or i is out of range.

Parameters

i — defaults to -1

remove

Remove the first occurrence of item. Raises ValueError if item is not present.

Parameters

item

reverse

Reverse the order of the items in place.

sort

update

Update the histogram with a new value.

Parameters

x