MultivariateGaussian¶
Multivariate normal distribution with parameters mu and var.
Parameters¶
- seed (default: None): Random number generator seed for reproducibility.
Attributes¶
- mode: The most likely value in the distribution.
- mu: The mean value of the distribution.
- n_samples: The number of observed samples.
- sigma: The standard deviation of the distribution.
- var: The variance of the distribution.
Examples¶
import numpy as np
import pandas as pd
from river import proba
np.random.seed(42)
X = pd.DataFrame(
    np.random.random((8, 3)),
    columns=["red", "green", "blue"]
)
X
        red     green      blue
0  0.374540  0.950714  0.731994
1  0.598658  0.156019  0.155995
2  0.058084  0.866176  0.601115
3  0.708073  0.020584  0.969910
4  0.832443  0.212339  0.181825
5  0.183405  0.304242  0.524756
6  0.431945  0.291229  0.611853
7  0.139494  0.292145  0.366362
p = proba.MultivariateGaussian(seed=42)
p.n_samples
0.0
for x in X.to_dict(orient="records"):
    p = p.update(x)
p.var
           blue     green       red
blue   0.076119  0.020292 -0.010128
green  0.020292  0.112931 -0.053268
red   -0.010128 -0.053268  0.078961
The current state can be displayed in a readable format:
p
𝒩(
μ=(0.518, 0.387, 0.416),
σ^2=(
[ 0.076 0.020 -0.010]
[ 0.020 0.113 -0.053]
[-0.010 -0.053 0.079]
)
)
To retrieve number of samples and mode:
p.n_samples
8.0
p.mode
{'blue': 0.5179..., 'green': 0.3866..., 'red': 0.4158...}
To retrieve the PDF and CDF:
p(x)
0.97967...
p.cdf(x)
0.00787...
To sample data from distribution:
p.sample()
{'blue': -0.179..., 'green': -0.051..., 'red': 0.376...}
MultivariateGaussian works with utils.Rolling:
from river import utils
p = utils.Rolling(proba.MultivariateGaussian(), window_size=5)
for x in X.to_dict(orient="records"):
    p = p.update(x)
p.var
           blue     green       red
blue   0.087062 -0.022873  0.007765
green -0.022873  0.014279 -0.025181
red    0.007765 -0.025181  0.095066
MultivariateGaussian works with utils.TimeRolling:
from datetime import datetime as dt, timedelta as td
X.index = [dt(2023, 3, 28, 0, 0, 0) + td(seconds=x) for x in range(8)]
p = utils.TimeRolling(proba.MultivariateGaussian(), period=td(seconds=5))
for t, x in X.iterrows():
    p = p.update(x.to_dict(), t=t)
p.var
           blue     green       red
blue   0.087062 -0.022873  0.007765
green -0.022873  0.014279 -0.025181
red    0.007765 -0.025181  0.095066
The mean and the diagonal entries of sigma are consistent with proba.Gaussian:
multi = proba.MultivariateGaussian()
single = proba.Gaussian()
for x in X.to_dict(orient='records'):
    multi = multi.update(x)
    single = single.update(x['blue'])
multi.mu['blue'] == single.mu
True
multi.sigma['blue']['blue'] == single.sigma
True
Methods¶
__call__
Probability density function (PDF) evaluated at x.
Parameters
- x — 'dict[str, float]'
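To illustrate what a density call computes, here is a minimal pure-Python sketch of the Gaussian density for the two-variable case. The function name bivariate_gaussian_pdf and the nested-dict layout of var are assumptions made for this example only. They are not part of River's API, and River's own implementation may differ.

```python
import math


def bivariate_gaussian_pdf(x, mu, var):
    """Density of a 2-D Gaussian at x (hypothetical helper, not River's API).

    x and mu are dicts keyed by feature name; var is a nested dict
    holding the 2x2 covariance matrix.
    """
    a, b = sorted(mu)
    da, db = x[a] - mu[a], x[b] - mu[b]
    s_aa, s_ab, s_bb = var[a][a], var[a][b], var[b][b]
    det = s_aa * s_bb - s_ab * s_ab
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the closed-form 2x2 inverse.
    quad = (s_bb * da * da - 2.0 * s_ab * da * db + s_aa * db * db) / det
    # Normalising constant for k = 2 dimensions: (2 * pi) * sqrt(det).
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


mu = {"a": 0.0, "b": 0.0}
var = {"a": {"a": 1.0, "b": 0.0}, "b": {"a": 0.0, "b": 1.0}}
at_mean = bivariate_gaussian_pdf({"a": 0.0, "b": 0.0}, mu, var)
off_mean = bivariate_gaussian_pdf({"a": 1.0, "b": 0.0}, mu, var)
```

The returned value is a density rather than a probability, so it can exceed 1 when the covariance is small.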
cdf
Cumulative distribution function, i.e. P(X <= x).
Parameters
- x — 'dict[str, float]'
revert
Reverts the parameters of the distribution for a given observation.
Parameters
- x — 'dict[str, float]'
sample
Sample a random value from the distribution.
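One common way to draw from a multivariate Gaussian is to transform independent standard normal draws with a Cholesky factor of the covariance. The sketch below shows this for two variables in pure Python. The helper names cholesky_2x2 and sample_bivariate_gaussian are hypothetical, and this is not necessarily how River draws its samples.

```python
import math
import random


def cholesky_2x2(var, a, b):
    """Lower-triangular factor L of a 2x2 covariance, so that L times its transpose equals var."""
    l_aa = math.sqrt(var[a][a])
    l_ba = var[a][b] / l_aa
    l_bb = math.sqrt(var[b][b] - l_ba * l_ba)
    return l_aa, l_ba, l_bb


def sample_bivariate_gaussian(mu, var, rng):
    """Draw one sample as mu + L z, with z made of two independent standard normals."""
    a, b = sorted(mu)
    l_aa, l_ba, l_bb = cholesky_2x2(var, a, b)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return {a: mu[a] + l_aa * z1, b: mu[b] + l_ba * z1 + l_bb * z2}


# Illustrative values only, loosely in the range of the example above.
mu = {"blue": 0.5, "red": 0.4}
var = {
    "blue": {"blue": 0.08, "red": -0.01},
    "red": {"blue": -0.01, "red": 0.08},
}
draw = sample_bivariate_gaussian(mu, var, random.Random(42))
same_draw = sample_bivariate_gaussian(mu, var, random.Random(42))
```

Seeding the generator makes the draw reproducible, which is the role the seed parameter plays. Because a Gaussian is unbounded, samples can fall outside the range of the observed data, including below zero.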
update
Updates the parameters of the distribution given a new observation.
Parameters
- x — 'dict[str, float]'
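The relationship between update and revert can be sketched with a Welford-style running covariance in pure Python. The class RunningCovariance below is a hypothetical helper written for illustration, it assumes every observation carries the same keys, and it divides by n - 1. It is not River's implementation. Windowed wrappers such as utils.Rolling presumably rely on this kind of revert to drop observations that leave the window.

```python
from itertools import product


class RunningCovariance:
    """Running mean and sample covariance over dict observations (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.comoment = {}  # (i, j) -> sum of (x_i - mean_i) * (x_j - mean_j)

    def update(self, x):
        """Fold one observation into the running statistics."""
        self.n += 1
        old_mean = {k: self.mean.get(k, 0.0) for k in x}
        for k in x:
            self.mean[k] = old_mean[k] + (x[k] - old_mean[k]) / self.n
        for i, j in product(x, repeat=2):
            step = (x[i] - old_mean[i]) * (x[j] - self.mean[j])
            self.comoment[i, j] = self.comoment.get((i, j), 0.0) + step
        return self

    def revert(self, x):
        """Undo an earlier update(x), e.g. when x leaves a rolling window."""
        if self.n <= 1:
            self.n, self.mean, self.comoment = 0, {}, {}
            return self
        mean_with_x = dict(self.mean)  # mean that still includes x
        self.n -= 1
        for k in x:
            self.mean[k] = mean_with_x[k] + (mean_with_x[k] - x[k]) / self.n
        for i, j in product(x, repeat=2):
            self.comoment[i, j] -= (x[i] - self.mean[i]) * (x[j] - mean_with_x[j])
        return self

    def cov(self, i, j):
        """Sample covariance (divides by n - 1); needs at least two observations."""
        return self.comoment[i, j] / (self.n - 1)


rc = RunningCovariance()
for obs in [{"a": 1.0, "b": 2.0}, {"a": 2.0, "b": 4.0}, {"a": 3.0, "b": 7.0}]:
    rc.update(obs)
before = rc.cov("a", "b")

extra = {"a": 10.0, "b": -1.0}
rc.update(extra).revert(extra)
after = rc.cov("a", "b")  # matches `before` up to floating-point rounding
```

Reverting an observation restores the statistics to what they were before it was added, without storing or rescanning the earlier observations.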