EmpiricalCovariance
Empirical covariance matrix.
Parameters
- ddof
  Default → 1
  Delta Degrees of Freedom. The covariance is normalized by n - ddof, where n is the number of samples seen so far.
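To make the role of ddof concrete, here is a minimal pure-Python sketch of a two-pass covariance with a configurable divisor. The `covariance` helper is hypothetical and written only for illustration; it is not River's implementation and assumes the usual n - ddof convention.

```python
def covariance(xs, ys, ddof=1):
    """Two-pass covariance of two equally long sequences, divided by n - ddof."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (the "co-moment").
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_moment / (n - ddof)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

sample_cov = covariance(xs, ys, ddof=1)      # divisor n - 1 = 3
population_cov = covariance(xs, ys, ddof=0)  # divisor n = 4
print(sample_cov, population_cov)
```

With ddof=1 the estimate is the unbiased sample covariance; ddof=0 gives the population (maximum likelihood) version, which is always smaller in magnitude for the same data.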
Attributes
- matrix
Examples
import numpy as np
import pandas as pd
from river import covariance
np.random.seed(42)
X = pd.DataFrame(np.random.random((8, 3)), columns=["red", "green", "blue"])
X
        red     green      blue
0  0.374540  0.950714  0.731994
1  0.598658  0.156019  0.155995
2  0.058084  0.866176  0.601115
3  0.708073  0.020584  0.969910
4  0.832443  0.212339  0.181825
5  0.183405  0.304242  0.524756
6  0.431945  0.291229  0.611853
7  0.139494  0.292145  0.366362
cov = covariance.EmpiricalCovariance()
for x in X.to_dict(orient="records"):
    cov = cov.update(x)
cov
        blue     green    red
 blue    0.076    0.020   -0.010
green    0.020    0.113   -0.053
  red   -0.010   -0.053    0.079
There is also an update_many method to process mini-batches. The results are identical.
cov = covariance.EmpiricalCovariance()
cov = cov.update_many(X)
cov
        blue     green    red
 blue    0.076    0.020   -0.010
green    0.020    0.113   -0.053
  red   -0.010   -0.053    0.079
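The reason one-at-a-time and batch processing can agree is that a covariance can be maintained incrementally from a count, two running means, and a running co-moment. The sketch below illustrates that idea for a single pair of variables, using the blue and green columns printed earlier. `RunningCovariance` and `batch_covariance` are hypothetical names chosen for this example; this is a standard Welford-style update, not a copy of River's code.

```python
class RunningCovariance:
    """Hypothetical helper: streaming covariance between two variables."""

    def __init__(self, ddof=1):
        self.ddof = ddof
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.co_moment = 0.0  # running sum of (x - mean_x) * (y - mean_y)

    def update(self, x, y):
        self.n += 1
        dx = x - self.mean_x                      # deviation from the old mean of x
        self.mean_x += dx / self.n
        self.mean_y += (y - self.mean_y) / self.n
        self.co_moment += dx * (y - self.mean_y)  # uses the new mean of y

    def get(self):
        return self.co_moment / (self.n - self.ddof)


def batch_covariance(xs, ys, ddof=1):
    """Two-pass reference computation over the full data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)


# Values copied from the "blue" and "green" columns of the example table.
blue = [0.731994, 0.155995, 0.601115, 0.969910, 0.181825, 0.524756, 0.611853, 0.366362]
green = [0.950714, 0.156019, 0.866176, 0.020584, 0.212339, 0.304242, 0.291229, 0.292145]

running = RunningCovariance()
for b, g in zip(blue, green):
    running.update(b, g)

streamed = running.get()
batched = batch_covariance(blue, green)
print(round(streamed, 4), round(batched, 4))  # both about 0.0203
```

Both numbers should match the blue/green entry of the matrix up to the rounding of the inputs.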
The covariances are stored in a dictionary keyed by pairs of feature names, so any entry can be looked up directly:
cov["blue", "green"]
Cov: 0.020292
Diagonal entries are variances:
cov["blue", "blue"]
Var: 0.076119
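The diagonal entries are variances because the covariance of a variable with itself reduces to its variance. A small standard-library check, using the blue column from the example table, illustrates this. The `covariance` helper is hypothetical and exists only for this sketch.

```python
import statistics


def covariance(xs, ys, ddof=1):
    """Two-pass covariance with divisor n - ddof."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - ddof)


# Values copied from the "blue" column of the example table.
blue = [0.731994, 0.155995, 0.601115, 0.969910, 0.181825, 0.524756, 0.611853, 0.366362]

var_from_cov = covariance(blue, blue)   # covariance of blue with itself
var_direct = statistics.variance(blue)  # sample variance, divisor n - 1
print(round(var_from_cov, 4), round(var_direct, 4))  # both about 0.0761
```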
An instance can also be created from a previously computed state (sample count, means, and covariances):
n = 8
mean = {'red': 0.416, 'green': 0.387, 'blue': 0.518}
cov_ = {('red', 'red'): 0.079,
        ('red', 'green'): -0.053,
        ('red', 'blue'): -0.010,
        ('green', 'green'): 0.113,
        ('green', 'blue'): 0.020,
        ('blue', 'blue'): 0.076}
cov = covariance.EmpiricalCovariance._from_state(
    n=n, mean=mean, cov=cov_, ddof=1)
cov
        blue     green    red
 blue    0.076    0.020   -0.010
green    0.020    0.113   -0.053
  red   -0.010   -0.053    0.079
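A state made of the sample count, the means, and the covariances is enough to keep learning without the original data. The sketch below shows why for a single pair of variables: multiplying the stored covariance by n - ddof recovers the running co-moment, after which the usual incremental update applies. The `resume` function and its argument names are hypothetical, and this is not a description of what `_from_state` does internally.

```python
def resume(n, mean_x, mean_y, cov_xy, new_x, new_y, ddof=1):
    """Continue from a saved (n, means, covariance) state with one new sample."""
    co_moment = cov_xy * (n - ddof)  # undo the normalization
    n += 1
    dx = new_x - mean_x              # deviation from the old mean of x
    mean_x += dx / n
    mean_y += (new_y - mean_y) / n
    co_moment += dx * (new_y - mean_y)
    return n, mean_x, mean_y, co_moment / (n - ddof)


# State summarizing the three samples (1, 2), (2, 4), (3, 7).
n, mean_x, mean_y, cov_xy = 3, 2.0, 13.0 / 3.0, 2.5

# Feed one more sample, (4, 7), without access to the first three.
n, mean_x, mean_y, cov_xy = resume(n, mean_x, mean_y, cov_xy, 4.0, 7.0)
print(n, mean_x, mean_y, cov_xy)
```

The result agrees with computing the covariance of all four samples from scratch.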
Methods
revert
Downdate with a single sample, that is, remove the contribution of a previously seen sample.
Parameters
- x — 'dict'
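Downdating can be pictured as running the incremental update backwards. The sketch below does this for a single pair of variables with a state tuple (n, mean_x, mean_y, co_moment). Both functions are hypothetical helpers written for this illustration, under the assumption of a Welford-style update; they are not River's implementation.

```python
def update(state, x, y):
    """Add one sample to the state (n, mean_x, mean_y, co_moment)."""
    n, mean_x, mean_y, co_moment = state
    n += 1
    dx = x - mean_x
    mean_x += dx / n
    mean_y += (y - mean_y) / n
    co_moment += dx * (y - mean_y)
    return n, mean_x, mean_y, co_moment


def revert(state, x, y):
    """Remove one previously added sample from the state."""
    n, mean_x, mean_y, co_moment = state
    if n <= 1:
        return 0, 0.0, 0.0, 0.0  # nothing left after removing the only sample
    old_mean_x = (n * mean_x - x) / (n - 1)
    old_mean_y = (n * mean_y - y) / (n - 1)
    # Mirror of the update step: old mean of x, current mean of y.
    co_moment -= (x - old_mean_x) * (y - mean_y)
    return n - 1, old_mean_x, old_mean_y, co_moment


before = (0, 0.0, 0.0, 0.0)
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 7.0)]:
    before = update(before, x, y)

# Adding a sample and then reverting it should restore the earlier state.
after = revert(update(before, 4.0, 1.0), 4.0, 1.0)
print(before)
print(after)
```

Reverting only makes sense for a sample that was actually added earlier; removing an unseen sample would leave the statistics meaningless.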
update
Update with a single sample.
Parameters
- x — 'dict'
update_many
Update with a dataframe of samples.
Parameters
- X — 'pd.DataFrame'
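One way to think about a mini-batch update is as a merge of two summaries: the statistics accumulated so far and the statistics of the new batch. The sketch below uses the well-known pairwise combination formula for a single pair of variables. `summarize` and `merge` are hypothetical helpers for illustration only, and nothing here claims to mirror how update_many is implemented.

```python
def summarize(xs, ys):
    """Return (n, mean_x, mean_y, co_moment) for a batch of paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return n, mean_x, mean_y, co_moment


def merge(a, b):
    """Combine two summaries without revisiting the underlying samples."""
    n_a, mx_a, my_a, c_a = a
    n_b, mx_b, my_b, c_b = b
    n = n_a + n_b
    dx, dy = mx_b - mx_a, my_b - my_a
    mean_x = mx_a + dx * n_b / n
    mean_y = my_a + dy * n_b / n
    # The cross term accounts for the shift between the two batch means.
    co_moment = c_a + c_b + dx * dy * n_a * n_b / n
    return n, mean_x, mean_y, co_moment


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 7.0, 7.0]

merged = merge(summarize(xs[:2], ys[:2]), summarize(xs[2:], ys[2:]))
full = summarize(xs, ys)
ddof = 1
print(merged[3] / (merged[0] - ddof), full[3] / (full[0] - ddof))
```

Because the merge only needs counts, means, and co-moments, the cost of absorbing a batch does not depend on how much data was seen before it.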