Mv
Mv artificial dataset.
Artificial dataset with both nominal and numeric features, some of which depend on one another. Originally described in [1].
The features are generated using the following expressions:
- \(x_1\): uniformly distributed over [-5, 5].
- \(x_2\): uniformly distributed over [-15, -10].
- \(x_3\):
    - if \(x_1 > 0\), \(x_3 \leftarrow\) 'green'
    - else \(x_3 \leftarrow\) 'red' with probability \(0.4\) and \(x_3 \leftarrow\) 'brown' with probability \(0.6\).
- \(x_4\):
    - if \(x_3 =\) 'green', \(x_4 \leftarrow x_1 + 2 x_2\)
    - else \(x_4 \leftarrow \frac{x_1}{2}\) with probability \(0.3\) and \(x_4 \leftarrow \frac{x_2}{2}\) with probability \(0.7\).
- \(x_5\): uniformly distributed over [-1, 1].
- \(x_6 \leftarrow x_4 \times \epsilon\), where \(\epsilon\) is uniformly distributed over [0, 5].
- \(x_7\): 'yes' with probability \(0.3\), and 'no' with probability \(0.7\).
- \(x_8\): 'normal' if \(x_5 < 0.5\) else 'large'.
- \(x_9\): uniformly distributed over [100, 500].
- \(x_{10}\): uniformly distributed integer over the interval [1000, 1200].
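The feature rules above can be sketched in plain Python. This is an illustrative reading of the list, not the library's implementation: the function name `sample_mv_features`, the dictionary keys `x1` to `x10`, and the order of the random draws are all assumptions, so the output will not match what the dataset class yields for a given seed.

```python
import random


def sample_mv_features(rng: random.Random) -> dict:
    """Draw one feature vector following the listed rules (illustrative only)."""
    x1 = rng.uniform(-5, 5)
    x2 = rng.uniform(-15, -10)

    # x3 depends on the sign of x1.
    if x1 > 0:
        x3 = "green"
    else:
        x3 = "red" if rng.random() < 0.4 else "brown"

    # x4 depends on x3.
    if x3 == "green":
        x4 = x1 + 2 * x2
    else:
        x4 = x1 / 2 if rng.random() < 0.3 else x2 / 2

    x5 = rng.uniform(-1, 1)
    x6 = x4 * rng.uniform(0, 5)  # epsilon drawn uniformly from [0, 5]
    x7 = "yes" if rng.random() < 0.3 else "no"
    x8 = "normal" if x5 < 0.5 else "large"
    x9 = rng.uniform(100, 500)
    x10 = rng.randint(1000, 1200)  # integer, both endpoints included

    # Key names are hypothetical; they only mirror the subscripts above.
    return {
        "x1": x1, "x2": x2, "x3": x3, "x4": x4, "x5": x5,
        "x6": x6, "x7": x7, "x8": x8, "x9": x9, "x10": x10,
    }
```

Passing a seeded `random.Random` instance keeps the sketch reproducible, and the dependencies are visible in the code: \(x_3\) reads \(x_1\), \(x_4\) reads \(x_3\), \(x_6\) reads \(x_4\), and \(x_8\) reads \(x_5\).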
The target value is generated using the following rules:
- if \(x_2 > 2\), \(y \leftarrow 35 - 0.5 x_4\)
- else if \(-2 \le x_4 \le 2\), \(y \leftarrow 10 - 2 x_1\)
- else if \(x_7 =\) 'yes', \(y \leftarrow 3 - \frac{x_1}{x_4}\)
- else if \(x_8 =\) 'normal', \(y \leftarrow x_6 + x_1\)
- else \(y \leftarrow \frac{x_1}{2}\).
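Read top to bottom, the target rules form a simple decision chain. The sketch below is one way to write them down; the function name `mv_target` and the keys `x1` to `x10` are hypothetical and are not taken from the library.

```python
def mv_target(x: dict) -> float:
    """Apply the target rules in order; the first matching rule wins."""
    if x["x2"] > 2:
        return 35 - 0.5 * x["x4"]
    if -2 <= x["x4"] <= 2:
        return 10 - 2 * x["x1"]
    if x["x7"] == "yes":
        # x4 lies outside [-2, 2] here, so the division is safe.
        return 3 - x["x1"] / x["x4"]
    if x["x8"] == "normal":
        return x["x6"] + x["x1"]
    return x["x1"] / 2


# Rounded values of the first sample shown in the Examples section.
sample = {
    "x1": 1.39, "x2": -14.87, "x3": "green", "x4": -28.35, "x5": -0.44,
    "x6": -31.64, "x7": "no", "x8": "normal", "x9": 370.67, "x10": 1178.43,
}
print(round(mv_target(sample), 2))  # -30.25
```

Under the ranges listed for the features, \(x_2\) lies in [-15, -10], so the first branch would not be reached; it is kept here only to mirror the rules as written.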
Parameters
- seed (int) – defaults to None
  Random seed number used for reproducibility.
Attributes
- desc
  Return the description from the docstring.
Examples
>>> from river import synth
>>> dataset = synth.Mv(seed=42)
>>> for x, y in dataset.take(5):
...     print(list(x.values()), y)
[1.39, -14.87, 'green', -28.35, -0.44, -31.64, 'no', 'normal', 370.67, 1178.43] -30.25
[-4.13, -12.89, 'red', -2.06, 0.01, -0.27, 'yes', 'normal', 359.95, 1108.98] 1.00
[-2.79, -12.05, 'brown', -1.39, 0.61, -4.87, 'no', 'large', 162.19, 1191.44] 15.59
[-1.63, -14.53, 'red', -7.26, 0.20, -29.33, 'no', 'normal', 314.49, 1194.62] -30.96
[-1.21, -12.23, 'brown', -6.11, 0.72, -17.66, 'no', 'large', 118.32, 1045.57] -0.60
Methods
take
Iterate over the first k samples.
Parameters
- k (int)
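To illustrate how a seeded, endless stream and a `take`-style helper fit together, here is a minimal stand-alone sketch. It assumes that taking k samples amounts to slicing the first k items off an iterator; the `stream` generator and the free function `take` are hypothetical stand-ins, not the library's code.

```python
import itertools
import random


def stream(seed=None):
    """Hypothetical endless stream of (x, y) pairs standing in for the dataset."""
    rng = random.Random(seed)
    while True:
        x = {"x1": rng.uniform(-5, 5)}
        yield x, x["x1"] / 2


def take(iterable, k):
    """Lazily yield at most the first k items of the iterable."""
    return itertools.islice(iterable, k)


first = list(take(stream(seed=42), 3))
again = list(take(stream(seed=42), 3))
print(len(first), first == again)  # 3 True
```

With the same seed, the two runs produce identical samples, which is the role the `seed` parameter plays for reproducibility.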