Mv

Mv artificial dataset.

An artificial regression dataset with both nominal and numeric features, several of which depend on one another. Originally described in ¹.

The features are generated using the following expressions:

$x_{1}$: uniformly distributed over [-5, 5].
$x_{2}$: uniformly distributed over [-15, -10].
$x_{3}$:
- if $x_{1} > 0$, $x_{3} \leftarrow$ 'green'
- else $x_{3} \leftarrow$ 'red' with probability $0.4$ and $x_{3} \leftarrow$ 'brown' with probability $0.6$.
$x_{4}$:
- if $x_{3} =$ 'green', $x_{4} \leftarrow x_{1} + 2 x_{2}$
- else $x_{4} \leftarrow \frac{x_{1}}{2}$ with probability $0.3$ and $x_{4} \leftarrow \frac{x_{2}}{2}$ with probability $0.7$.
$x_{5}$: uniformly distributed over [-1, 1].
$x_{6} \leftarrow x_{4} \times \epsilon$, where $\epsilon$ is uniformly distributed over [0, 5].
$x_{7}$: 'yes' with probability $0.3$, and 'no' with probability $0.7$.
$x_{8}$: 'normal' if $x_{5} < 0.5$ else 'large'.
$x_{9}$: uniformly distributed over [100, 500].
$x_{10}$: uniformly distributed integer over the interval [1000, 1200].
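
The expressions above can be sketched in plain Python to make the co-dependencies explicit. This is an illustrative reading of the description using only the standard library, not River's implementation; the function name `sample_mv_features` and the dictionary keys `x1` to `x10` are assumptions made for this example.

```python
import random


def sample_mv_features(rng: random.Random) -> dict:
    """Draw one feature vector following the expressions above (illustrative only)."""
    x1 = rng.uniform(-5, 5)
    x2 = rng.uniform(-15, -10)

    # x3 depends on the sign of x1.
    if x1 > 0:
        x3 = "green"
    else:
        x3 = "red" if rng.random() < 0.4 else "brown"

    # x4 depends on x3, and through it on x1 and x2.
    if x3 == "green":
        x4 = x1 + 2 * x2
    else:
        x4 = x1 / 2 if rng.random() < 0.3 else x2 / 2

    x5 = rng.uniform(-1, 1)
    x6 = x4 * rng.uniform(0, 5)  # x4 scaled by a uniform noise factor
    x7 = "yes" if rng.random() < 0.3 else "no"
    x8 = "normal" if x5 < 0.5 else "large"
    x9 = rng.uniform(100, 500)
    x10 = rng.randint(1000, 1200)  # integer, both endpoints included

    return {
        "x1": x1, "x2": x2, "x3": x3, "x4": x4, "x5": x5,
        "x6": x6, "x7": x7, "x8": x8, "x9": x9, "x10": x10,
    }


rng = random.Random(42)
features = sample_mv_features(rng)
print(features["x3"], features["x8"])
```

Passing an explicit `random.Random` instance keeps the sketch reproducible without touching the global random state.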

The target value is generated using the following rules:

if $x_{2} > 2$, $y \leftarrow 35 - 0.5 x_{4}$
else if $- 2 \leq x_{4} \leq 2$, $y \leftarrow 10 - 2 x_{1}$
else if $x_{7} =$ 'yes', $y \leftarrow 3 - \frac{x_{1}}{x_{4}}$
else if $x_{8} =$ 'normal', $y \leftarrow x_{6} + x_{1}$
else $y \leftarrow \frac{x_{1}}{2}$.
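
As a minimal sketch, the rules can be written as a chain of early returns in which the first matching condition wins. The function name `mv_target` and the dictionary keys are assumptions for illustration, not River's API. Note that with $x_{2}$ drawn from [-15, -10] as described above, the first condition is never met as written, so in practice the remaining four rules decide the target.

```python
def mv_target(x: dict) -> float:
    """Apply the target rules in order; the first matching rule wins."""
    if x["x2"] > 2:
        return 35 - 0.5 * x["x4"]
    if -2 <= x["x4"] <= 2:
        return 10 - 2 * x["x1"]
    if x["x7"] == "yes":
        # Safe division: reaching this branch implies |x4| > 2.
        return 3 - x["x1"] / x["x4"]
    if x["x8"] == "normal":
        return x["x6"] + x["x1"]
    return x["x1"] / 2


# Hand-picked values (not sampled): x4 is outside [-2, 2], x7 is 'no', x8 is 'normal'.
example = {"x1": 1.0, "x2": -12.0, "x4": -23.0, "x6": -46.0, "x7": "no", "x8": "normal"}
print(mv_target(example))  # -45.0
```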

Parameters

seed (int | None) – defaults to None

Random seed number used for reproducibility.

Attributes

desc

Return the description from the docstring.

Examples

>>> from river import synth

>>> dataset = synth.Mv(seed=42)

>>> for x, y in dataset.take(5):
...     print(list(x.values()), y)
[1.39, -14.87, 'green', -28.35, -0.44, -31.64, 'no', 'normal', 370.67, 1178.43] -30.25
[-4.13, -12.89, 'red', -2.06, 0.01, -0.27, 'yes', 'normal', 359.95, 1108.98] 1.00
[-2.79, -12.05, 'brown', -1.39, 0.61, -4.87, 'no', 'large', 162.19, 1191.44] 15.59
[-1.63, -14.53, 'red', -7.26, 0.20, -29.33, 'no', 'normal', 314.49, 1194.62] -30.96
[-1.21, -12.23, 'brown', -6.11, 0.72, -17.66, 'no', 'large', 118.32, 1045.57] -0.60

Methods

take

Iterate over the first k samples.

Parameters

k (int)
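
To illustrate the idea of taking a fixed number of samples from an unbounded synthetic stream, here is a rough standalone sketch. It assumes behaviour similar to `itertools.islice`; `endless_stream` and this `take` function are hypothetical stand-ins written for the example, not River's code.

```python
import itertools
import random


def endless_stream(seed=None):
    """Hypothetical stand-in for a synthetic dataset: yields (x, y) pairs forever."""
    rng = random.Random(seed)
    while True:
        x = {"x1": rng.uniform(-5, 5)}
        yield x, x["x1"] / 2


def take(stream, k):
    """Return an iterator over the first k items of the stream."""
    return itertools.islice(stream, k)


samples = list(take(endless_stream(seed=42), 5))
print(len(samples))  # 5
```

Because the iterator is lazy, only the requested k samples are ever generated.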

References

¹ Mv in Luís Torgo regression datasets