
Mv

Mv artificial dataset.

Artificial dataset composed of both nominal and numeric features, with co-dependencies between the features. Originally described in [1].

The features are generated using the following expressions:

  • \(x_1\): uniformly distributed over [-5, 5].

  • \(x_2\): uniformly distributed over [-15, -10].

  • \(x_3\):

    • if \(x_1 > 0\), \(x_3 \leftarrow\) 'green'

    • else \(x_3 \leftarrow\) 'red' with probability \(0.4\) and \(x_3 \leftarrow\) 'brown' with probability \(0.6\).

  • \(x_4\):

    • if \(x_3 =\) 'green', \(x_4 \leftarrow x_1 + 2 x_2\)

    • else \(x_4 \leftarrow \frac{x_1}{2}\) with probability \(0.3\) and \(x_4 \leftarrow \frac{x_2}{2}\) with probability \(0.7\).

  • \(x_5\): uniformly distributed over [-1, 1].

  • \(x_6 \leftarrow x_4 \times \epsilon\), where \(\epsilon\) is uniformly distributed over [0, 5].

  • \(x_7\): 'yes' with probability \(0.3\), and 'no' with probability \(0.7\).

  • \(x_8\): 'normal' if \(x_5 < 0.5\) else 'large'.

  • \(x_9\): uniformly distributed over [100, 500].

  • \(x_{10}\): uniformly distributed over [1000, 1200].
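The feature rules above can be sketched in plain Python. This is a minimal illustration, not River's implementation: the helper name `sample_mv_features`, the integer keys 1 to 10, and the use of a single `random.Random` instance are all assumptions. For \(x_{10}\) it draws a real value, because the example output further down shows non-integer values for that feature.

```python
import random


def sample_mv_features(rng: random.Random) -> dict:
    """Draw one Mv feature vector following the listed rules.

    Hypothetical helper for illustration only; not River's actual code.
    """
    x = {}
    x[1] = rng.uniform(-5, 5)
    x[2] = rng.uniform(-15, -10)
    # x3 depends on the sign of x1.
    if x[1] > 0:
        x[3] = "green"
    else:
        x[3] = "red" if rng.random() < 0.4 else "brown"
    # x4 depends on x3 (and therefore, indirectly, on x1).
    if x[3] == "green":
        x[4] = x[1] + 2 * x[2]
    else:
        x[4] = x[1] / 2 if rng.random() < 0.3 else x[2] / 2
    x[5] = rng.uniform(-1, 1)
    # x6 scales x4 by a random factor epsilon ~ U[0, 5].
    x[6] = x[4] * rng.uniform(0, 5)
    x[7] = "yes" if rng.random() < 0.3 else "no"
    # x8 is a deterministic function of x5.
    x[8] = "normal" if x[5] < 0.5 else "large"
    x[9] = rng.uniform(100, 500)
    # Real-valued draw, matching the non-integer x10 values in the example output.
    x[10] = rng.uniform(1000, 1200)
    return x
```

Because the dict keys are inserted in order 1 to 10, `list(x.values())` lists the features in the same order as the example output.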

The target value is generated using the following rules:

  • if \(x_2 > 2\), \(y \leftarrow 35 - 0.5 x_4\)

  • else if \(-2 \le x_4 \le 2\), \(y \leftarrow 10 - 2 x_1\)

  • else if \(x_7 =\) 'yes', \(y \leftarrow 3 - \frac{x_1}{x_4}\)

  • else if \(x_8 =\) 'normal', \(y \leftarrow x_6 + x_1\)

  • else \(y \leftarrow \frac{x_1}{2}\).
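The target rules are checked in order, and the first one that matches determines \(y\). A hypothetical helper `mv_target` (not part of River's API) that applies them to a feature dict keyed 1 to 10 might look like this. Since \(x_2\) is drawn from [-15, -10], the first rule can never fire on generated data; the sketch keeps it only to mirror the listed rules.

```python
def mv_target(x: dict) -> float:
    """Apply the Mv target rules, in order, to a feature dict keyed 1..10.

    Hypothetical helper for illustration only; not River's actual code.
    """
    if x[2] > 2:
        # Unreachable for generated data (x2 lies in [-15, -10]).
        return 35 - 0.5 * x[4]
    if -2 <= x[4] <= 2:
        return 10 - 2 * x[1]
    if x[7] == "yes":
        # x4 lies outside [-2, 2] here, so the division is safe.
        return 3 - x[1] / x[4]
    if x[8] == "normal":
        return x[6] + x[1]
    return x[1] / 2


example = {1: 1.0, 2: -12.0, 3: "green", 4: -23.0, 5: 0.0,
           6: -46.0, 7: "no", 8: "normal", 9: 200.0, 10: 1100.0}
print(mv_target(example))  # -45.0, from the "x8 == 'normal'" rule: x6 + x1
```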

Parameters

  • seed (int) – defaults to None

    Random seed number used for reproducibility.
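Fixing `seed` makes the stream reproducible. The sketch below assumes the generator draws all of its randomness from a single `random.Random(seed)` instance; that is an assumption about the internals, and `first_draws` is a hypothetical helper. With `seed=None`, Python seeds such an instance from system randomness, so successive runs produce different streams.

```python
import random


def first_draws(seed, n=5):
    """Return the first n draws of x1 ~ U[-5, 5] for a seed (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.uniform(-5, 5) for _ in range(n)]


# Same seed: identical draws, hence an identical sample stream.
print(first_draws(42) == first_draws(42))  # True
```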

Attributes

  • desc

    Return the description from the docstring.

Examples

>>> from river import synth

>>> dataset = synth.Mv(seed=42)

>>> for x, y in dataset.take(5):
...     print(list(x.values()), y)
[1.39, -14.87, 'green', -28.35, -0.44, -31.64, 'no', 'normal', 370.67, 1178.43] -30.25
[-4.13, -12.89, 'red', -2.06, 0.01, -0.27, 'yes', 'normal', 359.95, 1108.98] 1.00
[-2.79, -12.05, 'brown', -1.39, 0.61, -4.87, 'no', 'large', 162.19, 1191.44] 15.59
[-1.63, -14.53, 'red', -7.26, 0.20, -29.33, 'no', 'normal', 314.49, 1194.62] -30.96
[-1.21, -12.23, 'brown', -6.11, 0.72, -17.66, 'no', 'large', 118.32, 1045.57] -0.60
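The first two rows of the example output can be checked by hand against the rules, using the rounded values printed above. In the first row \(x_1 > 0\), so \(x_4 = x_1 + 2 x_2\), and \(y = x_6 + x_1\) because \(x_4\) lies outside [-2, 2], \(x_7\) is 'no', and \(x_8\) is 'normal'. In the second row \(x_4 = x_1 / 2\) and \(x_7\) is 'yes', so \(y = 3 - x_1 / (x_1 / 2) = 1\) whatever the value of \(x_1\).

```python
# First row: values copied from the example output (rounded to 2 decimals).
x1, x2, x6 = 1.39, -14.87, -31.64
x4 = x1 + 2 * x2          # x1 > 0, so x3 is 'green'
y = x6 + x1               # rule "x8 == 'normal'" applies
print(round(x4, 2), round(y, 2))  # -28.35 -30.25

# Second row: x4 was set to x1 / 2, and x7 == 'yes'.
x1 = -4.13
x4 = x1 / 2
print(3 - x1 / x4)        # 1.0
```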

Methods

take

Iterate over the first k samples.

Parameters

  • k (int)
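Because the dataset is an infinite stream, `take(k)` has to stop after k samples. A minimal sketch of that behaviour, assuming it is equivalent to `itertools.islice` over the dataset's iterator (an assumption, not River's documented implementation; the standalone `take` function here is hypothetical):

```python
import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def take(stream: Iterable[T], k: int) -> Iterator[T]:
    """Lazily yield the first k items of a possibly infinite stream."""
    return itertools.islice(stream, k)


# itertools.count() never ends, yet only three items are consumed.
print(list(take(itertools.count(), 3)))  # [0, 1, 2]
```

Returning a lazy iterator rather than a list means that no sample beyond the k-th is ever generated.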

References