Mv

Mv artificial dataset.

Artificial dataset composed of both nominal and numeric features, whose features present co-dependencies. Originally described in [1].

The features are generated using the following expressions:

  • $x_1$: uniformly distributed over [-5, 5].

  • $x_2$: uniformly distributed over [-15, -10].

  • $x_3$:

    • if $x_1 > 0$, $x_3 \leftarrow$ 'green'

    • else $x_3 \leftarrow$ 'red' with probability 0.4 and $x_3 \leftarrow$ 'brown' with probability 0.6.

  • $x_4$:

    • if $x_3 =$ 'green', $x_4 \leftarrow x_1 + 2 x_2$

    • else $x_4 = \frac{x_1}{2}$ with probability 0.3 and $x_4 = \frac{x_2}{2}$ with probability 0.7.

  • $x_5$: uniformly distributed over [-1, 1].

  • $x_6 \leftarrow x_4 \times \epsilon$, where $\epsilon$ is uniformly distributed over [0, 5].

  • $x_7$: 'yes' with probability 0.3, and 'no' with probability 0.7.

  • $x_8$: 'normal' if $x_5 < 0.5$ else 'large'.

  • $x_9$: uniformly distributed over [100, 500].

  • $x_{10}$: uniformly distributed integer over the interval [1000, 1200].
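The feature expressions above can be sketched in plain Python. This is an illustrative re-implementation using only the standard library, not River's own code: the function name `sketch_mv_features` and the dictionary keys `"x1"` to `"x10"` are assumptions made for the example.

```python
import random


def sketch_mv_features(rng: random.Random) -> dict:
    """Draw one feature dict following the expressions listed above."""
    x1 = rng.uniform(-5, 5)
    x2 = rng.uniform(-15, -10)

    # x3 depends on the sign of x1.
    if x1 > 0:
        x3 = "green"
    else:
        x3 = "red" if rng.random() < 0.4 else "brown"

    # x4 depends on x3, and through it on x1 and x2.
    if x3 == "green":
        x4 = x1 + 2 * x2
    else:
        x4 = x1 / 2 if rng.random() < 0.3 else x2 / 2

    x5 = rng.uniform(-1, 1)
    x6 = x4 * rng.uniform(0, 5)  # epsilon ~ U[0, 5]
    x7 = "yes" if rng.random() < 0.3 else "no"
    x8 = "normal" if x5 < 0.5 else "large"
    x9 = rng.uniform(100, 500)
    x10 = rng.randint(1000, 1200)  # integer, both bounds inclusive

    return {
        "x1": x1, "x2": x2, "x3": x3, "x4": x4, "x5": x5,
        "x6": x6, "x7": x7, "x8": x8, "x9": x9, "x10": x10,
    }


# Hypothetical usage: a seeded generator makes the draw repeatable.
sample = sketch_mv_features(random.Random(42))
```

The co-dependencies are visible in the control flow: `x3` is derived from `x1`, `x4` from `x3`, `x6` from `x4`, and `x8` from `x5`, so only `x7`, `x9` and `x10` are independent of the other features.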

The target value is generated using the following rules:

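  • if $x_2 > 2$, $y \leftarrow 35 - 0.5 x_4$

  • else if $-2 \le x_4 \le 2$, $y \leftarrow 10 - 2 x_1$

  • else if $x_7 =$ 'yes', $y \leftarrow 3 - \frac{x_1}{x_4}$

  • else if $x_8 =$ 'normal', $y \leftarrow x_6 + x_1$

  • else $y \leftarrow \frac{x_1}{2}$.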
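As a minimal sketch, the target rules can be written as a chain of early returns. The rules are read here as: y = 35 − 0.5·x4 when x2 > 2; y = 10 − 2·x1 when −2 ≤ x4 ≤ 2; y = 3 − x1/x4 when x7 is 'yes'; y = x6 + x1 when x8 is 'normal'; and y = x1/2 otherwise. The function name `sketch_mv_target` and the `"x1"`-style keys are assumptions for illustration, not River's implementation.

```python
def sketch_mv_target(x: dict) -> float:
    """Apply the target rules in order; the first matching rule wins."""
    if x["x2"] > 2:
        return 35 - 0.5 * x["x4"]
    if -2 <= x["x4"] <= 2:
        return 10 - 2 * x["x1"]
    if x["x7"] == "yes":
        # x4 lies outside [-2, 2] here, so the division is safe.
        return 3 - x["x1"] / x["x4"]
    if x["x8"] == "normal":
        return x["x6"] + x["x1"]
    return x["x1"] / 2


# Feature values copied from the fourth sample in the Examples section.
row = {"x1": -1.63, "x2": -14.53, "x4": -7.26,
       "x6": -29.33, "x7": "no", "x8": "normal"}
y = sketch_mv_target(row)  # falls through to the x8 == 'normal' rule
```

For this row the result is x6 + x1, which agrees with the target of about -30.96 printed for that sample. Because x2 is drawn from [-15, -10], the first rule never fires for features generated as described above.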

Parameters

  • seed (int) – defaults to None

    Random seed number used for reproducibility.

Attributes

  • desc

    Return the description from the docstring.

Examples

>>> from river import synth

>>> dataset = synth.Mv(seed=42)

>>> for x, y in dataset.take(5):
...     print(list(x.values()), y)
[1.39, -14.87, 'green', -28.35, -0.44, -31.64, 'no', 'normal', 370.67, 1178.43] -30.25
[-4.13, -12.89, 'red', -2.06, 0.01, -0.27, 'yes', 'normal', 359.95, 1108.98] 1.00
[-2.79, -12.05, 'brown', -1.39, 0.61, -4.87, 'no', 'large', 162.19, 1191.44] 15.59
[-1.63, -14.53, 'red', -7.26, 0.20, -29.33, 'no', 'normal', 314.49, 1194.62] -30.96
[-1.21, -12.23, 'brown', -6.11, 0.72, -17.66, 'no', 'large', 118.32, 1045.57] -0.60

Methods

take

Iterate over the first k samples of the dataset.

Parameters

  • k (int)
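To illustrate how a seed and a k-sample cap interact, here is a small stand-in written with the standard library only. It does not use River: `seeded_stream` and this `take` helper are hypothetical, and the use of `itertools.islice` is an assumption about how such a cap can be implemented, not a statement about River's internals.

```python
import itertools
import random


def seeded_stream(seed=None):
    """Hypothetical stand-in for a synthetic dataset: an endless (x, y) stream."""
    rng = random.Random(seed)
    while True:
        x = {"x1": rng.uniform(-5, 5)}
        yield x, x["x1"] / 2


def take(stream, k):
    """Yield at most k items from the stream, then stop."""
    return itertools.islice(stream, k)


# Two streams built with the same seed produce the same first five samples.
first = list(take(seeded_stream(seed=42), 5))
second = list(take(seeded_stream(seed=42), 5))
same = first == second
```

Capping an endless generator this way keeps the stream lazy: only the k requested samples are ever generated.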

References