Mv

Mv artificial dataset.

Artificial dataset with both nominal and numeric features, some of which depend on one another. Originally described in [1].

The features are generated using the following expressions:

  • x1: uniformly distributed over [-5, 5].

  • x2: uniformly distributed over [-15, -10].

  • x3:

    • if x1>0, x3← 'green'

    • else x3← 'red' with probability 0.4 and x3← 'brown' with probability 0.6.

  • x4:

    • if x3 = 'green', x4 ← x1 + 2 × x2

    • else x4 ← x1/2 with probability 0.3 and x4 ← x2/2 with probability 0.7.

  • x5: uniformly distributed over [-1, 1].

  • x6 ← x4 × ϵ, where ϵ is uniformly distributed over [0, 5].

  • x7: 'yes' with probability 0.3, and 'no' with probability 0.7.

  • x8: 'normal' if x5<0.5 else 'large'.

  • x9: uniformly distributed over [100, 500].

  • x10: uniformly distributed integer over the interval [1000, 1200].
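
The feature rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not River's actual implementation: the helper name `sample_features` and the dictionary keys `x1` to `x10` are hypothetical, the else-branch of x4 is taken as x1/2 or x2/2, and the order of random draws will not reproduce the library's output for a given seed.

```python
import random


def sample_features(rng: random.Random) -> dict:
    """Draw one feature vector following the listed rules (hypothetical helper)."""
    x1 = rng.uniform(-5, 5)
    x2 = rng.uniform(-15, -10)

    # x3 depends on the sign of x1.
    if x1 > 0:
        x3 = "green"
    else:
        x3 = "red" if rng.random() < 0.4 else "brown"

    # x4 depends on x3, and therefore indirectly on x1.
    if x3 == "green":
        x4 = x1 + 2 * x2
    else:
        x4 = x1 / 2 if rng.random() < 0.3 else x2 / 2

    x5 = rng.uniform(-1, 1)
    x6 = x4 * rng.uniform(0, 5)  # x4 scaled by a uniform noise factor
    x7 = "yes" if rng.random() < 0.3 else "no"
    x8 = "normal" if x5 < 0.5 else "large"  # derived from x5
    x9 = rng.uniform(100, 500)
    x10 = rng.randint(1000, 1200)  # integer, both endpoints included

    return {
        "x1": x1, "x2": x2, "x3": x3, "x4": x4, "x5": x5,
        "x6": x6, "x7": x7, "x8": x8, "x9": x9, "x10": x10,
    }


features = sample_features(random.Random(42))
```

Only x1, x2, x5, x7, x9 and x10 are drawn independently; x3, x4, x6 and x8 are derived from other features, which is where the co-dependencies come from.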

The target value is generated using the following rules:

  • if x2 > 2, y ← 35 − 0.5 × x4

  • else if −2 ≤ x4 ≤ 2, y ← 10 − 2 × x1

  • else if x7 = 'yes', y ← 3 − x1/x4

  • else if x8 = 'normal', y ← x6 + x1

  • else y ← x1/2.
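
The rule cascade can be written as an ordinary function in which the first matching rule wins. This is a sketch rather than the library's code: `target` is a hypothetical helper, the feature keys are assumed to be `x1` to `x10`, the third rule is taken as 3 − x1/x4 and the final fallback as x1/2, a reading that agrees with the targets printed in the Examples section.

```python
def target(x: dict) -> float:
    """Apply the target rules in order; the first match wins (hypothetical helper)."""
    if x["x2"] > 2:
        return 35 - 0.5 * x["x4"]
    if -2 <= x["x4"] <= 2:
        return 10 - 2 * x["x1"]
    if x["x7"] == "yes":
        # Safe division: x4 lies outside [-2, 2] here, so it cannot be zero.
        return 3 - x["x1"] / x["x4"]
    if x["x8"] == "normal":
        return x["x6"] + x["x1"]
    return x["x1"] / 2


# First sample from the Examples section, using the rounded values printed there.
sample = {
    "x1": 1.39, "x2": -14.87, "x4": -28.35,
    "x6": -31.64, "x7": "no", "x8": "normal",
}
print(round(target(sample), 2))
```

With the rounded inputs, the result should land close to the target shown for that sample; small differences in the last digit are expected for other rows because the printed features are themselves rounded.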

Parameters

  • seed (int) – defaults to None

    Random seed number used for reproducibility.

Attributes

  • desc

    Return the description from the docstring.

Examples

>>> from river.datasets import synth

>>> dataset = synth.Mv(seed=42)

>>> for x, y in dataset.take(5):
...     print(list(x.values()), y)
[1.39, -14.87, 'green', -28.35, -0.44, -31.64, 'no', 'normal', 370.67, 1178.43] -30.25
[-4.13, -12.89, 'red', -2.06, 0.01, -0.27, 'yes', 'normal', 359.95, 1108.98] 1.00
[-2.79, -12.05, 'brown', -1.39, 0.61, -4.87, 'no', 'large', 162.19, 1191.44] 15.59
[-1.63, -14.53, 'red', -7.26, 0.20, -29.33, 'no', 'normal', 314.49, 1194.62] -30.96
[-1.21, -12.23, 'brown', -6.11, 0.72, -17.66, 'no', 'large', 118.32, 1045.57] -0.60

Methods

take

Iterate over the first k samples.

Parameters

  • k (int)
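
Bounding a stream to its first k items can be illustrated with the standard library. The sketch below assumes that `take` behaves like `itertools.islice` over the dataset iterator; `endless_stream` and the standalone `take` function are hypothetical stand-ins, not River's code.

```python
import itertools


def endless_stream():
    """Hypothetical stand-in for an unbounded stream of (features, target) pairs."""
    i = 0
    while True:
        yield {"x1": i}, float(i)
        i += 1


def take(stream, k: int):
    """Return an iterator over at most the first k items of the stream."""
    return itertools.islice(stream, k)


samples = list(take(endless_stream(), 3))
```

Because `islice` is lazy, only the k requested items are ever generated, which keeps iteration over an unbounded generator finite.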

References