Sine generator.
This generator is an implementation of the data stream with abrupt concept drift described in Gama, Joao, et al. [1].
It generates up to 4 numerical features with values in [0, 1]. Only 2 of them are relevant to the classification task; the other 2 are optionally added as noise. A classification function is chosen among four options:
0. SINE1. Abrupt concept drift, noise-free examples. It has two relevant attributes, each with values uniformly distributed in [0, 1]. In the first context, all points below the curve $y = \sin(x)$ are classified as positive.
1. Reversed SINE1. The reversed classification of SINE1.
2. SINE2. The same two relevant attributes. The classification function is $y < 0.5 + 0.3 \sin(3 \pi x)$.
3. Reversed SINE2. The reversed classification of SINE2.
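SINE1 labels a point positive when it lies below $y = \sin(x)$, SINE2 when $y < 0.5 + 0.3 \sin(3 \pi x)$, and the reversed variants flip those labels. The four rules can be sketched in plain Python as below. The function names are hypothetical, and the mapping of feature 0 to $y$ and feature 1 to $x$ is an assumption; it is consistent with the five labelled samples in the example further down (classification_function = 2).

```python
import math

# Hypothetical helpers. Assumption: feature 0 plays the role of y and
# feature 1 the role of x in the curves listed above.


def sine1(x0: float, x1: float) -> int:
    """SINE1: positive (1) if the point lies below y = sin(x), else 0."""
    return 1 if x0 < math.sin(x1) else 0


def reversed_sine1(x0: float, x1: float) -> int:
    """Reversed SINE1: the opposite label of SINE1."""
    return 1 - sine1(x0, x1)


def sine2(x0: float, x1: float) -> int:
    """SINE2: positive (1) if y < 0.5 + 0.3 * sin(3 * pi * x), else 0."""
    return 1 if x0 < 0.5 + 0.3 * math.sin(3 * math.pi * x1) else 0


def reversed_sine2(x0: float, x1: float) -> int:
    """Reversed SINE2: the opposite label of SINE2."""
    return 1 - sine2(x0, x1)


# Index i corresponds to classification_function = i.
CLASSIFICATION_FUNCTIONS = [sine1, reversed_sine1, sine2, reversed_sine2]

# Relevant features of the first sample in the example (function 2).
print(CLASSIFICATION_FUNCTIONS[2](0.3750, 0.6403))  # prints 1
```

Note that with both attributes in [0, 1], $\sin(x)$ never exceeds about 0.84, so SINE1 alone does not produce equally frequent classes.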
Concept drift can be introduced by changing the classification function, either manually or by using ConceptDriftStream.
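A minimal sketch of introducing an abrupt drift manually: a hypothetical stream labels points with SINE1 until a chosen position and with Reversed SINE1 afterwards. It uses only the standard library and does not use ConceptDriftStream; the names stream_with_abrupt_drift and position are illustrative.

```python
import math
import random
from itertools import islice
from typing import Iterator, Tuple


def sine1(x0: float, x1: float) -> int:
    # SINE1: 1 below the curve y = sin(x) (feature 0 as y, feature 1 as x).
    return 1 if x0 < math.sin(x1) else 0


def reversed_sine1(x0: float, x1: float) -> int:
    return 1 - sine1(x0, x1)


def stream_with_abrupt_drift(position: int, seed: int = 0) -> Iterator[Tuple[dict, int]]:
    """Hypothetical infinite stream whose concept flips at `position`."""
    rng = random.Random(seed)
    t = 0
    while True:
        x = {0: rng.random(), 1: rng.random()}
        # The old concept is active before `position`, the new one afterwards.
        label_fn = sine1 if t < position else reversed_sine1
        yield x, label_fn(x[0], x[1])
        t += 1


# Ten samples, with the drift happening after the fifth one.
samples = list(islice(stream_with_abrupt_drift(position=5, seed=42), 10))
```

Because every label flips at once, a model trained before the drift point is wrong on every sample afterwards, which is what makes this drift abrupt.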
Two important features are the option to balance classes, which makes the class distribution tend to a uniform one, and the option to add noise, which adds two non-relevant attributes.
classification_function (int, optional)
Index of the classification function to use, from 0 to 3.
seed (Optional[int | np.random.RandomState], optional)
If int, seed is used to seed the random number generator; if a RandomState instance, seed is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
balance_classes (bool, optional)
Whether to balance classes or not. If balanced, the class distribution will converge to a uniform distribution.
has_noise (bool, optional)
Adds 2 non-relevant features to the stream.
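The three seed cases can be illustrated with a hypothetical resolve_rng helper; this is a sketch of the documented behaviour, not the library's own code. For the None case it returns np.random.mtrand._rand, the private module-level RandomState behind functions such as np.random.rand (the same attribute scikit-learn's check_random_state relies on).

```python
import numbers

import numpy as np


def resolve_rng(seed=None) -> np.random.RandomState:
    """Hypothetical helper mirroring the documented `seed` semantics."""
    if seed is None:
        # Global RandomState used by the np.random.* functions (private attribute).
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int seeds a fresh, reproducible generator.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An existing generator is used as-is, so its state is shared.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


rng = resolve_rng(112)
first_draw = rng.rand()
```

Passing an int gives reproducible streams, while passing a shared RandomState lets several generators consume one sequence of random numbers.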
>>> from river.datasets import synth
>>> dataset = synth.Sine(classification_function=2, seed=112,
...                      balance_classes=False, has_noise=True)
>>> for x, y in dataset.take(5):
...     print(x, y)
{0: 0.3750, 1: 0.6403, 2: 0.9500, 3: 0.0756} 1
{0: 0.7769, 1: 0.8327, 2: 0.0548, 3: 0.8176} 1
{0: 0.8853, 1: 0.7223, 2: 0.0025, 3: 0.9811} 0
{0: 0.3434, 1: 0.0947, 2: 0.3946, 3: 0.0049} 1
{0: 0.7367, 1: 0.9558, 2: 0.8206, 3: 0.3449} 0
Drift can also be generated by switching the classification function at random.
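Switching the classification function at random can be sketched as drawing a new index among 0 to 3. The helper name is hypothetical, and the assumption that the new index always differs from the current one (so every switch is a real change of concept) is ours, not stated above.

```python
import random

N_FUNCTIONS = 4  # classification functions are indexed 0 to 3


def switch_classification_function(current: int, rng: random.Random) -> int:
    """Hypothetical helper: pick uniformly among the indices other than `current`."""
    candidates = [i for i in range(N_FUNCTIONS) if i != current]
    return rng.choice(candidates)


rng = random.Random(0)
function_index = 0
for _ in range(3):
    function_index = switch_classification_function(function_index, rng)
    print("switched to classification function", function_index)
```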
take(k): iterate over k samples of the stream, where k (int) is the number of samples to generate.
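Since the generator is an infinite stream, taking k samples amounts to truncating an iterator. A minimal sketch with itertools.islice, using a hypothetical toy stream in place of the dataset:

```python
from itertools import count, islice


def take(stream, k: int):
    """Yield at most the first k items of an (possibly infinite) iterable."""
    return islice(stream, k)


# An infinite toy stream of (features, label) pairs.
toy_stream = (({0: i / 10}, i % 2) for i in count())
for x, y in take(toy_stream, 3):
    print(x, y)
# {0: 0.0} 0
# {0: 0.1} 1
# {0: 0.2} 0
```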
The sample generation works as follows: The two attributes are generated with the random number generator. The classification function defines whether to classify the instance as class 0 or class 1. Finally, data is balanced and noise is added, if these options are set by the user.
Each generated sample has 2 relevant features, plus two noise features if has_noise is set.
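The generation steps in these notes can be sketched as a single Python generator. This is a hypothetical stand-in, not River's implementation: it uses Python's random module, fixes the labelling rule to SINE2 for brevity, and assumes that balancing works by redrawing the attributes until the label matches a target class that alternates between 0 and 1.

```python
import math
import random
from itertools import islice


def sine2(x0: float, x1: float) -> int:
    # SINE2 labelling: 1 if x0 < 0.5 + 0.3 * sin(3 * pi * x1), else 0.
    return 1 if x0 < 0.5 + 0.3 * math.sin(3 * math.pi * x1) else 0


def sine_like_stream(seed=None, balance_classes=False, has_noise=False):
    """Hypothetical generator following the steps described in the notes."""
    rng = random.Random(seed)
    next_class = 0  # target class when balancing; alternates 0, 1, 0, ...
    while True:
        # 1. Draw the two relevant attributes uniformly in [0, 1).
        x0, x1 = rng.random(), rng.random()
        # 2. Label the point with the classification function.
        y = sine2(x0, x1)
        # 3. Balancing: redraw until the label matches the target class.
        if balance_classes:
            while y != next_class:
                x0, x1 = rng.random(), rng.random()
                y = sine2(x0, x1)
            next_class = 1 - next_class
        x = {0: x0, 1: x1}
        # 4. Noise: two extra attributes that play no role in the label.
        if has_noise:
            x[2], x[3] = rng.random(), rng.random()
        yield x, y


samples = list(islice(sine_like_stream(seed=1, balance_classes=True, has_noise=True), 6))
```

With this balancing scheme the labels strictly alternate, so the class proportions are exactly uniform over every even-length prefix of the stream.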
[1] Gama, Joao, et al. "Learning with drift detection." Advances in Artificial Intelligence – SBIA 2004. Springer Berlin Heidelberg, 2004. 286-295.