Overview

active

Online active learning.

base

anomaly

Anomaly detection.

Estimators in the anomaly module have a bespoke API. Each anomaly detector has a score_one method instead of a predict_one method. This method returns an anomaly score. Normal observations should have a low score, whereas anomalous observations should have a high score. The range of the scores is relative to each estimator.
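To make the score_one contract concrete, here is a minimal sketch in plain Python. The RunningZScoreDetector class is hypothetical and is not part of River; it only mirrors the learn_one/score_one shape described above, scoring a single feature by its absolute z-score.

```python
import math


class RunningZScoreDetector:
    """Toy anomaly detector (hypothetical, not a River class)."""

    def __init__(self, feature):
        self.feature = feature
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def learn_one(self, x):
        # Welford's update of the running mean and variance.
        value = x[self.feature]
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def score_one(self, x):
        # Low score for typical values, high score for unusual ones.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return 0.0
        return abs(x[self.feature] - self.mean) / std


detector = RunningZScoreDetector("temperature")
for value in [20.0, 21.0, 19.5, 20.5, 20.0]:
    detector.learn_one({"temperature": value})

normal_score = detector.score_one({"temperature": 20.2})
anomalous_score = detector.score_one({"temperature": 35.0})
print(normal_score < anomalous_score)
```

The scores here are unbounded z-scores; as noted above, the range of the scores depends on the estimator.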

Anomaly detectors are usually unsupervised, in that they analyze the distribution of the features they are shown. But River also has a notion of supervised anomaly detectors. These analyze the distribution of a target variable, and optionally include the distribution of the features as well. They are useful for detecting labelling anomalies, which can be detrimental if they are learned by a model.

base

bandit

Multi-armed bandit (MAB) policies.

The bandit policies in River are meant to have a generic API. This allows them to be used in a variety of contexts. Within River, they are used for model selection (see model_selection.BanditRegressor).
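As an illustration of what a generic policy interface can look like, the sketch below implements epsilon-greedy in plain Python. The class, its pull and update method names, and the simulated payout probabilities are assumptions made for this example; they are not taken from River's implementation.

```python
import random


class EpsilonGreedy:
    """Toy epsilon-greedy policy (hypothetical, not River's implementation)."""

    def __init__(self, epsilon, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {}
        self.means = {}

    def pull(self, arm_ids):
        # Explore with probability epsilon, otherwise exploit the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arm_ids)
        return max(arm_ids, key=lambda arm: self.means.get(arm, 0.0))

    def update(self, arm_id, reward):
        # Incremental update of the arm's average reward.
        n = self.counts.get(arm_id, 0) + 1
        mean = self.means.get(arm_id, 0.0)
        self.counts[arm_id] = n
        self.means[arm_id] = mean + (reward - mean) / n


# Simulated arms: each pays 1 with a fixed, made-up probability.
payout_probability = {"a": 0.2, "b": 0.8}
env_rng = random.Random(42)
policy = EpsilonGreedy(epsilon=0.1, seed=42)

for _ in range(1000):
    arm = policy.pull(list(payout_probability))
    reward = 1.0 if env_rng.random() < payout_probability[arm] else 0.0
    policy.update(arm, reward)

print(policy.counts)
```

Because the policy only sees arm identifiers and rewards, the arms could just as well be candidate models and the rewards their per-sample performance, which is the model selection use case mentioned above.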

Classes

Functions

base

envs

base

Base interfaces.

Every estimator in River is a class, and as such inherits from at least one base interface. These are used to categorize, organize, and standardize the many estimators that River contains.

This module contains mixin classes, which are all suffixed by Mixin. Their purpose is to provide additional functionality to an estimator, so they need to be used in conjunction with a non-mixin base class.
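The mixin pattern can be sketched as follows. Estimator, CountingMixin and MeanRegressor are hypothetical names invented for this example, not River classes; the point is only that the mixin adds behaviour and is combined with a non-mixin base class.

```python
class Estimator:
    """Minimal stand-in for a non-mixin base class (hypothetical)."""

    def learn_one(self, x, y):
        raise NotImplementedError


class CountingMixin:
    """Hypothetical mixin that adds a sample counter to any estimator."""

    n_samples = 0

    def count(self):
        self.n_samples += 1


class MeanRegressor(CountingMixin, Estimator):
    """Predicts the running mean of the targets seen so far."""

    def __init__(self):
        self.mean = 0.0

    def learn_one(self, x, y):
        self.count()  # functionality provided by the mixin
        self.mean += (y - self.mean) / self.n_samples

    def predict_one(self, x):
        return self.mean


model = MeanRegressor()
for y in [1.0, 2.0, 3.0]:
    model.learn_one({}, y)

print(model.n_samples, model.predict_one({}))
```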

This module also contains utilities for type hinting and tagging estimators.

cluster

Unsupervised clustering.

compat

Compatibility tools.

This module contains adapters for making River estimators compatible with other libraries, and vice-versa whenever possible. The relevant adapters will only be usable if you have installed the necessary library. For instance, you have to install scikit-learn in order to use the compat.convert_sklearn_to_river function.

Classes

Functions

compose

Model composition.

This module contains utilities for merging multiple modeling steps into a single pipeline. Although pipelines are not the only way to process a stream of data, we highly encourage you to use them.
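The idea of chaining steps can be sketched in plain Python. Everything below (MaxAbsScaler, FeatureSum, and this Pipeline class) is a hypothetical stand-in written for the example; it does not reproduce the classes of the compose module.

```python
class MaxAbsScaler:
    """Hypothetical transformer: rescales each feature by its largest absolute value so far."""

    def __init__(self):
        self.max_abs = {}

    def learn_one(self, x):
        for name, value in x.items():
            self.max_abs[name] = max(abs(value), self.max_abs.get(name, 0.0))

    def transform_one(self, x):
        return {
            name: value / self.max_abs[name] if self.max_abs.get(name) else 0.0
            for name, value in x.items()
        }


class FeatureSum:
    """Hypothetical final step: 'predicts' the sum of the features it receives."""

    def learn_one(self, x, y):
        pass  # stateless on purpose, to keep the sketch short

    def predict_one(self, x):
        return sum(x.values())


class Pipeline:
    """Applies every transformer in order, then delegates to the final model."""

    def __init__(self, *steps):
        self.transformers = list(steps[:-1])
        self.model = steps[-1]

    def learn_one(self, x, y):
        for transformer in self.transformers:
            transformer.learn_one(x)
            x = transformer.transform_one(x)
        self.model.learn_one(x, y)

    def predict_one(self, x):
        for transformer in self.transformers:
            x = transformer.transform_one(x)
        return self.model.predict_one(x)


pipeline = Pipeline(MaxAbsScaler(), FeatureSum())
pipeline.learn_one({"a": 2.0, "b": 4.0}, y=1.0)
pipeline.learn_one({"a": 4.0, "b": -8.0}, y=0.0)
prediction = pipeline.predict_one({"a": 2.0, "b": 4.0})
print(prediction)
```

Wrapping the steps in one object means the caller only ever calls learn_one and predict_one on the pipeline, so the transformers cannot accidentally be skipped or applied in a different order at prediction time.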

Classes

Functions

conf

Conformal predictions. This module contains wrappers to enable conformal predictions on any regressor or classifier.

covariance

Online estimation of covariance and precision matrices.

datasets

Datasets.

This module contains a collection of datasets for multiple tasks: classification, regression, etc. The data corresponds to popular datasets, conveniently wrapped so that they can easily be iterated over in a streaming fashion. All datasets have a fixed size. Please refer to river.datasets.synth if you are interested in infinite synthetic data generators.

base

synth

Synthetic datasets.

Each synthetic dataset is a stream generator. The benefit of using a generator is that it does not store the data: each sample is generated on the fly. Apart from a couple of exceptions, these generators are infinite.
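The following sketch shows the general idea with a plain Python generator. The noisy_line function and its parameters are made up for this example and do not correspond to any generator shipped with River.

```python
import itertools
import random


def noisy_line(slope=2.0, intercept=1.0, noise=0.1, seed=None):
    """Hypothetical infinite stream of (features, target) pairs."""
    rng = random.Random(seed)
    while True:
        x = {"x": rng.uniform(-1, 1)}
        y = slope * x["x"] + intercept + rng.gauss(0, noise)
        yield x, y  # each sample is produced on demand and never stored


# The stream never ends, so the consumer decides how many samples to take.
sample = list(itertools.islice(noisy_line(seed=7), 5))
for x, y in sample:
    print(x, round(y, 3))
```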

drift

Concept Drift Detection.

This module contains concept drift detection methods. The purpose of a drift detector is to raise an alarm if the data distribution changes. A good drift detector maximizes the number of true positives while keeping the number of false positives to a minimum.
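To illustrate the alarm mechanism, here is a deliberately naive detector in plain Python. WindowMeanShiftDetector is a hypothetical class, not one of the methods in this module: it simply compares the mean of a recent window against the mean of the window that preceded it.

```python
from collections import deque


class WindowMeanShiftDetector:
    """Naive drift detector (hypothetical): compares the means of two adjacent windows."""

    def __init__(self, window_size, threshold):
        self.reference = deque(maxlen=window_size)
        self.recent = deque(maxlen=window_size)
        self.threshold = threshold
        self.drift_detected = False

    def update(self, value):
        self.drift_detected = False
        if len(self.recent) == self.recent.maxlen:
            # The oldest recent value graduates to the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(value)
        if len(self.reference) == self.reference.maxlen:
            recent_mean = sum(self.recent) / len(self.recent)
            reference_mean = sum(self.reference) / len(self.reference)
            if abs(recent_mean - reference_mean) > self.threshold:
                # Raise the alarm and start over with empty windows.
                self.drift_detected = True
                self.reference.clear()
                self.recent.clear()


# The distribution changes abruptly at index 40.
values = [0.0] * 40 + [1.0] * 40
detector = WindowMeanShiftDetector(window_size=10, threshold=0.5)

alarms = []
for i, value in enumerate(values):
    detector.update(value)
    if detector.drift_detected:
        alarms.append(i)

print(alarms)
```

The threshold embodies the trade-off described above: lowering it detects changes sooner but raises more false alarms on noisy data.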

binary

Drift detection for binary data.

dummy

Dummy estimators.

This module is here for testing purposes, as well as providing baseline performances.

ensemble

Ensemble learning.

Broadly speaking, there are two kinds of ensemble approaches. There are those that copy a single model several times and aggregate the predictions of said copies. This includes bagging as well as boosting. Then there are those that are composed of an arbitrary list of models, and can therefore aggregate predictions from different kinds of models.
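The first kind of ensemble can be sketched with online bagging, where each copy of the base model sees every sample a random number of times drawn from a Poisson(1) distribution. The classes below are hypothetical and written in plain Python; they are not River's implementation.

```python
import copy
import math
import random


def poisson(rate, rng):
    """Draw from a Poisson distribution using Knuth's algorithm."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class RunningMeanRegressor:
    """Hypothetical base model: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n

    def predict_one(self, x):
        return self.mean


class BaggingRegressor:
    """Toy online bagging: copies one model and averages the copies' predictions."""

    def __init__(self, model, n_models, seed=None):
        self.models = [copy.deepcopy(model) for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        for model in self.models:
            # Each copy sees the sample k times, with k ~ Poisson(1).
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        return sum(m.predict_one(x) for m in self.models) / len(self.models)


ensemble = BaggingRegressor(RunningMeanRegressor(), n_models=5, seed=1)
for _ in range(50):
    ensemble.learn_one({}, 3.0)

prediction = ensemble.predict_one({})
print(prediction)
```

An ensemble of the second kind would take a list of already constructed, possibly different, models instead of copying a single one; the aggregation step in predict_one would stay the same.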

evaluate

Model evaluation.

This module provides utilities to evaluate an online model. The goal is to reproduce a real-world scenario with high fidelity. The core function of this module is progressive_val_score, which evaluates a model via progressive validation.
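Progressive validation means that each sample is first used to test the model and only then to train it. The sketch below shows that loop in plain Python; progressive_mae and RunningMeanRegressor are hypothetical helpers for this example and do not reproduce the signature of progressive_val_score.

```python
class RunningMeanRegressor:
    """Hypothetical model: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_mae(stream, model):
    """Mean absolute error where every prediction is made before learning."""
    total_error = 0.0
    n = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. predict on the unseen sample
        total_error += abs(y - y_pred)  # 2. score the prediction
        n += 1
        model.learn_one(x, y)  # 3. only then learn from it
    return total_error / n if n else 0.0


stream = [({"t": t}, float(t)) for t in range(1, 5)]
mae = progressive_mae(stream, RunningMeanRegressor())
print(mae)
```

Since the model never sees a label before predicting it, every sample acts as test data once, which is what makes the estimate faithful to a production setting.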

This module also exposes "tracks". A track is a predefined combination of a dataset and one or more metrics. This provides a principled way to compare models with each other. For instance, the RegressionTrack contains several datasets and metrics to evaluate regression models. There is also a bare Track class to implement a custom track. The benchmarks directory at the root of the River repository uses these tracks.

Classes

Functions

facto

Factorization machines.

feature_extraction

Feature extraction.

This module can be used to extract information from raw features. This includes encoding categorical data as well as looking at interactions between existing features. This differs from the preprocessing module in that the latter's purpose is rather to clean the data so that it may be processed by a particular machine learning algorithm.

feature_selection

Feature selection.

forest

This module implements forest-based classifiers and regressors.

imblearn

Sampling methods.

linear_model

Linear models.

base

metrics

Evaluation metrics.

All the metrics are updated one sample at a time. This way we can track performance of predictive methods over time.

Note that all metrics have a revert method, enabling them to be wrapped in utils.Rolling. This allows computing rolling metrics:

>>> from river import metrics, utils

>>> y_true = [True, False, True, True]
>>> y_pred = [False, False, True, True]

>>> metric = utils.Rolling(metrics.Accuracy(), window_size=3)

>>> for yt, yp in zip(y_true, y_pred):
...     metric.update(yt, yp)
...     print(metric)
Accuracy: 0.00%
Accuracy: 50.00%
Accuracy: 66.67%
Accuracy: 100.00%
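The reason a revert method is enough to obtain a rolling metric can be sketched without River. ToyAccuracy and ToyRolling below are hypothetical re-implementations for illustration only: the wrapper keeps the last window_size updates and reverts the oldest one when the window is full.

```python
from collections import deque


class ToyAccuracy:
    """Hypothetical metric with matching update and revert methods."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += y_true == y_pred
        self.total += 1

    def revert(self, y_true, y_pred):
        self.correct -= y_true == y_pred
        self.total -= 1

    def get(self):
        return self.correct / self.total if self.total else 0.0


class ToyRolling:
    """Hypothetical wrapper that restricts a metric to the most recent updates."""

    def __init__(self, metric, window_size):
        self.metric = metric
        self.window_size = window_size
        self.window = deque()

    def update(self, *args):
        if len(self.window) == self.window_size:
            # Undo the contribution of the oldest update.
            self.metric.revert(*self.window.popleft())
        self.metric.update(*args)
        self.window.append(args)

    def get(self):
        return self.metric.get()


y_true = [True, False, True, True]
y_pred = [False, False, True, True]

metric = ToyRolling(ToyAccuracy(), window_size=3)
history = []
for yt, yp in zip(y_true, y_pred):
    metric.update(yt, yp)
    history.append(metric.get())

print(history)
```

The wrapper never recomputes the metric from scratch, so each update costs the same regardless of the window size.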

base

multioutput

Metrics for multi-output learning.

base

misc

Miscellaneous.

This module essentially regroups some implementations that have nowhere else to go.

model_selection

Model selection.

This module groups together a variety of methods that may be used for performing model selection. A model selector is provided with a list of models. These are called "experts" in the expert learning literature. The model selector's goal is to perform at least as well as the best model. Initially, the best model is not known; the performance of each model becomes more apparent as time goes by. Different strategies are possible, each one offering a different tradeoff in terms of accuracy and computational performance.

Model selection can be used for tuning the hyperparameters of a model. This may be done by creating a copy of the model for each set of hyperparameters, and treating each copy as a separate model. The utils.expand_param_grid function can be used for this purpose.
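Expanding a grid into one candidate per combination can be sketched with the standard library. The expand_grid helper and the parameter names below are hypothetical; this does not reproduce the signature or behaviour of utils.expand_param_grid.

```python
import itertools


def expand_grid(grid):
    """Yield one dict per combination of hyperparameter values (hypothetical helper)."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


# Made-up hyperparameter names and values.
grid = {"lr": [0.01, 0.1], "l2": [0.0, 0.001, 0.01]}
candidates = list(expand_grid(grid))

for params in candidates:
    print(params)
```

Each dictionary would then be used to configure its own copy of the model, and the resulting copies handed to a model selector as separate experts.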

base

multiclass

Multi-class classification.

multioutput

Multi-output models.

naive_bayes

Naive Bayes algorithms.

neighbors

Neighbors-based learning.

Also known as lazy methods. In these methods, generalisation of the training data is delayed until a query is received.

neural_net

Neural networks.

activations

optim

Stochastic optimization.

base

initializers

Weight initializers.

losses

Loss functions.

Each loss function is intended to work with both single values as well as numpy vectors.

schedulers

Learning rate schedulers.

preprocessing

Feature preprocessing.

The purpose of this module is to modify an existing set of features so that they can be processed by a machine learning algorithm. This may be done by scaling numeric parts of the data or by one-hot encoding categorical features. The difference from the feature_extraction module is that the latter extracts new information from the data.
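Both operations mentioned above can be sketched in plain Python on dictionary-shaped samples. MinMaxScaler and one_hot here are hypothetical helpers written for this example, not the classes of the preprocessing module.

```python
class MinMaxScaler:
    """Hypothetical online scaler: maps each feature to [0, 1] using the range seen so far."""

    def __init__(self):
        self.min = {}
        self.max = {}

    def learn_one(self, x):
        for name, value in x.items():
            self.min[name] = min(value, self.min.get(name, value))
            self.max[name] = max(value, self.max.get(name, value))

    def transform_one(self, x):
        out = {}
        for name, value in x.items():
            lo = self.min.get(name, value)
            hi = self.max.get(name, value)
            out[name] = (value - lo) / (hi - lo) if hi > lo else 0.0
        return out


def one_hot(x, feature):
    """Replace a categorical feature with a single indicator feature (hypothetical helper)."""
    out = {name: value for name, value in x.items() if name != feature}
    out[f"{feature}_{x[feature]}"] = 1
    return out


scaler = MinMaxScaler()
for sample in [{"age": 20.0}, {"age": 40.0}]:
    scaler.learn_one(sample)

scaled = scaler.transform_one({"age": 30.0})
encoded = one_hot({"age": 30.0, "city": "Paris"}, "city")
print(scaled, encoded)
```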

proba

Probability distributions.

base

reco

Recommender systems module.

Recommender systems (recsys for short) is a large topic. This module is far from comprehensive. It simply provides models which can contribute towards building a recommender system.

A typical recommender system is made up of a retrieval phase, followed by a ranking phase. The output of the retrieval phase is a shortlist of the catalogue of items. The items in the shortlist are then usually ranked according to the expected preference the user will have for each item. This module focuses on the ranking phase.

Models which inherit from the Ranker class have a rank method. This allows sorting a set of items for a given user. Each model also has a learn_one(user, item, y, context) method, which allows learning user preferences. The y parameter is a reward value, the nature of which is specific to each recommendation task. Typically the reward is a number or a boolean value. It is up to the user to determine how to translate a user session into training data.
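The learn_one and rank interaction can be sketched with a trivial baseline. ItemMeanRanker is a hypothetical class that ignores the user and the context and scores each item by its average reward; it only mimics the method shapes described above and does not inherit from River's Ranker.

```python
class ItemMeanRanker:
    """Hypothetical baseline ranker: scores an item by its average observed reward."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def learn_one(self, user, item, y, context=None):
        self.sums[item] = self.sums.get(item, 0.0) + y
        self.counts[item] = self.counts.get(item, 0) + 1

    def predict_one(self, user, item, context=None):
        n = self.counts.get(item, 0)
        return self.sums[item] / n if n else 0.0

    def rank(self, user, items, context=None):
        # Highest expected preference first.
        return sorted(
            items,
            key=lambda item: self.predict_one(user, item, context),
            reverse=True,
        )


model = ItemMeanRanker()
# Made-up interactions: the reward is 1.0 for a like, 0.0 for a dislike.
interactions = [
    ("alice", "film", 1.0),
    ("bob", "film", 1.0),
    ("alice", "book", 0.0),
    ("bob", "book", 1.0),
    ("alice", "game", 0.0),
]
for user, item, reward in interactions:
    model.learn_one(user, item, reward)

ranking = model.rank("carol", ["book", "game", "film"])
print(ranking)
```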

base

rules

Decision rules-based algorithms.

sketch

Data containers and collections for sequential data.

This module has summary and sketch structures that operate with constrained amounts of memory and processing time.

stats

Running statistics.

base

stream

Streaming utilities.

The module includes tools to iterate over data streams.

Classes

Functions

time_series

Time series forecasting.

Classes

Functions

base

tree

This module implements incremental Decision Tree (iDT) algorithms for handling classification and regression tasks.

Each family of iDT will be presented in a dedicated section.

At any moment, an iDT might face a situation where an input feature previously used to make a split decision is missing from an incoming sample. In this case, the most traversed path is selected to pass down the instance. Moreover, in the case of nominal features, if a new category arises and the feature is used in a decision node, a new branch is created to accommodate the new value.

1. Hoeffding Trees

This family of iDT algorithms uses the Hoeffding bound to determine whether or not the incrementally computed best split candidates would be equivalent to the ones obtained in a batch-processing fashion.
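For reference, the Hoeffding bound states that, with probability 1 - delta, the true mean of a random variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the mean observed over n samples. The sketch below computes it and applies the usual split test; the function names and the example merit values are assumptions for illustration, not River's internal API.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon for a variable of the given range after n observations."""
    return math.sqrt(value_range**2 * math.log(1 / delta) / (2 * n))


def should_split(best_merit, second_best_merit, value_range, delta, n):
    """Split when the best candidate leads the runner-up by more than epsilon."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return best_merit - second_best_merit > epsilon


# Made-up merits: the same gap becomes significant once enough samples are seen.
for n in (100, 200):
    epsilon = hoeffding_bound(1.0, 1e-7, n)
    decision = should_split(0.30, 0.05, value_range=1.0, delta=1e-7, n=n)
    print(n, round(epsilon, 4), decision)
```

Because epsilon shrinks as n grows, a leaf that cannot yet separate its two best candidates simply waits for more samples before attempting the split again.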

All the available Hoeffding Tree (HT) implementations share some common functionalities:

  • Set the maximum tree depth allowed (max_depth).

  • Handle Active and Inactive nodes: Active learning nodes update their own internal state to improve predictions and monitor input features to perform split attempts. Inactive learning nodes do not update their internal state and only keep the predictors; they are used to save memory in the tree (max_size).

  • Enable/disable memory management.

  • Define strategies to sort leaves according to how likely they are going to be split. This enables deactivating non-promising leaves to save memory.

  • Disable 'poor' attributes to save memory and speed up tree construction. A poor attribute is an input feature whose split merit is much smaller than the current best candidate. Once a feature is disabled, the tree stops saving statistics necessary to split such a feature.

  • Define properties to access leaf prediction strategies, split criteria, and other relevant characteristics.

2. Stochastic Gradient Trees

Stochastic Gradient Trees (SGT) directly optimize a loss function, rather than relying on split heuristics to guide the tree growth. F-tests are performed to decide whether a leaf should be expanded or its prediction value should be updated.

SGTs can deal with binary classification and single-target regression. They also support dynamic and static feature quantizers to deal with numerical inputs.

base

This module defines generic branch and leaf implementations. These should be used in River by each tree-based model. Using these classes makes the code more DRY. The only reason not to do so would be performance, whereby a tree-based model uses a bespoke implementation.

This module defines a bunch of methods to ease the manipulation and diagnosis of trees. Its intention is to provide utilities for walking over a tree and visualizing it.

splitter

This module implements the Attribute Observers (AO) (or tree splitters) that are used by the Hoeffding Trees (HT). It also implements the feature quantizers (FQ) used by Stochastic Gradient Trees (SGT). AOs are a core aspect of HT construction, and might represent one of the major bottlenecks when building the trees. The same holds for SGTs and FQs. The correct choice and setup of a splitter might result in significant differences in the running time and memory usage of the incremental decision trees.

AOs for classification and regression trees can be differentiated by using the property is_target_class (True for splitters designed for classification tasks). An error will be raised if one tries to use a classification splitter in a regression tree and vice-versa. Lastly, AOs cannot be used in SGTs and FQs cannot be used in Hoeffding Trees, so care must be taken when choosing a feature splitter.

utils

Shared utility classes and functions.

Classes

Functions

math

Mathematical utility functions (intended for internal purposes).

A lot of this is experimental and has a high probability of changing in the future.

norm

pretty

Helper functions for making things readable by humans.

random