Overview¶
active¶
Online active learning.
base¶
anomaly¶
Anomaly detection.
Estimators in the anomaly module have a bespoke API. Each anomaly detector has a score_one method instead of a predict_one method. This method returns an anomaly score: normal observations should receive a low score, whereas anomalous observations should receive a high score. The range of the scores is relative to each estimator.
Anomaly detectors are usually unsupervised, in that they analyze the distribution of the features they are shown. But River also has a notion of supervised anomaly detectors. These analyze the distribution of a target variable, and optionally include the distribution of the features as well. They are useful for detecting labelling anomalies, which can be detrimental if they are learned by a model.
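As a sketch of this API, here is how the HalfSpaceTrees detector might be used; the data and hyperparameter values are illustrative:

```python
from river import anomaly

detector = anomaly.HalfSpaceTrees(seed=42)  # hyperparameters left at their defaults

# Train on (presumably normal) observations
for x in [{'x': 0.5}, {'x': 0.45}, {'x': 0.43}]:
    detector.learn_one(x)

# A high score suggests the observation is anomalous
score = detector.score_one({'x': 0.95})
```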
base¶
bandit¶
Multi-armed bandit (MAB) policies.
The bandit policies in River have a generic API. This allows them to be used in a variety of situations. For instance, they can be used for model selection (see model_selection.BanditRegressor).
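A minimal sketch of that use case, pairing bandit.EpsilonGreedy with model_selection.BanditRegressor; the learning rates and policy hyperparameters are illustrative:

```python
from river import bandit, linear_model, metrics, model_selection, optim

# One candidate model per learning rate
models = [
    linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
    for lr in [0.0001, 0.001, 0.01, 0.1]
]

# The policy decides which model to pull (train and predict with) at each step
selector = model_selection.BanditRegressor(
    models,
    metric=metrics.MAE(),
    policy=bandit.EpsilonGreedy(epsilon=0.1, seed=42),
)
```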
Classes
Functions
base¶
envs¶
base¶
Base interfaces.
Every estimator in River is a class, and as such inherits from at least one base interface. These are used to categorize, organize, and standardize the many estimators that River contains.
This module contains mixin classes, which are all suffixed by Mixin. Their purpose is to provide additional functionality to an estimator, and thus they need to be used in conjunction with a non-mixin base class.
This module also contains utilities for type hinting and tagging estimators.
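As a sketch, a custom estimator can be built by inheriting from one of these interfaces; the class below is illustrative:

```python
from river import base

class RunningMeanRegressor(base.Regressor):
    """Predicts the running mean of the target, ignoring the features."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.total += y

    def predict_one(self, x):
        return self.total / self.n if self.n else 0.0
```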
- Base
- BinaryDriftAndWarningDetector
- BinaryDriftDetector
- Classifier
- Clusterer
- DriftAndWarningDetector
- DriftDetector
- Ensemble
- Estimator
- MiniBatchClassifier
- MiniBatchRegressor
- MiniBatchSupervisedTransformer
- MiniBatchTransformer
- MultiLabelClassifier
- MultiTargetRegressor
- Regressor
- SupervisedTransformer
- Transformer
- Wrapper
- WrapperEnsemble
cluster¶
Unsupervised clustering.
compat¶
Compatibility tools.
This module contains adapters for making River estimators compatible with other libraries, and vice versa whenever possible. The relevant adapters will only be usable if you have installed the necessary library. For instance, you have to install scikit-learn in order to use the compat.convert_sklearn_to_river function.
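A minimal sketch of wrapping a scikit-learn estimator for online use, assuming scikit-learn is installed:

```python
from river import compat
from sklearn import linear_model as sk_linear_model

# SGDRegressor supports partial_fit, so it can be updated one sample at a time
model = compat.convert_sklearn_to_river(sk_linear_model.SGDRegressor())
model.learn_one({'x': 1.0}, 2.0)
prediction = model.predict_one({'x': 1.0})
```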
Classes
- River2SKLClassifier
- River2SKLClusterer
- River2SKLRegressor
- River2SKLTransformer
- SKL2RiverClassifier
- SKL2RiverRegressor
Functions
compose¶
Model composition.
This module contains utilities for merging multiple modeling steps into a single pipeline. Although pipelines are not the only way to process a stream of data, we highly encourage you to use them.
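For instance, a pipeline can be built either explicitly with compose.Pipeline or with the | operator:

```python
from river import compose, linear_model, preprocessing

# Explicit construction
model = compose.Pipeline(
    preprocessing.StandardScaler(),
    linear_model.LinearRegression(),
)

# Equivalent shorthand
model = preprocessing.StandardScaler() | linear_model.LinearRegression()
```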
Classes
- Discard
- FuncTransformer
- Grouper
- Pipeline
- Prefixer
- Renamer
- Select
- SelectType
- Suffixer
- TargetTransformRegressor
- TransformerProduct
- TransformerUnion
Functions
conf¶
Conformal predictions. This module contains wrappers to enable conformal predictions on any regressor or classifier.
covariance¶
Online estimation of covariance and precision matrices.
datasets¶
Datasets.
This module contains a collection of datasets for multiple tasks: classification, regression, etc. These are popular datasets, conveniently wrapped so that the data can easily be iterated over in a streaming fashion. All datasets have a fixed size. Please refer to river.synth if you are interested in infinite synthetic data generators.
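Each dataset is a Python iterable of (features, target) pairs; for example:

```python
from river import datasets

dataset = datasets.Phishing()
for x, y in dataset:
    print(x, y)  # x is a dict of features, y is the label
    break
```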
Regression
| Name | Samples | Features |
| --- | --- | --- |
| AirlinePassengers | 144 | 1 |
| Bikes | 182,470 | 8 |
| ChickWeights | 578 | 3 |
| MovieLens100K | 100,000 | 10 |
| Restaurants | 252,108 | 7 |
| Taxis | 1,458,644 | 8 |
| TrumpApproval | 1,001 | 6 |
| WaterFlow | 1,268 | 1 |
Binary classification
| Name | Samples | Features | Sparse |
| --- | --- | --- | --- |
| Bananas | 5,300 | 2 | |
| CreditCard | 284,807 | 30 | |
| Elec2 | 45,312 | 8 | |
| Higgs | 11,000,000 | 28 | |
| HTTP | 567,498 | 3 | |
| MaliciousURL | 2,396,130 | 3,231,961 | ✔️ |
| Phishing | 1,250 | 9 | |
| SMSSpam | 5,574 | 1 | |
| SMTP | 95,156 | 3 | |
| TREC07 | 75,419 | 5 | |
Multi-class classification
| Name | Samples | Features | Classes |
| --- | --- | --- | --- |
| ImageSegments | 2,310 | 18 | 7 |
| Insects | 52,848 | 33 | 6 |
| Keystroke | 20,400 | 31 | 51 |
Multi-output binary classification
| Name | Samples | Features | Outputs |
| --- | --- | --- | --- |
| Music | 593 | 72 | 6 |
Multi-output regression
| Name | Samples | Features | Outputs |
| --- | --- | --- | --- |
| SolarFlare | 1,066 | 10 | 3 |
base¶
synth¶
Synthetic datasets.
Each synthetic dataset is a stream generator. The benefit of using a generator is that the data need not be stored; each sample is generated on the fly. Except for a few of them, most of these generators can produce an infinite stream of data.
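A sketch of drawing a few samples from a generator; the choice of SEA and its parameters is illustrative:

```python
from river import synth

dataset = synth.SEA(variant=0, seed=42)

# take() draws a finite number of samples from the infinite stream
for x, y in dataset.take(3):
    print(x, y)
```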
Binary classification
| Name | Features |
| --- | --- |
| Agrawal | 9 |
| AnomalySine | 2 |
| ConceptDriftStream | 9 |
| Hyperplane | 10 |
| Mixed | 4 |
| SEA | 3 |
| Sine | 2 |
| STAGGER | 3 |
Regression
| Name | Features |
| --- | --- |
| Friedman | 10 |
| FriedmanDrift | 10 |
| Mv | 10 |
| Planes2D | 10 |
Multi-class classification
| Name | Features | Classes |
| --- | --- | --- |
| LED | 7 | 10 |
| LEDDrift | 7 | 10 |
| RandomRBF | 10 | 2 |
| RandomRBFDrift | 10 | 2 |
| RandomTree | 10 | 2 |
| Waveform | 21 | 3 |
Multi-output binary classification
| Name | Features | Outputs |
| --- | --- | --- |
| Logical | 2 | 3 |
drift¶
Concept Drift Detection.
This module contains concept drift detection methods. The purpose of a drift detector is to raise an alarm if the data distribution changes. A good drift detector maximizes the number of true positives while keeping the number of false positives to a minimum.
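A minimal sketch using the ADWIN detector on a univariate stream with an abrupt shift; the drift_detected attribute is assumed from recent River versions:

```python
from river import drift

detector = drift.ADWIN()
data = [0.0] * 1000 + [1.0] * 1000  # abrupt change halfway through

for i, val in enumerate(data):
    detector.update(val)
    if detector.drift_detected:
        print(f"Change detected at index {i}")
```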
binary¶
Drift detection for binary data.
dummy¶
Dummy estimators.
This module is here for testing purposes, as well as providing baseline performances.
ensemble¶
Ensemble learning.
Broadly speaking, there are two kinds of ensemble approaches. There are those that copy a single model several times and aggregate the predictions of said copies. This includes bagging as well as boosting. Then there are those that are composed of an arbitrary list of models, and can therefore aggregate predictions from different kinds of models.
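A sketch of the first kind, bagging copies of a single logistic regression; the hyperparameter values are illustrative:

```python
from river import ensemble, linear_model

model = ensemble.BaggingClassifier(
    model=linear_model.LogisticRegression(),
    n_models=5,
    seed=42,
)
```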
- ADWINBaggingClassifier
- ADWINBoostingClassifier
- AdaBoostClassifier
- BOLEClassifier
- BaggingClassifier
- BaggingRegressor
- EWARegressor
- LeveragingBaggingClassifier
- SRPClassifier
- SRPRegressor
- StackingClassifier
- VotingClassifier
evaluate¶
Model evaluation.
This module provides utilities to evaluate an online model. The goal is to reproduce a real-world scenario with high fidelity. The core function of this module is progressive_val_score, which allows evaluating a model via progressive validation.
This module also exposes "tracks". A track is a predefined combination of a dataset and one or more metrics. This allows comparing models with each other in a principled manner. For instance, the RegressionTrack contains several datasets and metrics to evaluate regression models. There is also a bare Track class to implement a custom track. The benchmarks directory at the root of the River repository uses these tracks.
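A minimal sketch of progressive validation on one of the bundled datasets:

```python
from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

# Each sample is first used to test the model, then to train it
evaluate.progressive_val_score(
    dataset=datasets.Phishing(),
    model=model,
    metric=metrics.ROCAUC(),
)
```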
Classes
Functions
facto¶
Factorization machines.
- FFMClassifier
- FFMRegressor
- FMClassifier
- FMRegressor
- FwFMClassifier
- FwFMRegressor
- HOFMClassifier
- HOFMRegressor
feature_extraction¶
Feature extraction.
This module can be used to extract information from raw features. This includes encoding categorical data as well as looking at interactions between existing features. This differs from the preprocessing module, in that the latter's purpose is rather to clean the data so that it may be processed by a particular machine learning algorithm.
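As a sketch, feature_extraction.Agg can maintain a running aggregate of one feature grouped by another; the field names here are illustrative:

```python
from river import feature_extraction, stats

# Running mean of 'amount', grouped by 'user'
agg = feature_extraction.Agg(on='amount', by='user', how=stats.Mean())

x = {'user': 'Alice', 'amount': 10.0}
agg.learn_one(x)
print(agg.transform_one(x))  # e.g. {'amount_mean_by_user': 10.0}
```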
feature_selection¶
Feature selection.
forest¶
This module implements forest-based classifiers and regressors.
imblearn¶
Sampling methods.
- ChebyshevOverSampler
- ChebyshevUnderSampler
- HardSamplingClassifier
- HardSamplingRegressor
- RandomOverSampler
- RandomSampler
- RandomUnderSampler
linear_model¶
Linear models.
- ALMAClassifier
- BayesianLinearRegression
- LinearRegression
- LogisticRegression
- PAClassifier
- PARegressor
- Perceptron
- SoftmaxRegression
base¶
metrics¶
Evaluation metrics.
All the metrics are updated one sample at a time. This way we can track performance of predictive methods over time.
Note that all metrics have a revert method, enabling them to be wrapped in utils.Rolling. This allows computing rolling metrics:

```python
from river import metrics, utils

y_true = [True, False, True, True]
y_pred = [False, False, True, True]

metric = utils.Rolling(metrics.Accuracy(), window_size=3)

for yt, yp in zip(y_true, y_pred):
    print(metric.update(yt, yp))
```

```
Accuracy: 0.00%
Accuracy: 50.00%
Accuracy: 66.67%
Accuracy: 100.00%
```
- Accuracy
- AdjustedMutualInfo
- AdjustedRand
- BalancedAccuracy
- ClassificationReport
- CohenKappa
- Completeness
- ConfusionMatrix
- CrossEntropy
- F1
- FBeta
- FowlkesMallows
- GeometricMean
- Homogeneity
- Jaccard
- LogLoss
- MAE
- MAPE
- MCC
- MSE
- MacroF1
- MacroFBeta
- MacroJaccard
- MacroPrecision
- MacroRecall
- MicroF1
- MicroFBeta
- MicroJaccard
- MicroPrecision
- MicroRecall
- MultiFBeta
- MutualInfo
- NormalizedMutualInfo
- Precision
- R2
- RMSE
- RMSLE
- ROCAUC
- Rand
- Recall
- RollingROCAUC
- SMAPE
- Silhouette
- VBeta
- WeightedF1
- WeightedFBeta
- WeightedJaccard
- WeightedPrecision
- WeightedRecall
base¶
multioutput¶
Metrics for multi-output learning.
base¶
misc¶
Miscellaneous.
This module essentially regroups some implementations that have nowhere else to go.
model_selection¶
Model selection.
This module regroups a variety of methods that may be used for performing model selection. A model selector is provided with a list of models. These are called "experts" in the expert learning literature. The model selector's goal is to perform at least as well as the best model. Indeed, the best model is not known in advance; the performance of each model only becomes apparent as time goes by. Different strategies are possible, each one offering a different tradeoff in terms of accuracy and computational performance.
Model selection can be used for tuning the hyperparameters of a model. This may be done by creating a copy of the model for each set of hyperparameters, and treating each copy as a separate model. The utils.expand_param_grid function can be used for this purpose.
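A sketch of combining expand_param_grid with a selector; the grid values are illustrative:

```python
from river import linear_model, metrics, model_selection, optim, utils

# One copy of the model per learning rate
models = utils.expand_param_grid(
    linear_model.LinearRegression(),
    {'optimizer': [(optim.SGD, {'lr': [0.1, 0.01, 0.001]})]},
)

selector = model_selection.GreedyRegressor(models, metric=metrics.MAE())
```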
- BanditClassifier
- BanditRegressor
- GreedyRegressor
- SuccessiveHalvingClassifier
- SuccessiveHalvingRegressor
base¶
multiclass¶
Multi-class classification.
multioutput¶
Multi-output models.
- ClassifierChain
- MonteCarloClassifierChain
- MultiClassEncoder
- ProbabilisticClassifierChain
- RegressorChain
naive_bayes¶
Naive Bayes algorithms.
neighbors¶
Neighbors-based learning.
These are also known as lazy methods: generalisation of the training data is delayed until a query is received.
neural_net¶
Neural networks.
activations¶
optim¶
Stochastic optimization.
- AMSGrad
- AdaBound
- AdaDelta
- AdaGrad
- AdaMax
- Adam
- Averager
- FTRLProximal
- Momentum
- Nadam
- NesterovMomentum
- RMSProp
- SGD
base¶
initializers¶
Weight initializers.
losses¶
Loss functions.
Each loss function is intended to work with both single values and numpy vectors.
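For instance, a loss can be evaluated and differentiated pointwise; a quick sketch with the squared loss:

```python
from river import optim

loss = optim.losses.Squared()
value = loss(y_true=1.0, y_pred=0.8)           # loss for a single prediction
slope = loss.gradient(y_true=1.0, y_pred=0.8)  # gradient w.r.t. the prediction
```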
- Absolute
- BinaryFocalLoss
- BinaryLoss
- Cauchy
- CrossEntropy
- EpsilonInsensitiveHinge
- Hinge
- Huber
- Log
- MultiClassLoss
- Poisson
- Quantile
- RegressionLoss
- Squared
schedulers¶
Learning rate schedulers.
preprocessing¶
Feature preprocessing.
The purpose of this module is to modify an existing set of features so that they can be processed by a machine learning algorithm. This may be done by scaling numeric parts of the data or by one-hot encoding categorical features. The difference with the feature_extraction module is that the latter extracts new information from the data.
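A sketch of online scaling; the feature names are illustrative:

```python
from river import preprocessing

scaler = preprocessing.StandardScaler()

x = {'temperature': 21.0, 'humidity': 0.45}
scaler.learn_one(x)             # update the running mean and variance
print(scaler.transform_one(x))  # scale using the statistics seen so far
```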
- AdaptiveStandardScaler
- Binarizer
- FeatureHasher
- GaussianRandomProjector
- LDA
- MaxAbsScaler
- MinMaxScaler
- Normalizer
- OneHotEncoder
- PredClipper
- PreviousImputer
- RobustScaler
- SparseRandomProjector
- StandardScaler
- StatImputer
- TargetMinMaxScaler
- TargetStandardScaler
proba¶
Probability distributions.
base¶
reco¶
Recommender systems module.
Recommender systems (recsys for short) are a large topic. This module is far from comprehensive. It simply provides models which can contribute towards building a recommender system.
A typical recommender system is made up of a retrieval phase, followed by a ranking phase. The output of the retrieval phase is a shortlist of the catalogue of items. The items in the shortlist are then usually ranked according to the expected preference the user will have for each item. This module focuses on the ranking phase.
Models which inherit from the Ranker class have a rank method. This allows sorting a set of items for a given user. Each model also has a learn_one(user, item, y, context) method, which allows learning user preferences. The y parameter is a reward value, the nature of which is specific to each recommendation task. Typically the reward is a number or a boolean value. It is up to the user to determine how to translate a user session into training data.
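A minimal sketch with BiasedMF; the users, items, and hyperparameter values are illustrative:

```python
from river import reco

model = reco.BiasedMF(n_factors=10, seed=42)

# y is the reward, here a rating
model.learn_one(user='Alice', item='Interstellar', y=5)
print(model.predict_one(user='Alice', item='Interstellar'))
```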
base¶
rules¶
Decision rules-based algorithms.
sketch¶
Data containers and collections for sequential data.
This module has summary and sketch structures that operate with constrained amounts of memory and processing time.
stats¶
Running statistics.
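Each statistic is updated one value at a time; a quick sketch with the running mean:

```python
from river import stats

mean = stats.Mean()
for x in [1, 2, 3, 4, 5]:
    mean.update(x)
print(mean.get())  # 3.0
```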
- AbsMax
- AutoCorr
- BayesianMean
- Count
- Cov
- EWMean
- EWVar
- Entropy
- IQR
- Kurtosis
- Link
- MAD
- Max
- Mean
- Min
- Mode
- NUnique
- PeakToPeak
- PearsonCorr
- Quantile
- RollingAbsMax
- RollingIQR
- RollingMax
- RollingMin
- RollingMode
- RollingPeakToPeak
- RollingQuantile
- SEM
- Shift
- Skew
- Sum
- Var
base¶
stream¶
Streaming utilities.
The module includes tools to iterate over data streams.
Classes
Functions
- iter_arff
- iter_array
- iter_csv
- iter_libsvm
- iter_pandas
- iter_sklearn_dataset
- iter_sql
- iter_vaex
- shuffle
- simulate_qa
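As a sketch, stream.iter_csv yields (features, target) pairs from a CSV file; the file name and target column below are hypothetical:

```python
from river import stream

# 'data.csv' and its 'label' column are hypothetical
for x, y in stream.iter_csv('data.csv', target='label'):
    print(x, y)
    break
```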
time_series¶
Time series forecasting.
Classes
Functions
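A minimal sketch with SNARIMAX; the order parameters and data are illustrative:

```python
from river import time_series

model = time_series.SNARIMAX(p=2, d=0, q=1)

for y in [1.0, 2.0, 3.0, 4.0, 5.0]:
    model.learn_one(y)

print(model.forecast(horizon=3))  # predictions for the next 3 steps
```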