0.24.0 - 2026-04-14

sketch

  • Added the sketch.NUnique class, which estimates the number of unique elements in a stream. It was previously in the stats module.
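
The idea behind a distinct-count sketch can be illustrated with the classic k-minimum-values (KMV) estimator. This is a hedged sketch only, not River's implementation; the class name and update/get interface are assumptions:

```python
import bisect
import hashlib

class KMVSketch:
    """Distinct-count sketch keeping the k smallest hash values (illustrative)."""

    def __init__(self, k=64):
        self.k = k
        self.mins = []  # sorted list of the k smallest hashes, mapped into [0, 1)

    def _hash(self, x):
        digest = hashlib.md5(str(x).encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, x):
        v = self._hash(x)
        if v in self.mins:  # duplicate element, nothing to do
            return
        bisect.insort(self.mins, v)
        if len(self.mins) > self.k:
            self.mins.pop()  # drop the largest retained hash

    def get(self):
        if len(self.mins) < self.k:
            return len(self.mins)  # exact count while under capacity
        # the k-th smallest of n uniform hashes sits near k / n
        return int((self.k - 1) / self.mins[-1])

sketch = KMVSketch(k=64)
for i in range(10_000):
    sketch.update(i % 1_000)  # only 1,000 distinct values
```

The estimate lands near the true 1,000 distinct values while storing only 64 hashes.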

build

  • Added Python 3.14 wheel builds and updated PyO3 for 3.14 support.
  • Replaced poetry with uv for dependency management.
  • Vendored the watermill Rust crate into rust_src/ so all Rust statistics code lives directly in the river crate. Removed the external watermill dependency. The crate will be published as river in the future.
  • Made PyO3 an optional dependency (pyo3-bindings feature) so the Rust library can be benchmarked and tested independently of Python.

datasets

  • Fixed the download for the Insects dataset. The incremental_abrupt_imbalanced, incremental_imbalanced, incremental_reoccurring_imbalanced, and out-of-control variants are no longer supported.
  • Refactored the benchmarks and added the plotly dependency for interactive plots.
  • Added the BETH dataset for labeled system process events.
  • Fixed SMTP dataset docstring: corrected the number of positive labels from 2,211 to 30 and updated the reference link.

cluster

  • Fixed DBSTREAM including noisy micro-clusters (weight below minimum_weight) in output clusters. They are now excluded during reclustering, matching the original paper.

forest

  • Added max_nodes parameter to AMFClassifier, AMFRegressor, and the underlying Mondrian tree classes. This caps the number of nodes per tree, limiting memory usage for long-running streams. Addresses #1454.

drift

  • Optimized ADWIN Cython internals (~18x speedup): replaced numpy arrays with C malloc/memmove arrays in Bucket, replaced Python deque with typed list, used bit shifts instead of pow, inlined variance_in_window, and added Cython compiler directives.
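
The pow-to-shift replacement can be sketched in a few lines: bucket row i in ADWIN's exponential histogram summarizes elements in powers of two, so its capacity can stay in fast integer arithmetic. The function names here are illustrative, not the Cython source:

```python
def bucket_capacity(row: int) -> int:
    # 1 << row == 2**row, but avoids the float round-trip of pow(2, row)
    return 1 << row

def window_capacity(n_rows: int, buckets_per_row: int) -> int:
    # total number of elements the histogram rows can summarize
    return sum(buckets_per_row * bucket_capacity(i) for i in range(n_rows))
```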

dummy

  • The dummy module is now fully type-annotated.

stats

  • Optimized Rust EWMean/EWVariance: precompute 1-alpha, replace powf(2) with multiplication (~25% faster).
  • Optimized Rust Quantile (P² algorithm): removed unnecessary sort on every update, switched to stack-allocated arrays (~40% faster).
  • Optimized Rust CentralMoments: replaced powf(2) with multiplication in update_m4.
  • Added #[inline(always)] to all hot Rust stat methods for cross-crate inlining.
  • Added Criterion benchmarks for all Rust statistics (benches/stats_bench.rs).
  • Added update_many method to stats.PearsonCorr.
  • Moved stats.NUnique to the sketch module, as it is more of a sketch than a statistical indicator.
  • Changed the calculation of the Kuiper statistic in base.KolmogorovSmirnov to correspond to the reference implementation. The Kuiper statistic uses the difference between the maximum value and the minimum value.
  • Fixed RollingQuantile not storing q as an instance attribute, which caused clone() to fail.
  • Optimized Var.update/revert and Cov.update/revert by replacing Mean.get() method calls with direct _mean attribute access and inlining property lookups (~19% speedup each).
  • Optimized KolmogorovSmirnov treap internals: replaced class-based Treap with __slots__ nodes and module-level functions, inlined lazy propagation, and eliminated builtin max/min overhead. This yields a 2.65x speedup on update/revert operations.
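
The EWMean/EWVariance micro-optimizations translate directly to plain Python: 1 - alpha is precomputed once in the constructor, and the square is written as a multiplication instead of a pow call. This is a hedged sketch of one common exponentially weighted recurrence, not the Rust source:

```python
class EWMeanVar:
    """Exponentially weighted mean and variance (illustrative sketch)."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self._one_minus_alpha = 1.0 - alpha  # precomputed once, reused every update
        self.mean = 0.0
        self.var = 0.0
        self._n = 0

    def update(self, x: float):
        self._n += 1
        if self._n == 1:
            self.mean = x
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # diff * incr == alpha * diff * diff: multiplication instead of powf(2)
        self.var = self._one_minus_alpha * (self.var + diff * incr)

ew = EWMeanVar(alpha=0.5)
for x in [1.0, 2.0, 3.0]:
    ew.update(x)
```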

compat

  • Adapted sklearn compatibility layer to sklearn 1.8: replaced _more_tags with __sklearn_tags__, switched from check_X_y/check_array to validate_data, fixed mixin inheritance order, and updated binary classifier validation.

metrics

  • Fixed AdjustedMutualInfo to return 0.0 when only one class or one cluster exists, and to handle the 0/0 edge case for perfect matches with small samples, aligning with sklearn 1.8 behavior.
  • Fixed KeyError in Silhouette metric when used with clusterers that haven't initialized their centers yet (e.g., CluStream during its warmup phase).
  • Optimized ConfusionMatrix by inlining _update into update/revert (~10% speedup) and caching total_true_positives as an incrementally maintained counter (99% speedup on access).
  • Cached requires_labels in BinaryMetric.__init__ to avoid property lookup on every update/revert call.
  • Added metrics.RollingPRAUC, which computes the area under the precision-recall curve over a rolling window of predictions and true labels.
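
The cached total_true_positives counter can be sketched as follows: instead of summing the matrix diagonal on every access, the counter is maintained in O(1) during update and revert. This is an illustrative stand-in, not River's ConfusionMatrix:

```python
from collections import defaultdict

class ConfusionMatrix:
    """Confusion matrix with an incrementally maintained TP counter (sketch)."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))
        self.total_true_positives = 0.0  # cached, updated in O(1)

    def update(self, y_true, y_pred, w=1.0):
        self.counts[y_true][y_pred] += w
        if y_true == y_pred:
            self.total_true_positives += w

    def revert(self, y_true, y_pred, w=1.0):
        self.counts[y_true][y_pred] -= w
        if y_true == y_pred:
            self.total_true_positives -= w

cm = ConfusionMatrix()
for yt, yp in [(0, 0), (0, 1), (1, 1), (1, 1)]:
    cm.update(yt, yp)
cm.revert(1, 1)  # the cached counter stays consistent with the diagonal
```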

evaluate

  • Optimized progressive_val_score and iter_progressive_val_score with a fast path for the common no-delay case. The evaluation loop now iterates the dataset directly, skipping the simulate_qa generator and internal prediction buffer. Combined with caching model._supervised and metric.update, this yields a 1.5x speedup on typical workloads.
  • Added per-sample weight support in progressive_val_score and iter_progressive_val_score. Weights can be passed via dataset tuples as (x, y, {"w": 2.0}) and are forwarded to learn_one for models that accept a w parameter.
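
The weighted evaluation loop can be sketched like this. Only the (x, y, {"w": ...}) tuple convention comes from the changelog; MeanModel, MAE, and the loop body are illustrative stand-ins for River's classes:

```python
class MeanModel:
    """Predicts the running (weighted) mean of the targets seen so far."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y, w=1.0):
        self.n += w
        self.mean += w * (y - self.mean) / self.n

class MAE:
    """Running mean absolute error."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def update(self, y_true, y_pred):
        self.n += 1
        self.total += abs(y_true - y_pred)

    def get(self):
        return self.total / self.n if self.n else 0.0

def progressive_val_score(dataset, model, metric):
    for sample in dataset:
        x, y, *rest = sample
        params = rest[0] if rest else {}   # e.g. {"w": 2.0}
        y_pred = model.predict_one(x)      # test-then-train: predict first
        metric.update(y, y_pred)
        model.learn_one(x, y, **params)    # weight forwarded to learn_one
    return metric

dataset = [({}, 1.0), ({}, 3.0, {"w": 2.0}), ({}, 2.0)]
metric = progressive_val_score(dataset, MeanModel(), MAE())
```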

base

  • Added the EstimatorMeta metaclass so that isinstance works transparently with pipelines. For example, isinstance(scaler | log_reg, base.Classifier) now returns True. This removes the need for the utils.inspect helper functions (isclassifier, isregressor, etc.), which have been removed.
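
A minimal sketch of how a metaclass can make isinstance see through a pipeline, by delegating the check to the last step. River's actual EstimatorMeta and Pipeline are more involved; the classes below are illustrative:

```python
class EstimatorMeta(type):
    def __instancecheck__(cls, obj):
        # a pipeline counts as an instance of whatever its last step is
        if isinstance(obj, Pipeline):
            return isinstance(obj.steps[-1], cls)
        return super().__instancecheck__(obj)

class Estimator(metaclass=EstimatorMeta):
    pass

class Classifier(Estimator):
    pass

class Transformer(Estimator):
    def __or__(self, other):
        return Pipeline(self, other)

class Pipeline:
    def __init__(self, *steps):
        self.steps = list(steps)

class StandardScaler(Transformer): ...
class LogisticRegression(Classifier): ...

pipe = StandardScaler() | LogisticRegression()
```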

proba

  • Optimized Gaussian.__call__ by inlining property accesses, and added Gaussian.log_pdf method that computes log-density directly without exp/sqrt. This speeds up Naive Bayes prediction in all Hoeffding Tree classifiers and their ensembles by 12–22%.
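
The log-density shortcut can be written in a few lines: log_pdf needs neither exp nor sqrt, and exponentiating it recovers the ordinary density. A hedged sketch of the idea, not River's Gaussian class:

```python
import math

LOG_2PI = math.log(2.0 * math.pi)

def pdf(x, mu, var):
    # ordinary Gaussian density: requires both exp and sqrt
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def log_pdf(x, mu, var):
    # log-density computed directly, with no exp/sqrt calls
    diff = x - mu
    return -0.5 * (LOG_2PI + math.log(var) + diff * diff / var)
```

Summing log-densities then replaces multiplying densities, which is where the Naive Bayes speedup comes from.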

tree

  • Fixed Mondrian Tree branch nodes losing bounding box ranges during splits. When a branch was split, the new child branch did not inherit memory_range_min/memory_range_max, causing incorrect range extension calculations. This affected both MondrianTreeClassifier and MondrianTreeRegressor, as well as their forest variants (AMFClassifier, AMFRegressor). Fixes #1801.
  • Changed the default step parameter for MondrianTreeClassifier and MondrianTreeRegressor from 0.1 to 1.0, matching the onelearn reference implementation and the existing default in AMFClassifier/AMFRegressor.
  • Added cond_log_proba to GaussianSplitter and optimized do_naive_bayes_prediction to use direct log-probabilities, avoiding the exp/log round-trip.
  • Added handling for division by zero in tree.hoeffding_tree for leaf size estimation.
  • Optimized Mondrian trees and AMF (Aggregated Mondrian Forest): replaced dict-based node storage with list-indexed storage, added feature/class-to-index mappings, inlined all hot-path methods, and moved the core traversal loops (_go_downwards, update_downwards, predict_proba upward walk) to Cython. AMFClassifier is 3.2x faster, AMFRegressor is 2.9x faster.
  • Fixed a shared-state bug in MondrianTreeRegressor where replant() copied the Mean object by reference, causing branch and child nodes to share the same running mean. Each node now maintains independent mean statistics, matching the onelearn reference implementation.
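
The shared-state pitfall behind the replant() fix can be reproduced in a few lines; the Mean and Node classes here are illustrative stand-ins:

```python
import copy

class Mean:
    """Running mean (illustrative)."""

    def __init__(self):
        self.n = 0
        self.value = 0.0

    def update(self, x):
        self.n += 1
        self.value += (x - self.value) / self.n

class Node:
    def __init__(self, mean):
        self.mean = mean

parent = Node(Mean())
parent.mean.update(10.0)

# Buggy: the child shares the parent's Mean instance by reference.
buggy_child = Node(parent.mean)
# Fixed: the child gets an independent copy of the statistic.
fixed_child = Node(copy.deepcopy(parent.mean))

buggy_child.mean.update(0.0)  # silently mutates parent.mean too
```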

utils

  • Optimized VectorDict binary operations (add, sub, mul, div, minimum, maximum) with fast paths that bypass generator-based key iteration when no mask or factory is set. Added fused isub_scaled/iadd_scaled methods to avoid intermediate allocations. This speeds up online linear models by 8–34% depending on the optimizer and feature count.
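
The fused update avoids building the intermediate scaled vector that `w = w - lr * g` would allocate. The method names follow the changelog, but the class below is a bare-bones stand-in for utils.VectorDict:

```python
class Vec(dict):
    """Sparse vector as a dict (illustrative stand-in for VectorDict)."""

    def isub_scaled(self, other, scale):
        # w[k] -= scale * other[k], in place, no intermediate vector
        for k, v in other.items():
            self[k] = self.get(k, 0.0) - scale * v
        return self

    def iadd_scaled(self, other, scale):
        for k, v in other.items():
            self[k] = self.get(k, 0.0) + scale * v
        return self

w = Vec(a=1.0, b=2.0)
g = Vec(a=0.5, c=4.0)
w.isub_scaled(g, 0.1)  # SGD-style step: w <- w - 0.1 * g
```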

compose

  • Fixed Pipeline.learn_one to forward extra **params (e.g. sample weight w) to the final supervised step.
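
The forwarding behavior can be sketched as follows; the Scaler, WeightedMean, and Pipeline classes here are minimal stand-ins, not River's:

```python
class Scaler:
    def transform_one(self, x):
        return {k: v / 10 for k, v in x.items()}

class WeightedMean:
    """Final supervised step that accepts a sample weight."""

    def __init__(self):
        self.n = 0.0
        self.mean = 0.0

    def learn_one(self, x, y, w=1.0):
        self.n += w
        self.mean += w * (y - self.mean) / self.n

class Pipeline:
    def __init__(self, *steps):
        *self.transformers, self.final = steps

    def learn_one(self, x, y, **params):
        for t in self.transformers:
            x = t.transform_one(x)
        # extra params (e.g. the sample weight w) reach the final step
        self.final.learn_one(x, y, **params)

pipe = Pipeline(Scaler(), WeightedMean())
pipe.learn_one({"f": 10}, 4.0, w=2.0)
pipe.learn_one({"f": 20}, 1.0)
```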

linear_model

  • Optimized GLM base class: replaced contextlib.contextmanager with direct try/finally in learn_one/learn_many, and build gradient dicts in a single pass. Combined with the VectorDict improvements, LinearRegression with Adam is up to 1.34x faster.

neighbors

  • Added a function to the nearest-neighbor engines that gathers the relevant classes/targets from the window.
  • Added a virtual refresh_targets function to the base engine class; new NN engines need to override it.
  • The KNN classifier now calls this engine-specific function in clean_up_classes().
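
The virtual-function pattern can be sketched as follows; the class names are illustrative stand-ins for the actual engine classes:

```python
class BaseNNEngine:
    def refresh_targets(self):
        """Gather the classes/targets currently present in the window."""
        raise NotImplementedError  # concrete engines must override this

class WindowEngine(BaseNNEngine):
    def __init__(self, window):
        self.window = window  # list of (x, y) pairs

    def refresh_targets(self):
        return {y for _, y in self.window}

engine = WindowEngine([({"f": 1}, "a"), ({"f": 2}, "b"), ({"f": 3}, "a")])
```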

utils

  • The utils module is now fully type-checked.
  • utils.VectorDict and utils.SortedWindow are now parametrised generic containers.
  • utils.VectorDict now implements the reflected operations of addition, subtraction and multiplication.
  • Optimized KNN distance computation with Cython-accelerated Euclidean distance (euclidean_distance_dict and euclidean_distance_tuple in VectorDict), specialized fast paths for p=1 and p=2 in minkowski_distance, a fully Cython-accelerated search loop in LazySearch (lazy_search_euclidean), heapq.nsmallest fallback for custom distances, and reduced Python overhead in SWINN's _refine/_search via local variable caching and inlined neighbor checks. Overall ~5x speedup for LazySearch and ~1.3x for SWINN.
  • Optimized Rolling and TimeRolling by replacing __getattribute__ proxy with __getattr__, caching window_size, and reducing attribute lookups in the hot path (~3x speedup on per-update latency).
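
The proxy change matters because __getattr__ is only invoked for attributes not found through normal lookup, whereas __getattribute__ intercepts every single access. A hedged sketch of the pattern, with an illustrative Sum statistic; River's Rolling is more general:

```python
from collections import deque

class Rolling:
    """Wraps a statistic and keeps it over a fixed-size window (sketch)."""

    def __init__(self, stat, window_size):
        self.stat = stat
        self.window_size = window_size  # cached locally, no proxy hop
        self.window = deque(maxlen=window_size)

    def update(self, x):
        if len(self.window) == self.window_size:
            self.stat.revert(self.window[0])  # drop the oldest value
        self.window.append(x)
        self.stat.update(x)

    def __getattr__(self, name):
        # only reached for names Rolling itself does not define
        return getattr(self.stat, name)

class Sum:
    def __init__(self):
        self.total = 0.0

    def update(self, x):
        self.total += x

    def revert(self, x):
        self.total -= x

    def get(self):
        return self.total

r = Rolling(Sum(), window_size=3)
for x in [1, 2, 3, 4, 5]:
    r.update(x)  # window ends up holding [3, 4, 5]
```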