# 0.24.0 - 2026-04-14

## sketch

- Added the `sketch.NUnique` class, which was previously in the `stats` module. This sketch estimates the number of unique elements in a stream.
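As an illustration of what such a sketch does, here is a minimal HyperLogLog-style unique counter. This is a toy sketch of the general technique, not River's implementation:

```python
import hashlib
import math

class TinyNUnique:
    """Toy HyperLogLog-style sketch: estimates distinct elements in O(2**p) memory."""

    def __init__(self, p=8):
        self.p = p
        self.m = 1 << p          # number of registers
        self.registers = [0] * self.m

    def update(self, x):
        h = int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def get(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:             # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw

sketch = TinyNUnique()
for i in range(1000):
    sketch.update(i)
    sketch.update(i)  # duplicates do not inflate the estimate
```

With 256 registers, the estimate for 1,000 distinct items typically lands within a few percent of the true count, while memory stays fixed regardless of stream length.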
## build
- Added Python 3.14 wheel builds and updated PyO3 for 3.14 support.
- Replaced poetry with uv for dependency management.
- Vendored the `watermill` Rust crate into `rust_src/` so all Rust statistics code lives directly in the `river` crate. Removed the external `watermill` dependency. The crate will be published as `river` in the future.
- Made PyO3 an optional dependency (`pyo3-bindings` feature) so the Rust library can be benchmarked and tested independently of Python.
## datasets

- Fixed the download in the `Insects` dataset. The `incremental_abrupt_imbalanced`, `incremental_imbalanced`, `incremental_reoccurring_imbalanced` and `out-of-control` variants are no longer supported.
- Refactored `benchmarks` and added a `plotly` dependency for interactive plots.
- Added the BETH dataset for labeled system process events.
- Fixed the `SMTP` dataset docstring: corrected the number of positive labels from 2,211 to 30 and updated the reference link.
## cluster

- Fixed `DBSTREAM` including noisy micro-clusters (weight below `minimum_weight`) in output clusters. They are now excluded during reclustering, matching the original paper.
## forest

- Added a `max_nodes` parameter to `AMFClassifier`, `AMFRegressor`, and the underlying Mondrian tree classes. This caps the number of nodes per tree, limiting memory usage for long-running streams. Addresses #1454.
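The effect of a node cap can be sketched with a toy incremental tree that refuses to split once its node budget is exhausted. This is hypothetical illustration code, not the Mondrian implementation:

```python
class CappedTree:
    """Toy incremental tree that stops splitting once max_nodes is reached."""

    def __init__(self, max_nodes=15):
        self.max_nodes = max_nodes
        self.n_nodes = 1  # the root

    def maybe_split(self):
        # A split turns a leaf into a branch with two children: +2 nodes.
        if self.n_nodes + 2 > self.max_nodes:
            return False  # budget exhausted: memory stays bounded
        self.n_nodes += 2
        return True

tree = CappedTree(max_nodes=15)
splits = sum(tree.maybe_split() for _ in range(1_000_000))
```

However long the stream runs, the tree never exceeds its budget: 15 nodes allow at most 7 splits.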
## drift

- Optimized the `ADWIN` Cython internals (~18x speedup): replaced numpy arrays with C `malloc`/`memmove` arrays in `Bucket`, replaced the Python `deque` with a typed `list`, used bit shifts instead of `pow`, inlined `variance_in_window`, and added Cython compiler directives.
## dummy

- The `dummy` module is now fully type-annotated.
## stats

- Optimized the Rust `EWMean`/`EWVariance`: precomputed `1 - alpha` and replaced `powf(2)` with multiplication (~25% faster).
- Optimized the Rust `Quantile` (P² algorithm): removed an unnecessary sort on every update and switched to stack-allocated arrays (~40% faster).
- Optimized the Rust `CentralMoments`: replaced `powf(2)` with multiplication in `update_m4`.
- Added `#[inline(always)]` to all hot Rust stat methods for cross-crate inlining.
- Added Criterion benchmarks for all Rust statistics (`benches/stats_bench.rs`).
- Added an `update_many` method to `stats.PearsonCorr`.
- Moved `stats.NUnique` to the `sketch` module, as it is more of a sketch than a statistical indicator.
- Changed the calculation of the Kuiper statistic in `base.KolmogorovSmirnov` to correspond to the reference implementation. The Kuiper statistic uses the difference between the maximum value and the minimum value.
- Fixed `RollingQuantile` not storing `q` as an instance attribute, which caused `clone()` to fail.
- Optimized `Var.update`/`revert` and `Cov.update`/`revert` by replacing `Mean.get()` method calls with direct `_mean` attribute access and inlining property lookups (~19% speedup each).
- Optimized the `KolmogorovSmirnov` treap internals: replaced the class-based `Treap` with `__slots__` nodes and module-level functions, inlined lazy propagation, and eliminated builtin `max`/`min` overhead. This yields a 2.65x speedup on update/revert operations.
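For reference, the Kuiper statistic over two finite samples can be computed directly as the maximum minus the minimum of the empirical CDF difference. A self-contained batch sketch of that definition (not the incremental treap-based implementation):

```python
import bisect

def kuiper_statistic(a, b):
    """Kuiper's V over two samples: the maximum minus the minimum of the
    signed ECDF difference F_a(x) - F_b(x). The initial 0.0 accounts for
    the region before any observation, where both ECDFs are zero."""
    a, b = sorted(a), sorted(b)
    diffs = [0.0]
    for v in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, v) / len(a)
        fb = bisect.bisect_right(b, v) / len(b)
        diffs.append(fa - fb)
    return max(diffs) - min(diffs)
```

Unlike the Kolmogorov-Smirnov statistic, which takes only the largest absolute gap, Kuiper's V sums the largest gaps in both directions, making it sensitive to shifts in either tail.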
## compat

- Adapted the sklearn compatibility layer to sklearn 1.8: replaced `_more_tags` with `__sklearn_tags__`, switched from `check_X_y`/`check_array` to `validate_data`, fixed the mixin inheritance order, and updated binary classifier validation.
## metrics

- Fixed `AdjustedMutualInfo` to return 0.0 when only one class or one cluster exists, and to handle the 0/0 edge case for perfect matches with small samples, aligning with sklearn 1.8 behavior.
- Fixed a `KeyError` in the `Silhouette` metric when used with clusterers that haven't initialized their centers yet (e.g., `CluStream` during its warmup phase).
- Optimized `ConfusionMatrix` by inlining `_update` into `update`/`revert` (~10% speedup) and caching `total_true_positives` as an incrementally maintained counter (99% speedup on access).
- Cached `requires_labels` in `BinaryMetric.__init__` to avoid a property lookup on every `update`/`revert` call.
- Added `metrics.RollingPRAUC`, which computes the area under the precision-recall curve over a rolling window of predictions and true labels.
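The idea behind a rolling PR-AUC can be sketched with a fixed-size deque and a batch recomputation (step interpolation). This is illustrative only; the method names mirror River's `update`/`get` convention but the real metric is presumably maintained more efficiently:

```python
from collections import deque

class TinyRollingPRAUC:
    """Keeps the last `window_size` (label, score) pairs and recomputes
    the precision-recall AUC (step interpolation) on demand."""

    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)

    def update(self, y_true, score):
        self.window.append((bool(y_true), float(score)))

    def get(self):
        pairs = sorted(self.window, key=lambda p: p[1], reverse=True)
        n_pos = sum(y for y, _ in pairs)
        if n_pos == 0:
            return 0.0
        tp = fp = 0
        auc, prev_recall = 0.0, 0.0
        for y, _ in pairs:  # sweep thresholds from high to low score
            tp += y
            fp += not y
            recall = tp / n_pos
            precision = tp / (tp + fp)
            auc += (recall - prev_recall) * precision
            prev_recall = recall
        return auc

metric = TinyRollingPRAUC(window_size=4)
for y, s in [(1, 0.9), (1, 0.8), (0, 0.2), (0, 0.1)]:
    metric.update(y, s)
```

A perfectly ranked window (all positives scored above all negatives) yields an AUC of 1.0; older pairs fall out of the deque automatically as new ones arrive.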
## evaluate

- Optimized `progressive_val_score` and `iter_progressive_val_score` with a fast path for the common no-delay case. The evaluation loop now iterates the dataset directly, skipping the `simulate_qa` generator and the internal prediction buffer. Combined with caching `model._supervised` and `metric.update`, this yields a 1.5x speedup on typical workloads.
- Added per-sample weight support in `progressive_val_score` and `iter_progressive_val_score`. Weights can be passed via dataset tuples as `(x, y, {"w": 2.0})` and are forwarded to `learn_one` for models that accept a `w` parameter.
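The weight-forwarding convention can be illustrated with a stripped-down test-then-train loop and a dummy model. The names here are hypothetical; this is not the actual `evaluate` internals:

```python
def tiny_progressive_val(dataset, model, metric_update):
    """Test-then-train loop that unpacks optional per-sample kwargs."""
    for sample in dataset:
        x, y, *rest = sample              # (x, y) or (x, y, {"w": ...})
        kwargs = rest[0] if rest else {}
        y_pred = model.predict_one(x)     # test first...
        metric_update(y, y_pred)
        model.learn_one(x, y, **kwargs)   # ...then train, forwarding w

class WeightLogger:
    """Dummy model that records the sample weights it receives."""
    def __init__(self):
        self.weights = []
    def predict_one(self, x):
        return 0
    def learn_one(self, x, y, w=1.0):
        self.weights.append(w)

model = WeightLogger()
seen = []
dataset = [({"a": 1}, 1), ({"a": 2}, 0, {"w": 2.0})]
tiny_progressive_val(dataset, model, lambda y, p: seen.append((y, p)))
```

Samples without an extra dict fall back to the model's default weight, so weighted and unweighted tuples can be mixed in the same stream.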
## stream

- `stream.iter_arff` now supports blank values (treated as missing values).
## preprocessing

- Added support for expected categories in `preprocessing.OneHotEncoder` and `preprocessing.OrdinalEncoder`, akin to the scikit-learn API for the respective encoders.
- Added a fast path in `simulate_qa` for the no-delay, no-moment case, skipping the memento queue machinery.
- Fixed a bug that caused `preprocessing.OrdinalEncoder` to not be picklable.
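Pre-declared categories make the one-hot output schema stable from the first sample onward, instead of columns appearing as categories are observed. A minimal sketch of that behavior (illustrative, not the actual encoder):

```python
class TinyOneHot:
    """One-hot encodes features, emitting a column for every expected
    category even before it has been observed in the stream."""

    def __init__(self, categories):
        self.categories = categories  # e.g. {"color": ["red", "green"]}

    def transform_one(self, x):
        out = {}
        for feat, cats in self.categories.items():
            for cat in cats:
                out[f"{feat}_{cat}"] = int(x.get(feat) == cat)
        return out

enc = TinyOneHot({"color": ["red", "green", "blue"]})
encoded = enc.transform_one({"color": "green"})
```

A fixed schema matters for downstream models whose weights are keyed by feature name: every sample produces the same set of keys.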
## base

- Added an `EstimatorMeta` metaclass so that `isinstance` works transparently with pipelines. For example, `isinstance(scaler | log_reg, base.Classifier)` now returns `True`. This removes the need for the `utils.inspect` helper functions (`isclassifier`, `isregressor`, etc.), which have been removed.
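The mechanism can be sketched with a metaclass whose `__instancecheck__` looks through to a pipeline's final step. This is a simplified illustration of the idea, not River's `EstimatorMeta`:

```python
class EstimatorMeta(type):
    def __instancecheck__(cls, obj):
        # A pipeline counts as an instance of whatever its last step is.
        if isinstance(obj, Pipeline):
            return isinstance(obj.steps[-1], cls)
        return super().__instancecheck__(obj)

class Classifier(metaclass=EstimatorMeta):
    pass

class Pipeline:
    def __init__(self, *steps):
        self.steps = list(steps)

class StandardScaler:
    def __or__(self, other):
        return Pipeline(self, other)

class LogisticRegression(Classifier):
    pass

pipe = StandardScaler() | LogisticRegression()
```

`isinstance(pipe, Classifier)` now holds because the pipeline ends in a classifier, with no helper function needed at the call site.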
## proba

- Optimized `Gaussian.__call__` by inlining property accesses, and added a `Gaussian.log_pdf` method that computes the log-density directly without `exp`/`sqrt`. This speeds up naive Bayes prediction in all Hoeffding Tree classifiers and their ensembles by 12–22%.
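Working in log-space replaces an `exp` and a `sqrt` with adds and multiplies. A sketch of the underlying identity, assuming a running mean `mu` and variance `var`:

```python
import math

def gaussian_log_pdf(x, mu, var):
    """log N(x; mu, var) computed without exp or sqrt:
    -((x - mu)^2 / var + log(2*pi*var)) / 2."""
    return -0.5 * ((x - mu) ** 2 / var + math.log(2 * math.pi * var))

def gaussian_pdf(x, mu, var):
    """The ordinary density, for comparison."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
```

Since naive Bayes only needs to compare class scores, summing log-densities is equivalent to multiplying densities, and it also avoids underflow when many features are combined.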
## tree

- Fixed Mondrian tree branch nodes losing bounding box ranges during splits. When a branch was split, the new child branch did not inherit `memory_range_min`/`memory_range_max`, causing incorrect range extension calculations. This affected both `MondrianTreeClassifier` and `MondrianTreeRegressor`, as well as their forest variants (`AMFClassifier`, `AMFRegressor`). Fixes #1801.
- Changed the default `step` parameter for `MondrianTreeClassifier` and `MondrianTreeRegressor` from `0.1` to `1.0`, matching the onelearn reference implementation and the existing default in `AMFClassifier`/`AMFRegressor`.
- Added `cond_log_proba` to `GaussianSplitter` and optimized `do_naive_bayes_prediction` to use direct log-probabilities, avoiding the `exp`/`log` round-trip.
- Added handling for division by zero in `tree.hoeffding_tree` for leaf size estimation.
- Optimized Mondrian trees and AMF (Aggregated Mondrian Forest): replaced dict-based node storage with list-indexed storage, added feature/class-to-index mappings, inlined all hot-path methods, and moved the core traversal loops (`_go_downwards`, `update_downwards`, the `predict_proba` upward walk) to Cython. `AMFClassifier` is 3.2x faster, `AMFRegressor` is 2.9x faster.
- Fixed a shared-state bug in `MondrianTreeRegressor` where `replant()` copied the `Mean` object by reference, causing branch and child nodes to share the same running mean. Each node now maintains independent mean statistics, matching the onelearn reference implementation.
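The class of bug fixed in `replant()`, copying a mutable statistic by reference, can be reproduced in miniature (illustrative toy code, not the actual Mondrian internals):

```python
import copy

class Mean:
    """Tiny running mean."""
    def __init__(self):
        self.n, self.value = 0, 0.0
    def update(self, x):
        self.n += 1
        self.value += (x - self.value) / self.n

parent_mean = Mean()
parent_mean.update(10.0)

shared = parent_mean                      # buggy: child shares the parent's object
independent = copy.deepcopy(parent_mean)  # fix: each node owns its own stats

shared.update(0.0)  # updating the "child" silently changes the parent too
```

After the shared update, the parent's running mean has drifted to 5.0, while the deep-copied node still reports 10.0, which is why each node must own an independent statistic.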
## utils

- Optimized `VectorDict` binary operations (add, sub, mul, div, minimum, maximum) with fast paths that bypass generator-based key iteration when no mask or factory is set. Added fused `isub_scaled`/`iadd_scaled` methods to avoid intermediate allocations. This speeds up online linear models by 8–34% depending on the optimizer and feature count.
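A fused scaled subtraction avoids materializing the intermediate `factor * other` dict that `self -= factor * other` would build. A sketch of the idea on plain dicts; the method name matches the changelog but the class is a hypothetical stand-in for `VectorDict`:

```python
class TinyVectorDict(dict):
    def isub_scaled(self, other, factor):
        """In-place self[k] -= factor * other[k], without building
        an intermediate scaled copy of `other`."""
        for k, v in other.items():
            self[k] = self.get(k, 0.0) - factor * v
        return self

weights = TinyVectorDict({"x1": 1.0, "x2": -0.5})
gradient = {"x1": 2.0, "x3": 1.0}
weights.isub_scaled(gradient, factor=0.5)  # one fused SGD-style step
```

Keys absent from the gradient are left untouched, and keys absent from the weights default to 0.0, which is the behavior an SGD update over sparse features needs.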
## compose

- Fixed `Pipeline.learn_one` to forward extra `**params` (e.g. the sample weight `w`) to the final supervised step.
## linear_model

- Optimized the `GLM` base class: replaced `contextlib.contextmanager` with a direct try/finally in `learn_one`/`learn_many`, and built gradient dicts in a single pass. Combined with the `VectorDict` improvements, `LinearRegression` with Adam is up to 1.34x faster.
## neighbors

- Added a function to the nearest-neighbor engines to gather the relevant classes/targets from the window.
- Added a virtual function to the base engine class; new NN engines need to override the `refresh_targets` function.
- The KNN classifier now calls this engine-specific function inside `clean_up_classes()`.
## evaluate

- Moved the forecasting evaluation utilities from `time_series.evaluate` to `evaluate` (`evaluate.evaluate` and `evaluate.iter_evaluate`) and deprecated `time_series.evaluate`/`time_series.iter_evaluate`.
- Added `evaluate.ForecastingTrack` to benchmark and compare time series forecasting models.
## utils

- The `utils` module is now fully type-checked.
- `utils.VectorDict` and `utils.SortedWindow` are now parametrised generic containers.
- `utils.VectorDict` now implements the reflected operations of addition, subtraction and multiplication.
- Optimized KNN distance computation with Cython-accelerated Euclidean distances (`euclidean_distance_dict` and `euclidean_distance_tuple` in `VectorDict`), specialized fast paths for p=1 and p=2 in `minkowski_distance`, a fully Cython-accelerated search loop in `LazySearch` (`lazy_search_euclidean`), a `heapq.nsmallest` fallback for custom distances, and reduced Python overhead in SWINN's `_refine`/`_search` via local variable caching and inlined neighbor checks. Overall ~5x speedup for `LazySearch` and ~1.3x for SWINN.
- Optimized `Rolling` and `TimeRolling` by replacing the `__getattribute__` proxy with `__getattr__`, caching `window_size`, and reducing attribute lookups in the hot path (~3x speedup on per-update latency).