0.25.0 - 2026-05-31
Breaking changes
- The native Rust extension moved from `river.stats._rust_stats` to `river._river_rust`, split into submodules `stats`, `drift`, `tree`, and `vectordict`. Pickles produced with prior versions no longer load directly. To convert existing pickles, use this migration script (the new river must be installed in the conversion env).
- `pandas` is no longer a hard dependency of River. The core online interface (`learn_one`/`predict_one`) works with `pip install river` alone. The mini-batch interface (`learn_many`, `predict_many`, `predict_proba_many`, `transform_many`) still requires `pandas`; install with `pip install "river[pandas]"`. Calling a `*_many` method without `pandas` raises an `ImportError` pointing to the extra.
- Renamed `drift.binary.HDDM_A` → `drift.binary.HDDMA` and `drift.binary.HDDM_W` → `drift.binary.HDDMW` to comply with PEP-8 CapWords class naming.
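
Code that must run against both older River releases and 0.25.0 can look up a renamed class by trying the new name first. The following is a minimal sketch: `resolve_renamed` is a hypothetical helper (not part of River), and the two stand-in namespaces only mimic `drift.binary` before and after the rename so that the snippet runs without River installed.

```python
from types import SimpleNamespace


def resolve_renamed(module, new_name, old_name):
    """Return module.<new_name> if it exists, else fall back to module.<old_name>.

    Hypothetical helper for illustration; River does not ship it.
    """
    try:
        return getattr(module, new_name)
    except AttributeError:
        return getattr(module, old_name)


# Stand-ins for river.drift.binary before and after the rename.
old_binary = SimpleNamespace(HDDM_A=type("HDDM_A", (), {}))
new_binary = SimpleNamespace(HDDMA=type("HDDMA", (), {}))

detector_cls_old = resolve_renamed(old_binary, "HDDMA", "HDDM_A")
detector_cls_new = resolve_renamed(new_binary, "HDDMA", "HDDM_A")
```

With River installed, the equivalent call would presumably be `resolve_renamed(drift.binary, "HDDMA", "HDDM_A")`; the same pattern applies to the other renames listed in these notes.
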
cluster
- Fixed `cluster.TextClust` corrupting its own parameters: `__init__` was overwriting `self.micro_distance`/`self.macro_distance` with runtime distance instances, breaking `clone` and `repr` round-trips. The runtime instances are now stored on `_micro_distance`/`_macro_distance`. Internal camelCase identifiers (`clusterId`, `microToMacro`, `numClusters`, `updateMacroClusters`, `_calculateIDF`) were renamed to snake_case, and the nested helper classes `tfcontainer`, `microcluster`, `distances` were renamed to `TfContainer`, `MicroCluster`, `Distances`.
imblearn
- Fixed `imblearn.HardSamplingClassifier`/`imblearn.HardSamplingRegressor` storing references to user-supplied feature dictionaries in their buffer; the buffered triplets now hold shallow copies so callers can safely mutate `x` after `learn_one`.
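
The store-by-reference bug described above (and in the similar `neighbors` and `tree` entries) can be reproduced with a toy buffer. This is a sketch under stated assumptions, not River's code: `ByReferenceBuffer` and `CopyingBuffer` are hypothetical classes that only model the buffering step.

```python
from collections import deque


class ByReferenceBuffer:
    """Stores the caller's dict object itself (the pre-fix behaviour)."""

    def __init__(self, size):
        self.items = deque(maxlen=size)

    def learn_one(self, x, y):
        self.items.append((x, y))


class CopyingBuffer(ByReferenceBuffer):
    """Stores a shallow copy, so later mutation of x does not leak in."""

    def learn_one(self, x, y):
        self.items.append((dict(x), y))


x = {"a": 1.0, "b": 2.0}
by_ref, by_copy = ByReferenceBuffer(10), CopyingBuffer(10)
by_ref.learn_one(x, True)
by_copy.learn_one(x, True)

x["a"] = 99.0  # the caller reuses and mutates the same dict

ref_value = by_ref.items[0][0]["a"]    # the buffered sample changed with x
copy_value = by_copy.items[0][0]["a"]  # the buffered sample is intact
```

A shallow copy is enough here as long as feature values are immutable (numbers, strings), which is the usual case for feature dicts.
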
naive_bayes
- Marked `predict_many`/`predict_proba_many` checks as skipped on `BaseNB` subclasses (`MultinomialNB`, `BernoulliNB`, `ComplementNB`) via `_unit_test_skips`. `joint_log_likelihood_many`'s output is misaligned with the input batch when the model is trained via `learn_one` rather than `learn_many`, so the new mini-batch consistency checks fail. Tracked separately.
neighbors
- Fixed `neighbors.KNNClassifier`/`neighbors.KNNRegressor` storing references to the input feature dicts in their search window; `learn_one` now stores a shallow copy.
preprocessing
- Fixed `preprocessing.RobustScaler.transform_one` crashing with `TypeError` when called before any `learn_one` (the running median returned `None`); transform now passes the value through unchanged when centering statistics are not yet available.
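
The pass-through behaviour can be sketched as a guard on the missing median. The function below is illustrative only: `center_and_scale`, its argument layout, and the handling of a missing or zero spread are assumptions, not River's implementation.

```python
def center_and_scale(x, medians, iqrs):
    """Centre and scale each feature; pass a value through if no median is known.

    `medians` and `iqrs` map feature name -> float or None (hypothetical layout).
    """
    out = {}
    for name, value in x.items():
        median = medians.get(name)
        if median is None:
            # No statistics yet, e.g. transform called before any learn step.
            out[name] = value
            continue
        centered = value - median
        iqr = iqrs.get(name)
        # Assumed behaviour: skip scaling when the spread is missing or zero.
        out[name] = centered / iqr if iqr else centered
    return out


before_learning = center_and_scale({"x": 3.0}, medians={}, iqrs={})
after_learning = center_and_scale({"x": 3.0}, medians={"x": 1.0}, iqrs={"x": 4.0})
```

Without the `None` check, `value - median` is exactly the kind of expression that raises `TypeError` on an untrained scaler.
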
tree
- Fixed `tree.mondrian.MondrianTreeRegressor.learn_one` storing the input feature dict by reference on `self._x`; it now stores a shallow copy so callers can safely mutate `x` after `learn_one`. Knock-on fix for `forest.AMFRegressor`.
- Breaking: Renamed `tree.iSOUPTreeRegressor` → `tree.ISOUPTreeRegressor` to comply with PEP-8 CapWords class naming.
tooling
- Enabled the pep8-naming ruleset (`N801`, `N802`, `N804`) in ruff so that future class, function, and `classmethod`-first-argument naming violations are caught at lint time. `N803` (argument names) and `N806` (local variable names) were intentionally left out because `X: pd.DataFrame`, `A_numpy = ...`, and similar scientific-Python conventions are pervasive in the codebase.
docs
- Fixed corrupted markdown cells in the Hoeffding Trees notebook example that caused blank page titles and invisible sidebar navigation. Fixes #1847.
- Bumped zensical to 0.0.40 and enabled strict mode with link and footnote validation.
- Fixed doc generation to escape bare brackets in type annotations and descriptions, produce proper footnote definitions, and use fenced code blocks for notebook outputs.
- Fixed the "Releases" entry in the docs nav returning a 404 when no `unreleased.md` exists: the `update_releases_nav` script no longer hard-codes an `unreleased: releases/unreleased.md` line in `mkdocs.yml`, and instead emits it only when the file is present on disk.
feature_extraction
- Added `feature_extraction.RandomTreesEmbedding`, an online random-tree leaf embedding transformer for feeding sparse tree features into downstream models. Addresses #1386.
neural_net
- Deprecated `river.neural_net`; importing it now emits a `DeprecationWarning` and users are encouraged to use `deep-river` for neural networks. Addresses #1828.
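
`DeprecationWarning` is hidden by default in many contexts, so a test suite may need to opt in to see it. The sketch below shows one way to record such a warning with the standard `warnings` module; `warn_deprecated_module` and its message text are hypothetical stand-ins for whatever River emits on import.

```python
import warnings


def warn_deprecated_module(name, replacement):
    """Emit the kind of warning a deprecated module might raise at import time."""
    warnings.warn(
        f"{name} is deprecated; consider using {replacement} instead.",
        DeprecationWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    warn_deprecated_module("river.neural_net", "deep-river")

messages = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
```
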
drift
- Reimplemented `drift.ADWIN`'s inner `AdaptiveWindowing` in Rust. The Cython sources are removed; output is bit-identical to the Cython baseline (`width`, `total`, `variance`, `n_detections`, `drift_detected`) over a 3.8k-step parity fuzz. The Rust implementation is 1.3-3.5x faster than the previous Cython implementation across `clock` settings.
compat
- Fixed `compat.SKL2RiverClassifier.predict_proba_many` raising a `TypeError` whenever the wrapped estimator was already fitted: it incorrectly built a `pd.Series(..., columns=...)` instead of a `pd.DataFrame`. Test coverage previously only exercised the not-fitted branch. `SKL2RiverClassifier` and `SKL2RiverRegressor` are now also exercised by the generic estimator-check suite via `_unit_test_params`.
- Fixed `compat.SKL2RiverClassifier._multiclass` advertising multi-class support unconditionally; it now reflects `len(classes) > 2`.
misc
- Added `misc.ZstdClassifier`, a compression-based text classifier that scores documents by the size of their zstd-compressed output under per-class prefix dictionaries built from a sliding byte window. Requires Python 3.14 (`compression.zstd`). See Zstd-based text classification.
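
The idea behind compression-based classification can be sketched with the standard library. The example below swaps zstd for `zlib` with a preset dictionary (`zdict`) so that it runs on Python versions before 3.14; it is not `misc.ZstdClassifier`, it omits the sliding byte window, and the function names and sample texts are made up for illustration.

```python
import zlib


def compressed_size(document: bytes, prefix: bytes) -> int:
    """Size of `document` when deflated with `prefix` as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=prefix)
    return len(compressor.compress(document) + compressor.flush())


def classify(document: bytes, class_prefixes: dict) -> str:
    """Pick the class whose prefix dictionary compresses the document best."""
    return min(
        class_prefixes,
        key=lambda label: compressed_size(document, class_prefixes[label]),
    )


# Hypothetical per-class prefixes; a real model would build them from seen text.
class_prefixes = {
    "animals": b"the quick brown fox jumps over the lazy dog near the river bank ",
    "finance": b"stock markets rallied as interest rates fell and bond yields dropped ",
}

label = classify(b"the quick brown fox jumps over the lazy dog", class_prefixes)
```

A document that shares long byte sequences with a class prefix is encoded mostly as back-references into that prefix, so its compressed output is smaller for the matching class.
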
metrics
- Sped up `metrics.Silhouette` by switching the centroid distance computations from the `utils.math.minkowski_distance` Python wrapper to a direct call into the Rust `euclidean_distance_dict`.
- Reimplemented the inner `expected_mutual_info` routine (used by `metrics.AdjustedMutualInfo`) in Rust. The Cython sources are removed and the new implementation is roughly twice as fast as the old one across all tested contingency-table sizes.
- Reimplemented `metrics.RollingROCAUC` and `metrics.RollingPRAUC` in Rust. The C++ implementation is removed. Output is bit-identical to the C++ version on all tested inputs, and a latent bug in `revert()` with a non-default `pos_val` is also fixed.
utils
- Reimplemented `utils.VectorDict` (and the helper functions `euclidean_distance_dict`, `euclidean_distance_tuple`, `lazy_search_euclidean`) in Rust. The Cython sources are removed; the public API is unchanged. Element-wise operations are faster across the board: `vec + scalar` and `vec * scalar` are ~18% faster on 20-key dicts and ~14% faster on 1000-key dicts; `vec + vec` is 4-5% faster, `vec @ vec` (dot product) is 4-10% faster. The constructor and `__setitem__` are within 1-4% of the Cython baseline (~2 ns absolute, dominated by PyO3 object-allocation overhead).
anomaly
- Sped up `anomaly.HalfSpaceTrees.learn_one` and `score_one` by replacing the generic recursive `tree.base.Branch.walk` traversal with an iterative tight loop specialised for HST, caching the (constant) `size_limit` and tree height as locals, and pivoting node masses through a precomputed flat node list. Output is unchanged. On a synthetic 10-feature stream `score+learn` is ~3.0× faster (27.9k → 85.2k obs/s), `learn_one` ~2.6×, and `score_one` ~3.8×; in a `MinMaxScaler | HalfSpaceTrees` pipeline on CreditCard the end-to-end pipeline is ~2.0× faster (20.5k → 40.5k obs/s).
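
The difference between a recursive walk and an iterative loop over a flat node list can be illustrated on a toy complete binary tree. This is a sketch, not the HST code: the node layout (children of node `i` at `2i + 1` and `2i + 2`), the single `mass` counter, and all function names are assumptions made for the example.

```python
def build_flat_tree(height, splits):
    """Lay out a complete binary tree as a flat list.

    `splits` maps node index -> (feature, threshold) for internal nodes.
    """
    n_nodes = 2 ** (height + 1) - 1
    return [{"split": splits.get(i), "mass": 0} for i in range(n_nodes)]


def learn_recursive(nodes, x, i=0):
    """Generic recursive walk: one Python frame per tree level."""
    nodes[i]["mass"] += 1
    split = nodes[i]["split"]
    if split is None:  # leaf
        return
    feature, threshold = split
    learn_recursive(nodes, x, 2 * i + 1 if x[feature] < threshold else 2 * i + 2)


def learn_iterative(nodes, x, height):
    """Tight loop: the height is a local and no extra frames are pushed."""
    i = 0
    for _ in range(height):
        node = nodes[i]
        node["mass"] += 1
        feature, threshold = node["split"]
        i = 2 * i + 1 if x[feature] < threshold else 2 * i + 2
    nodes[i]["mass"] += 1  # leaf reached after exactly `height` steps


height = 2
splits = {0: ("a", 0.5), 1: ("b", 0.5), 2: ("b", 0.5)}
tree_rec = build_flat_tree(height, splits)
tree_it = build_flat_tree(height, splits)

for x in [{"a": 0.1, "b": 0.9}, {"a": 0.7, "b": 0.2}, {"a": 0.3, "b": 0.4}]:
    learn_recursive(tree_rec, x)
    learn_iterative(tree_it, x, height)

masses_rec = [node["mass"] for node in tree_rec]
masses_it = [node["mass"] for node in tree_it]
```

Both walks visit the same nodes, so the masses match; the iterative version simply avoids per-level call overhead, which is the kind of saving the entry above reports.
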
cluster
- Sped up `cluster.DBSTREAM` by replacing the per-cleanup `copy.deepcopy` of the micro-cluster dict with an in-place pop, replacing the `deepcopy` in the offline reclustering step with a direct micro-cluster construction, hoisting the Gaussian neighborhood factor out of the per-feature center update (it does not vary across dimensions), and folding the nested `try`/`except KeyError` shared-density update into a plain `dict.get`. Output is unchanged. On the 15k-sample synthetic-sklearn workload, `learn_one` is ~6.1× faster (0.516 s → 0.084 s) and `learn_one + predict_one` is ~4.3× faster (0.872 s → 0.204 s).
- Sped up `cluster.DenStream._merge` by replacing the speculative `copy.copy` + insert + radius check with a non-mutating `radius_with(x)` that computes the would-be radius directly from `linear_sum`, `squared_sum` and `N`. Cached each micro-cluster's center (it reduces to `linear_sum / N` once the fading factor is cancelled algebraically), and switched the per-candidate distance lookup in `_get_closest_cluster_key` from the `utils.math.minkowski_distance` Python wrapper to a direct call into the Rust `euclidean_distance_dict`. On a 20k-sample 10-feature synthetic stream, `learn_one` is ~1.7× faster (6.4 µs/point → 3.8 µs/point). The change also fixes a latent shallow-copy bug: the previous code shared `linear_sum`/`squared_sum` between the `copy.copy` and the original, so a failed radius check left the original cluster with the candidate point's contributions added in (without bumping `N`).
- Sped up `cluster.CluStream.learn_one` by caching each micro-cluster's `center` dict on the micro-cluster itself (invalidated on `insert`/`__iadd__`), materializing the center list once at the top of `_maintain_micro_clusters` instead of rebuilding it inside the n² pairwise scan, replacing the deepcopy-heavy `Var.__add__` calls in `CluStreamMicroCluster.__iadd__` with in-place `Var.__iadd__`, and switching `_distance` from the `utils.math.minkowski_distance` Python wrapper to a direct call into the Rust `euclidean_distance_dict`. The change removes ~36M redundant `center` dict rebuilds and 367M `Mean.get` calls on a 5k-sample 10-feature synthetic stream. End-to-end `learn_one` is ~3.9× faster at d=10 (25.5 s → 6.5 s for 5k points), ~3.8× at d=20 and ~3.8× at d=50.
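
The non-mutating `radius_with(x)` idea from the DenStream entry can be sketched on a toy micro-cluster. The class below is not River's: it ignores fading, and it assumes the radius is the square root of the summed per-feature variances, which may differ from the actual definition.

```python
import math


class MicroCluster:
    """Toy micro-cluster keeping per-feature linear and squared sums (no fading)."""

    def __init__(self, x):
        self.N = 1
        self.linear_sum = dict(x)
        self.squared_sum = {k: v * v for k, v in x.items()}

    def insert(self, x):
        self.N += 1
        for k, v in x.items():
            self.linear_sum[k] = self.linear_sum.get(k, 0.0) + v
            self.squared_sum[k] = self.squared_sum.get(k, 0.0) + v * v

    @staticmethod
    def _radius(linear_sum, squared_sum, n):
        # Assumed definition: sqrt of the summed per-feature variances.
        total = 0.0
        for k, sq in squared_sum.items():
            mean = linear_sum[k] / n
            total += max(sq / n - mean * mean, 0.0)
        return math.sqrt(total)

    def radius(self):
        return self._radius(self.linear_sum, self.squared_sum, self.N)

    def radius_with(self, x):
        """Radius the cluster would have after inserting x, without mutating it."""
        keys = self.linear_sum.keys() | x.keys()
        ls = {k: self.linear_sum.get(k, 0.0) + x.get(k, 0.0) for k in keys}
        sq = {k: self.squared_sum.get(k, 0.0) + x.get(k, 0.0) ** 2 for k in keys}
        return self._radius(ls, sq, self.N + 1)


mc = MicroCluster({"a": 0.0, "b": 0.0})
candidate = {"a": 2.0, "b": 0.0}

would_be = mc.radius_with(candidate)  # the cluster is untouched here
untouched = (mc.N, dict(mc.linear_sum))

mc.insert(candidate)  # commit only once the radius check has passed
actual = mc.radius()
```

Because `radius_with` builds fresh dicts, a rejected candidate leaves no trace in the cluster, which avoids the shared-state problem that a shallow `copy.copy` has.
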
anomaly
- Sped up `anomaly.LocalOutlierFactor` by replacing the default `functools.partial(utils.math.minkowski_distance, p=2)` distance function with a direct call into the Rust `euclidean_distance_dict`, removing the Python-level dispatch.
cluster
- Sped up `cluster.STREAMKMeans.predict_one` by switching the per-center distance from the `utils.math.minkowski_distance` Python wrapper to a direct call into the Rust `euclidean_distance_dict`.
preprocessing
- Sped up `preprocessing.OneHotEncoder.transform_one` by ~8x and `learn_one + transform_one` by ~5.5x (on 100k rows × 5 features with cardinality 20). The previous implementation rebuilt the all-zeros dict via `{f"{i}_{v}": 0 ...}` on every call; the encoder now maintains an incremental cache of that zero-dict and `transform_one` copies it instead of rebuilding. Output is unchanged.
- Sped up `preprocessing.StandardScaler` by ~15% on `learn_one` and `learn_one + transform_one` by hoisting the `self.counts`/`self.means`/`self.vars` dict references out of the inner loop, splitting the `with_std=True` and `with_std=False` paths, and folding the `safe_div` call in `transform_one` into an inline branch (eliminating ~1M function calls per 100k samples × 10 features). The Welford update formula is unchanged.
- Sped up `preprocessing.MinMaxScaler.transform_one` by ~1.3x by caching each feature's `self.min[i].get()` and `self.max[i].get()` results in locals (previously `self.min[i].get()` was called twice per feature) and inlining `safe_div`. `preprocessing.MaxAbsScaler.transform_one` benefits from the same `safe_div` inlining. `learn_one` is also slightly faster thanks to hoisting `self.min`/`self.max`/`self.abs_max` out of the loop. `.update()`/`.get()` on `stats.Min`/`stats.Max`/`stats.AbsMax` remain the only paths into those objects.
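
The incremental zero-dict cache described in the `OneHotEncoder` entry can be sketched as follows. `CachedOneHot` is a hypothetical toy, not River's encoder; in particular, how it treats values first seen in `transform_one` is an assumption.

```python
class CachedOneHot:
    """Toy one-hot encoder that keeps the all-zeros output dict up to date."""

    def __init__(self):
        # Grows by one key per newly seen (feature, value) pair.
        self._zeros = {}

    def learn_one(self, x):
        for feature, value in x.items():
            self._zeros.setdefault(f"{feature}_{value}", 0)

    def transform_one(self, x):
        # One dict copy instead of rebuilding the zero dict with a comprehension.
        out = self._zeros.copy()
        for feature, value in x.items():
            out[f"{feature}_{value}"] = 1
        return out


encoder = CachedOneHot()
for row in [{"color": "red"}, {"color": "blue"}, {"color": "red"}]:
    encoder.learn_one(row)

encoded = encoder.transform_one({"color": "blue"})
```

The cost of `transform_one` then depends on a single `dict.copy()` plus one assignment per input feature, rather than on formatting every known key on each call.
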
compose
- Sped up `compose.Pipeline` end-to-end throughput by 1.3x–1.9x (e.g. `scaler|lr` 7.4 µs → 5.7 µs/event, `(sel+sel)|scaler|lr` 12.5 µs → 6.7 µs/event on TrumpApproval) by precomputing an execution plan (`kind`/`_supervised` flags) for each step at construction time. This eliminates the per-event `isinstance` checks that went through the `EstimatorMeta.__instancecheck__` metaclass hook (~180k → 0 calls per 20k events) as well as repeated `_supervised` property lookups. The plan is invalidated on `_add_step`. The lazy `_anomaly_filter_cls`/`_anomaly_detector_cls` imports are now cached with `functools.cache`.
- Sped up `compose.TransformerUnion.transform_one` by replacing the `dict(collections.ChainMap(*outputs))` merge with a single `dict.update` loop over reversed transformer outputs (~10x faster on the merge alone). Semantics are preserved (earlier transformers win on duplicate keys).
- Sped up `compose.Prefixer`/`compose.Suffixer` `transform_one` by inlining the prefix/suffix concatenation in the dict comprehension instead of going through the `_rename` method on each key.
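
The equivalence claimed for the `TransformerUnion` merge can be checked directly with the standard library. The helper names below are illustrative and the sample outputs are made up; only `collections.ChainMap` and `dict.update` come from the entry above.

```python
import collections


def merge_chainmap(outputs):
    """Previous approach: ChainMap lookup gives priority to earlier mappings."""
    return dict(collections.ChainMap(*outputs))


def merge_update(outputs):
    """Single pass, last to first, so earlier outputs overwrite later ones."""
    merged = {}
    for out in reversed(outputs):
        merged.update(out)
    return merged


outputs = [{"a": 1, "shared": "first"}, {"b": 2, "shared": "second"}, {"c": 3}]
merged_old = merge_chainmap(outputs)
merged_new = merge_update(outputs)
```

Iterating in reverse is what preserves the "earlier transformers win" rule: the first transformer's value is written last and therefore survives.
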
tree
- Fixed `MondrianNodeClassifier.replant` not copying the `counts` attribute when promoting a leaf to a branch, leaving the new branch with `n_samples != 0` but empty class counts. The fix mirrors the regressor's `_mean` copy and matches the reference `onelearn` implementation. Addresses #1823.
- Fixed Mondrian tree leaf nodes losing their bounding box ranges during splits. Previously, when a leaf was split, the new child nodes did not inherit the `memory_range_min` and `memory_range_max` attributes, which caused incorrect range extension calculations. Fixes #1801.
- Fixed `MondrianNodeClassifier.replant` copying min and max bounds by reference instead of by value during a split. The fix ensures these arrays are explicitly copied by value so the bounds are correctly preserved. Fixes #1834.
- Skipped the expensive `range_extension_c` call for pure nodes in the Mondrian classifier's downward pass when `split_pure=False` (default). Benchmarks show ~3–5% speedup on datasets with 50+ features.
- Reimplemented the Mondrian tree numerical helpers (`tree.mondrian._mondrian_ops`) in Rust. The Cython sources are removed; the helpers are now exposed via the native Rust extension (`river.stats._rust_stats` when this change landed; as noted under Breaking changes, the extension now lives at `river._river_rust`). Output matches the Cython baseline (Bananas accuracy unchanged at 70.64%). The leaf-to-root `_go_upwards` walk and the predict tree-walk also moved into Rust as single FFI calls, eliminating ~360k Python frame setups per 20k-sample run. End-to-end `MondrianTreeClassifier` learn+predict is ~28% faster (~23 µs/iter vs ~32 µs/iter Cython); `MondrianTreeRegressor` is ~21% faster (~31 µs/iter vs ~39 µs/iter) on a 20k-sample 10-feature synthetic stream.