RegressionJackknife

Jackknife method for regression.

This is a conformal prediction method for regression, based on the jackknife. The idea is to track quantiles of the regressor's residuals, that is, the differences between the true targets and the predictions. The prediction interval is then the regressor's prediction plus the lower and upper quantiles of the residuals.

This works naturally online, as the quantiles of the residuals are updated at each iteration. Each residual is produced before the regressor is updated, which ensures the predicted intervals are not optimistic.
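
The mechanism described above can be sketched in plain Python. This is an illustration under stated assumptions, not River's implementation: `OnlineJackknifeSketch`, `RunningMeanRegressor`, and `predict_interval` are hypothetical names, the miscoverage is assumed to be split equally between the two tails, and quantiles are read by index from a sorted list of residuals.

```python
import bisect
import random


class RunningMeanRegressor:
    """Hypothetical stand-in model: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


class OnlineJackknifeSketch:
    """Illustrative wrapper only; not River's implementation."""

    def __init__(self, regressor, confidence_level=0.95):
        self.regressor = regressor
        # Assumption: the miscoverage is split equally between both tails.
        self.alpha = (1 - confidence_level) / 2
        self.residuals = []  # kept sorted so quantiles are index lookups

    def _quantile(self, q):
        index = min(int(q * len(self.residuals)), len(self.residuals) - 1)
        return self.residuals[index]

    def predict_interval(self, x):
        y_pred = self.regressor.predict_one(x)
        if not self.residuals:
            return y_pred, y_pred  # no residuals yet: degenerate interval
        lower = y_pred + self._quantile(self.alpha)
        upper = y_pred + self._quantile(1 - self.alpha)
        return lower, upper

    def learn_one(self, x, y):
        # Residual first, from the model as it was before seeing (x, y).
        bisect.insort(self.residuals, y - self.regressor.predict_one(x))
        self.regressor.learn_one(x, y)


rng = random.Random(42)
model = OnlineJackknifeSketch(RunningMeanRegressor(), confidence_level=0.9)
n_samples, hits = 1000, 0
for _ in range(n_samples):
    y = rng.gauss(10.0, 2.0)
    low, high = model.predict_interval(x=None)  # this toy model ignores x
    hits += low <= y <= high
    model.learn_one(None, y)

coverage = hits / n_samples
print(f"empirical coverage: {coverage:.3f}")
```

Because every residual is recorded before the regressor learns from the sample, each one is an out-of-sample error, which is what keeps the intervals from being optimistic.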

Note that the produced intervals are marginal and not conditional: the target coverage holds on average over the data, and the intervals are not adjusted for the features x. This is a limitation of the jackknife method. In return, the jackknife method is simple, efficient, and robust to outliers.
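
A small simulation may help make the marginal versus conditional distinction concrete. It is a sketch on synthetic data, not taken from River: the two groups "quiet" and "noisy" play the role of the features x, the regressor is assumed to be perfect (it always predicts the true mean of 0), and the residual quantiles are pooled across both groups.

```python
import bisect
import random

rng = random.Random(0)
confidence_level = 0.9
alpha = (1 - confidence_level) / 2  # assumption: equal-tailed interval

residuals = []  # sorted history of residuals, pooled over both groups
hits = {"quiet": 0, "noisy": 0}
counts = {"quiet": 0, "noisy": 0}

for _ in range(4000):
    group = rng.choice(["quiet", "noisy"])  # stands in for the features x
    scale = 1.0 if group == "quiet" else 3.0  # the noise level depends on x
    y = rng.gauss(0.0, scale)
    y_pred = 0.0  # hypothetical regressor that already knows the true mean

    if len(residuals) >= 100:  # skip a short warm-up period
        lower = y_pred + residuals[int(alpha * len(residuals))]
        upper = y_pred + residuals[int((1 - alpha) * len(residuals))]
        counts[group] += 1
        hits[group] += lower <= y <= upper

    bisect.insort(residuals, y - y_pred)

coverage = {group: hits[group] / counts[group] for group in hits}
overall = sum(hits.values()) / sum(counts.values())
print(f"overall: {overall:.3f}, per group: {coverage}")
```

The overall coverage should land near the requested level, while the quiet group is over-covered and the noisy group is under-covered, since the same interval width is used for both.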

Parameters

regressor

Type → base.Regressor

The regressor to be wrapped.

confidence_level

Type → float

Default → 0.95

The confidence level of the prediction intervals, expressed as a fraction between 0 and 1 (0.95 means 95% intervals).

window_size

Type → int | None

Default → None

The size of the window used to compute the quantiles of the residuals. If None, the quantiles are computed over the whole history. It is advised to set this if you expect the model's performance to change over time.
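
The effect of a window can be illustrated with a bounded deque. This is a sketch on synthetic residuals, assuming the window simply keeps the most recent window_size residuals; `quantile_width` is a hypothetical helper, and River's internal bookkeeping may differ.

```python
import random
from collections import deque


def quantile_width(residuals, confidence_level):
    """Distance between the upper and lower empirical residual quantiles."""
    ordered = sorted(residuals)
    alpha = (1 - confidence_level) / 2  # assumption: equal-tailed interval
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered))]
    return upper - lower


rng = random.Random(7)
window_size = 500

full_history = []                     # behaves like window_size=None
windowed = deque(maxlen=window_size)  # old residuals drop out automatically

# Synthetic residuals: the model's errors become five times larger
# after the first 3000 samples.
for step in range(3500):
    scale = 1.0 if step < 3000 else 5.0
    residual = rng.gauss(0.0, scale)
    full_history.append(residual)
    windowed.append(residual)

full_width = quantile_width(full_history, 0.9)
windowed_width = quantile_width(windowed, 0.9)
print(f"full history: {full_width:.2f}, windowed: {windowed_width:.2f}")
```

After the change, the windowed quantiles reflect only the recent, larger errors, whereas the full history is still dominated by the earlier, smaller ones and yields intervals that are too narrow for the new regime.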

Examples

from river import conf
from river import datasets
from river import linear_model
from river import metrics
from river import preprocessing
from river import stats

dataset = datasets.TrumpApproval()

model = conf.RegressionJackknife(
    (
        preprocessing.StandardScaler() |
        linear_model.LinearRegression(intercept_lr=.1)
    ),
    confidence_level=0.9
)

validity = stats.Mean()
efficiency = stats.Mean()

for x, y in dataset:
    # Predict before learning, so that each sample is scored out-of-sample
    interval = model.predict_one(x, with_interval=True)
    validity.update(y in interval)
    efficiency.update(interval.width)
    model.learn_one(x, y)

The interval's validity is the proportion of times the true value is within the interval. We specified a confidence level of 90%, so we expect the validity to be around 90%.

validity

Mean: 0.903097

The interval's efficiency is the average width of the intervals.

efficiency

Mean: 3.593173

Lowering the confidence level will mechanically improve the efficiency, because narrower intervals are enough to reach a lower coverage target.
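
The trade-off between confidence level and interval width can be checked on a fixed set of residuals. The snippet below is a sketch on synthetic Gaussian residuals rather than on the TrumpApproval data; `interval_width` is a hypothetical helper and the interval is assumed to be equal-tailed.

```python
import random

rng = random.Random(1)
# Synthetic residuals, sorted once so quantiles are index lookups.
residuals = sorted(rng.gauss(0.0, 1.0) for _ in range(5000))


def interval_width(confidence_level):
    """Width between the upper and lower empirical residual quantiles."""
    alpha = (1 - confidence_level) / 2  # assumption: equal-tailed interval
    lower = residuals[int(alpha * len(residuals))]
    upper = residuals[int((1 - alpha) * len(residuals))]
    return upper - lower


widths = {level: interval_width(level) for level in (0.5, 0.8, 0.9, 0.95)}
for level, width in widths.items():
    print(f"confidence_level={level}: width={width:.2f}")
```

The widths grow with the confidence level, so a lower level gives narrower (more efficient) intervals that contain the true value less often.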

Methods

learn_one

Fits to a set of features x and a real-valued target y.

Parameters

x
y
kwargs

Returns

self

predict_one

Predict the output of features x.

Parameters

x
with_interval — defaults to False
kwargs

Returns

The prediction, or a prediction interval when with_interval is True.

Barber, Rina Foygel, Emmanuel J. Candes, Aaditya Ramdas, and Ryan J. Tibshirani. "Predictive inference with the jackknife+." The Annals of Statistics 49, no. 1 (2021): 486-507.