UCBRegressor
Upper Confidence Bound bandit for regression.
The class offers two implementations of UCB:

- UCB1 from [1], when the parameter delta has value None.
- UCB(delta) from [2], when the parameter delta is in (0, 1).
For this bandit, rewards are assumed to be 1-subgaussian (see Lattimore and Szepesvári, chapter 6, p. 91), hence the use of the StandardScaler and MaxAbsScaler as reward_scaler.
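As a rough illustration of the arm index each variant computes, here is a minimal sketch; the helper ucb_index and its arguments are hypothetical, not part of the River API:

>>> import math
>>> def ucb_index(avg_reward, n_pulls, t, delta=None):
...     # avg_reward: empirical mean reward of the arm
...     # n_pulls: number of times the arm has been pulled (assumed >= 1)
...     # t: total number of pulls across all arms (only used by UCB1)
...     if delta is None:
...         # UCB1: the exploration bonus grows with the total number of pulls
...         bonus = math.sqrt(2 * math.log(t) / n_pulls)
...     else:
...         # UCB(delta): fixed confidence level, lower delta -> bigger bonus
...         bonus = math.sqrt(2 * math.log(1 / delta) / n_pulls)
...     return avg_reward + bonus

The bandit pulls the arm with the highest index: the first term favours exploitation, while the bonus favours arms that have rarely been pulled.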
Parameters

- models (List[base.Estimator])
  The models to compare.
- metric (river.metrics.base.RegressionMetric) – defaults to None
  Metric used for comparing models with.
- delta (float) – defaults to None
  For the UCB(delta) implementation. A lower value means more exploration.
- explore_each_arm (int) – defaults to 1
  The number of times each arm should be explored first.
- start_after (int) – defaults to 20
  The number of iterations after which the bandit mechanism should begin.
- seed (int) – defaults to None
  The seed for the algorithm (the algorithm is not deterministic).
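As a minimal configuration sketch, assuming models is a list of River estimators (as built in the example below), the UCB(delta) variant could be instantiated like this; the specific values are arbitrary:

>>> from river.expert import UCBRegressor
>>> bandit = UCBRegressor(
...     models=models,        # the candidate estimators
...     delta=0.1,            # use UCB(delta) instead of UCB1
...     explore_each_arm=3,   # pull every arm 3 times before trusting UCB
...     start_after=50,       # let the models warm up for 50 iterations
...     seed=42,
... )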
Attributes

- best_model
  Returns the best model, defined as the one that maximises the average reward.
- percentage_pulled
  Returns the percentage of times each arm has been pulled.
Examples

Let's use UCBRegressor to select the best learning rate for a linear regression model. First, we define the grid of models:
>>> from river import compose
>>> from river import linear_model
>>> from river import preprocessing
>>> from river import optim
>>> models = [
... compose.Pipeline(
... preprocessing.StandardScaler(),
... linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
... )
... for lr in [1e-4, 1e-3, 1e-2, 1e-1]
... ]
We decide to use the TrumpApproval dataset:
>>> from river import datasets
>>> dataset = datasets.TrumpApproval()
We use the UCB bandit:
>>> from river.expert import UCBRegressor
>>> bandit = UCBRegressor(models=models, seed=1)
The models in the bandit can be trained in an online fashion.
>>> for x, y in dataset:
... bandit = bandit.learn_one(x=x, y=y)
We can inspect the percentage of times each arm has been pulled.
>>> for model, pct in zip(bandit.models, bandit.percentage_pulled):
... lr = model["LinearRegression"].optimizer.learning_rate
... print(f"{lr:.1e} — {pct:.2%}")
1.0e-04 — 2.45%
1.0e-03 — 2.45%
1.0e-02 — 92.25%
1.0e-01 — 2.85%
The average reward of each model is also available:
>>> for model, avg in zip(bandit.models, bandit.average_reward):
... lr = model["LinearRegression"].optimizer.learning_rate
... print(f"{lr:.1e} — {avg:.2f}")
1.0e-04 — 0.00
1.0e-03 — 0.00
1.0e-02 — 0.74
1.0e-01 — 0.05
We can also select the best model (the one with the highest average reward).
>>> best_model = bandit.best_model
The learning rate chosen by the bandit is:
>>> best_model["LinearRegression"].intercept_lr.learning_rate
0.01
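The selected pipeline can then be used on its own; for instance, reusing the last sample x from the loop above:

>>> y_pred = best_model.predict_one(x)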
Methods
add_models
clone
Return a fresh estimator with the same parameters.
The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
- recursively cloned if it is a River class
- deep-copied via copy.deepcopy if not
If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
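For instance, a fresh, untrained copy of the bandit above, with the same models and hyperparameters, can be obtained with:

>>> fresh_bandit = bandit.clone()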
learn_one
Updates the chosen model and the arm internals (the actual implementation is in Bandit._learn_one).
Parameters
- x
- y
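As a rough sketch of what one bandit step looks like, assuming every arm has already been pulled at least once and using a simple negative-absolute-error reward; the names and the reward definition are illustrative, not River's actual internals:

>>> import math
>>> def learn_one_sketch(models, pulls, avg_rewards, t, x, y):
...     # Pick the arm with the highest UCB1 index.
...     arm = max(
...         range(len(models)),
...         key=lambda i: avg_rewards[i] + math.sqrt(2 * math.log(t) / pulls[i]),
...     )
...     model = models[arm]
...     # Reward the arm according to how well its model predicts (illustrative).
...     reward = -abs(y - model.predict_one(x))
...     # Update the arm's running statistics.
...     pulls[arm] += 1
...     avg_rewards[arm] += (reward - avg_rewards[arm]) / pulls[arm]
...     # Finally, update the chosen model with the new sample.
...     model.learn_one(x, y)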
predict_one
Return the prediction of the best model (defined as the one that maximises the average reward).
Parameters
- x
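For example, with the bandit trained above:

>>> y_pred = bandit.predict_one(x)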