UCBRegressor
Upper Confidence Bound bandit for regression.
The class offers two implementations of UCB:

- UCB1 from [1], when the parameter delta has value None.
- UCB(delta) from [2], when the parameter delta is in (0, 1).
For this bandit, rewards are assumed to be 1-subgaussian (see Lattimore and Szepesvári, chapter 6, p. 91), hence the use of the StandardScaler and MaxAbsScaler as reward_scaler.
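As a rough illustration of the arm index each variant computes, here is a minimal sketch; the helper ucb_index and its arguments are hypothetical, not part of the River API:

>>> import math
>>> def ucb_index(avg_reward, n_pulls, t, delta=None):
...     # avg_reward: empirical mean reward of the arm
...     # n_pulls: number of times the arm has been pulled (assumed >= 1)
...     # t: total number of pulls across all arms (only used by UCB1)
...     if delta is None:
...         # UCB1: the exploration bonus grows with the total number of pulls
...         bonus = math.sqrt(2 * math.log(t) / n_pulls)
...     else:
...         # UCB(delta): fixed confidence level, lower delta -> bigger bonus
...         bonus = math.sqrt(2 * math.log(1 / delta) / n_pulls)
...     return avg_reward + bonus

The bandit pulls the arm with the highest index: the first term favours exploitation, while the bonus favours arms that have rarely been pulled.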
Parameters

- models (List[base.Estimator])
  The models to compare.
- metric (river.metrics.base.RegressionMetric) – defaults to None
  Metric used for comparing models with.
- delta (float) – defaults to None
  For the UCB(delta) implementation. A lower value means more exploration.
- explore_each_arm (int) – defaults to 1
  The number of times each arm should be explored first.
- start_after (int) – defaults to 20
  The number of iterations after which the bandit mechanism should begin.
- seed (int) – defaults to None
  The seed for the algorithm (the algorithm is not deterministic).
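As a minimal configuration sketch, assuming models is a list of River estimators (as built in the example below), the UCB(delta) variant could be instantiated like this; the specific values are arbitrary:

>>> from river.expert import UCBRegressor
>>> bandit = UCBRegressor(
...     models=models,        # the candidate estimators
...     delta=0.1,            # use UCB(delta) instead of UCB1
...     explore_each_arm=3,   # pull every arm 3 times before trusting UCB
...     start_after=50,       # let the models warm up for 50 iterations
...     seed=42,
... )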
Attributes

- best_model
  Returns the best model, defined as the one that maximises the average reward.
- percentage_pulled
  Returns the percentage of times each arm has been pulled.
Examples

Let's use UCBRegressor to select the best learning rate for a linear regression model. First, we define the grid of models:
>>> from river import compose
>>> from river import linear_model
>>> from river import preprocessing
>>> from river import optim
>>> models = [
... compose.Pipeline(
... preprocessing.StandardScaler(),
... linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
... )
... for lr in [1e-4, 1e-3, 1e-2, 1e-1]
... ]
We decide to use the TrumpApproval dataset:
>>> from river import datasets
>>> dataset = datasets.TrumpApproval()
We use the UCB bandit:
>>> from river.expert import UCBRegressor
>>> bandit = UCBRegressor(models=models, seed=1)
The models in the bandit can be trained in an online fashion.
>>> for x, y in dataset:
... bandit = bandit.learn_one(x=x, y=y)
We can inspect the percentage of times each arm has been pulled.
>>> for model, pct in zip(bandit.models, bandit.percentage_pulled):
... lr = model["LinearRegression"].optimizer.learning_rate
... print(f"{lr:.1e} — {pct:.2%}")
1.0e-04 — 2.45%
1.0e-03 — 2.45%
1.0e-02 — 92.25%
1.0e-01 — 2.85%
The average reward of each model is also available:
>>> for model, avg in zip(bandit.models, bandit.average_reward):
... lr = model["LinearRegression"].optimizer.learning_rate
... print(f"{lr:.1e} — {avg:.2f}")
1.0e-04 — 0.00
1.0e-03 — 0.00
1.0e-02 — 0.74
1.0e-01 — 0.05
We can also select the best model (the one with the highest average reward).
>>> best_model = bandit.best_model
The learning rate chosen by the bandit is:
>>> best_model["LinearRegression"].intercept_lr.learning_rate
0.01
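The selected pipeline can then be used on its own; for instance, reusing the last sample x from the loop above:

>>> y_pred = best_model.predict_one(x)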
Methods
add_models
clone
Return a fresh estimator with the same parameters.
The clone has the same parameters but has not been updated with any data. This works by looking at the parameters from the class signature. Each parameter is either:
- recursively cloned if it is a River class
- deep-copied via copy.deepcopy if not
If the calling object is stochastic (i.e. it accepts a seed parameter) and has not been seeded, then the clone will not be idempotent. Indeed, this method's purpose is simply to return a new instance with the same input parameters.
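For instance, a fresh, untrained copy of the bandit above, with the same models and hyperparameters, can be obtained with:

>>> fresh_bandit = bandit.clone()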
learn_one
Updates the chosen model and the arm internals (the actual implementation is in Bandit._learn_one).
Parameters
- x
- y
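As a rough sketch of what one bandit step looks like, assuming every arm has already been pulled at least once and using a simple negative-absolute-error reward; the names and the reward definition are illustrative, not River's actual internals:

>>> import math
>>> def learn_one_sketch(models, pulls, avg_rewards, t, x, y):
...     # Pick the arm with the highest UCB1 index.
...     arm = max(
...         range(len(models)),
...         key=lambda i: avg_rewards[i] + math.sqrt(2 * math.log(t) / pulls[i]),
...     )
...     model = models[arm]
...     # Reward the arm according to how well its model predicts (illustrative).
...     reward = -abs(y - model.predict_one(x))
...     # Update the arm's running statistics.
...     pulls[arm] += 1
...     avg_rewards[arm] += (reward - avg_rewards[arm]) / pulls[arm]
...     # Finally, update the chosen model with the new sample.
...     model.learn_one(x, y)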
predict_one
Return the prediction of the best model (defined as the one that maximises the average reward).
Parameters
- x
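For example, with the bandit trained above:

>>> y_pred = bandit.predict_one(x)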