EpsilonGreedyRegressor¶
Model selection based on the \(\epsilon\)-greedy bandit strategy.
Performs model selection with an \(\epsilon\)-greedy bandit strategy. A model is selected at each learning step: the current best model is picked with probability \(1 - \epsilon\), and a model is picked uniformly at random the rest of the time.
Selection bias is a common problem when using bandits for online model selection. This bias can be mitigated with a burn-in phase: every model is updated during the first burn_in steps.
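The selection rule can be sketched in plain Python. This is a minimal illustration of the \(\epsilon\)-greedy idea, not the library's internals; scores are assumed to be "lower is better", as with MAE:

```python
import random

def select_index(scores, epsilon, rng):
    """Epsilon-greedy choice over model performance scores (lower is better)."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))  # explore: pick any model
    return min(range(len(scores)), key=scores.__getitem__)  # exploit: pick the best

rng = random.Random(1)
scores = [1.4, 13.1, 15.9, 16.5]  # e.g. a running MAE per model
picks = [select_index(scores, epsilon=0.1, rng=rng) for _ in range(1000)]
```

With epsilon=0.1, the best model (index 0) is chosen directly 90% of the time, plus a quarter of the remaining 10% by chance, so it accumulates the overwhelming majority of pulls.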
Parameters¶
- models
  The models to choose from.
- metric – defaults to None
  The metric used to compare models with each other. Defaults to metrics.MAE.
- epsilon – defaults to 0.1
  The fraction of time exploration is performed rather than exploitation.
- decay – defaults to 0.0
  Exponential factor at which epsilon decays.
- burn_in – defaults to 100
  The number of initial steps during which each model is updated.
- seed (int) – defaults to None
  Random number generator seed for reproducibility.
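Assuming the decay is applied as \(\epsilon \, e^{-n \cdot \text{decay}}\) after \(n\) learning steps, which is consistent with the "exponential factor" wording but is an assumption rather than a documented internal, the effective exploration rate over time can be tracked like so:

```python
import math

def effective_epsilon(epsilon: float, decay: float, n: int) -> float:
    # Hypothetical schedule: the exploration probability shrinks
    # exponentially with the number of learning steps n seen so far.
    return epsilon * math.exp(-n * decay)

# With epsilon=0.1 and decay=0.001, exploration drops from 10% at step 0
# to roughly 3.7% after 1000 steps.
rates = [effective_epsilon(0.1, 0.001, n) for n in (0, 1000, 5000)]
```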
Attributes¶
- best_model
  The current best model.
- burn_in
- decay
- epsilon
- models
- seed
Examples¶
>>> from river import datasets
>>> from river import evaluate
>>> from river import linear_model
>>> from river import metrics
>>> from river import model_selection
>>> from river import optim
>>> from river import preprocessing
>>> models = [
... linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
... for lr in [0.0001, 0.001, 1e-05, 0.01]
... ]
>>> dataset = datasets.TrumpApproval()
>>> model = (
... preprocessing.StandardScaler() |
... model_selection.EpsilonGreedyRegressor(
... models,
... epsilon=0.1,
... decay=0.001,
... burn_in=100,
... seed=1
... )
... )
>>> metric = metrics.MAE()
>>> evaluate.progressive_val_score(dataset, model, metric)
MAE: 1.363516
>>> model['EpsilonGreedyRegressor'].bandit
Ranking MAE Pulls Share
#2 15.850129 111 8.53%
#1 13.060601 117 8.99%
#3 16.519079 109 8.38%
#0 1.387839 964 74.10%
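The Share column is each model's fraction of the total number of pulls. It can be reproduced from the Pulls column of the table above:

```python
# Pulls per model, taken from the ranking table above.
pulls = {"#2": 111, "#1": 117, "#3": 109, "#0": 964}
total = sum(pulls.values())  # 1301 pulls in total
# Share = pulls / total pulls, expressed as a percentage.
shares = {name: round(100 * n / total, 2) for name, n in pulls.items()}
# shares == {'#2': 8.53, '#1': 8.99, '#3': 8.38, '#0': 74.1}
```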
>>> model['EpsilonGreedyRegressor'].best_model
LinearRegression (
optimizer=SGD (
lr=Constant (
learning_rate=0.01
)
)
loss=Squared ()
l2=0.
l1=0.
intercept_init=0.
intercept_lr=Constant (
learning_rate=0.01
)
clip_gradient=1e+12
initializer=Zeros ()
)
Methods¶
append
S.append(value) -- append value to the end of the sequence
Parameters
- item
clear
S.clear() -> None -- remove all items from S
copy
count
S.count(value) -> integer -- return number of occurrences of value
Parameters
- item
extend
S.extend(iterable) -- extend sequence by appending elements from the iterable
Parameters
- other
index
S.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present.
Supporting start and stop arguments is optional, but recommended.
Parameters
- item
- args
insert
S.insert(index, value) -- insert value before index
Parameters
- i
- item
learn_one
Fits to a set of features x and a real-valued target y.
Parameters
- x (dict)
- y (numbers.Number)
Returns
Regressor: self
pop
S.pop([index]) -> item -- remove and return item at index (default last). Raise IndexError if list is empty or index is out of range.
Parameters
- i – defaults to
-1
predict_one
Predict the output of features x.
Parameters
- x
Returns
The prediction.
remove
S.remove(value) -- remove first occurrence of value. Raise ValueError if the value is not present.
Parameters
- item
reverse
S.reverse() -- reverse IN PLACE