LinUCBDisjoint¶
LinUCB, disjoint variant.
Although it works, it is currently too slow to be realistically used in practice.
Each arm is assigned its own linear_model.BayesianLinearRegression instance, which is updated every time the arm is pulled. The context is used as the features for the regression, and the reward as the target. Each model's posterior distribution is used to compute an upper confidence bound, and the arm with the highest upper confidence bound is pulled.
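For reference, the classic disjoint LinUCB (Li et al., 2010) maintains a ridge regression per arm in closed form rather than a Bayesian linear regression. The following is a minimal, dependency-free sketch of that classic variant, keeping each arm's inverse design matrix up to date with Sherman–Morrison rank-1 updates; all names are illustrative and not part of this class's API.

```python
import math

def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

class DisjointLinUCBArm:
    """Per-arm state of classic disjoint LinUCB: A^-1 and b."""

    def __init__(self, d, alpha=1.0):
        self.alpha = alpha  # exploration strength
        # A starts as the identity, so A^-1 does too.
        self.A_inv = [[float(i == j) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def ucb(self, x):
        theta = mat_vec(self.A_inv, self.b)       # ridge estimate θ = A^-1 b
        spread = dot(x, mat_vec(self.A_inv, x))   # x^T A^-1 x
        return dot(theta, x) + self.alpha * math.sqrt(spread)

    def update(self, x, reward):
        # Sherman-Morrison update of A^-1 after A <- A + x x^T
        Ax = mat_vec(self.A_inv, x)
        denom = 1.0 + dot(x, Ax)
        self.A_inv = [
            [aij - Ax[i] * Ax[j] / denom for j, aij in enumerate(row)]
            for i, row in enumerate(self.A_inv)
        ]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]

# Arm selection for a context x: pick the arm with the highest UCB, e.g.
# best = max(arms, key=lambda a: arms[a].ucb(x))
```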
Parameters¶
-
alpha
Type → float
Default → 1.0
The alpha parameter used in each Bayesian linear regression.
-
beta
Type → float
Default → 1.0
The beta parameter used in each Bayesian linear regression.
-
smoothing
Type → float | None
Default → None
The smoothing parameter used in each Bayesian linear regression.
-
reward_obj
Default → None
The reward object used to measure the performance of each arm.
-
burn_in
Default → 0
The number of initial steps during which each arm is pulled before the policy is allowed to choose.
-
seed
Type → int | None
Default → None
Random number generator seed for reproducibility.
Attributes¶
-
ranking
Return the list of arms in descending order of performance.
Methods¶
pull
Pull arm(s).
This method is a generator that yields the arm(s) that should be pulled. During the burn-in phase, every arm that has not yet been pulled enough times is yielded. Once the burn-in phase is over, the policy chooses which arm(s) should be pulled. To pull a single arm at a time during the burn-in phase, call next(policy.pull(arm_ids)).
Parameters
- arm_ids — 'list[ArmID]'
- context — 'dict' — defaults to None
Returns
ArmID: A single arm.
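The burn-in behaviour described above can be sketched as a standalone generator; the names below (n_pulls, choose) are illustrative stand-ins for the policy's internal pull counts and selection rule, not part of this class's API.

```python
def pull(arm_ids, n_pulls, burn_in, choose):
    """Yield arms to pull: under-explored arms first, then the policy's pick.

    n_pulls maps each arm to how many times it has been pulled so far;
    choose is the policy's own selection rule, used once burn-in is over.
    """
    not_ready = [a for a in arm_ids if n_pulls.get(a, 0) < burn_in]
    if not_ready:
        # Burn-in phase: surface every arm that still needs pulls.
        yield from not_ready
    else:
        # Burn-in over: defer to the policy's selection rule.
        yield choose(arm_ids)

# To pull just one arm during burn-in, take a single item from the generator:
# arm = next(pull(arm_ids, n_pulls, burn_in, choose))
```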
update
Update the chosen arm's model with the observed context and reward.
Parameters
- arm_id
- context
- reward_args
- reward_kwargs
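Since each arm owns its own regression model, the update step amounts to routing the (context, reward) pair to that arm's model. A minimal sketch of this bookkeeping, assuming a river-style learn_one(x, y) method on the per-arm model (the class and attribute names are hypothetical):

```python
class PerArmPolicy:
    """Sketch of per-arm bookkeeping: one lazily created model per arm."""

    def __init__(self, model_factory):
        self.model_factory = model_factory  # e.g. a BayesianLinearRegression factory
        self.models = {}   # arm_id -> regression model, created on first update
        self.n_pulls = {}  # arm_id -> number of updates seen

    def update(self, arm_id, context, reward):
        # Fetch (or create) this arm's model, then fit it on (context, reward).
        model = self.models.setdefault(arm_id, self.model_factory())
        model.learn_one(context, reward)
        self.n_pulls[arm_id] = self.n_pulls.get(arm_id, 0) + 1
```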