
LinUCBDisjoint

LinUCB, disjoint variant.

Although it works, it is currently too slow to be used realistically in practice.

Each arm is assigned its own linear_model.BayesianLinearRegression instance, which is updated every time that arm is pulled, using the context as the regression's features and the reward as its target. Each model's posterior distribution is used to compute an upper confidence bound, and the arm with the highest upper confidence bound is pulled.
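For intuition, here is a minimal NumPy sketch of the classical disjoint LinUCB scoring rule that this policy follows in spirit: each arm keeps its own linear model, and the chosen arm maximises the point prediction plus an uncertainty bonus. The class and ridge-style statistics below are illustrative assumptions, not the internals of this implementation, which maintains a linear_model.BayesianLinearRegression per arm instead.

    import numpy as np

    class DisjointLinUCBSketch:
        """Illustrative disjoint LinUCB: one per-arm ridge model (hypothetical helper)."""

        def __init__(self, arm_ids, n_features, alpha=1.0):
            self.alpha = alpha  # exploration strength
            # Per-arm sufficient statistics: A = I + sum(x x^T), b = sum(reward * x)
            self.A = {a: np.eye(n_features) for a in arm_ids}
            self.b = {a: np.zeros(n_features) for a in arm_ids}

        def pull(self, x):
            """Return the arm with the highest upper confidence bound for context x."""
            def ucb(arm):
                A_inv = np.linalg.inv(self.A[arm])
                theta = A_inv @ self.b[arm]                   # point estimate
                bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # uncertainty bonus
                return theta @ x + bonus
            return max(self.A, key=ucb)

        def update(self, arm_id, x, reward):
            """Fold the observed (context, reward) pair into the pulled arm's model."""
            self.A[arm_id] += np.outer(x, x)
            self.b[arm_id] += reward * np.asarray(x)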

Parameters

  • alpha

    Type: float

    Default: 1.0

    Parameter used in each Bayesian linear regression.

  • beta

    Type: float

    Default: 1.0

    Parameter used in each Bayesian linear regression.

  • smoothing

    Type: float | None

    Default: None

    Parameter used in each Bayesian linear regression.

  • reward_obj

    Default: None

    The reward object used to measure the performance of each arm.

  • burn_in

    Default: 0

    The number of times each arm must be pulled during the initial burn-in phase, before the policy starts choosing arms itself.

  • seed

    Type: int | None

    Default: None

    Random number generator seed for reproducibility.

Attributes

  • ranking

    Return the list of arms in descending order of performance.

Methods

pull

Pull arm(s).

This method is a generator that yields the arm(s) that should be pulled. During the burn-in phase, all the arms that have not been pulled enough times are yielded. Once the burn-in phase is over, the policy is allowed to choose the arm(s) that should be pulled. If you only want to pull one arm at a time during the burn-in phase, simply call next(policy.pull(arms)).

Parameters

  • arm_ids — 'list[ArmID]'
  • context — 'dict' — defaults to None

Returns

ArmID: A single arm.
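As a usage sketch for pull (the import path, arm identifiers, and context features below are assumptions made for illustration), a single arm can be obtained like this, even during the burn-in phase:

    from river import bandit

    policy = bandit.LinUCBDisjoint(alpha=1.0, beta=1.0, burn_in=5, seed=42)

    arm_ids = ["a", "b", "c"]               # hypothetical arm identifiers
    context = {"hour": 14, "device": 1}     # hypothetical context features

    # pull() is a generator; next() yields a single arm at a time
    arm_id = next(policy.pull(arm_ids, context=context))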

update

Update the chosen arm's model with the given context and reward.

Parameters

  • arm_id
  • context
  • reward_args
  • reward_kwargs
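
Putting pull and update together, here is a hedged sketch of a full interaction loop based on the signatures documented above; the environment, context features, and random reward are placeholders for illustration:

    import random

    from river import bandit

    policy = bandit.LinUCBDisjoint(alpha=1.0, beta=1.0, seed=42)
    arm_ids = ["a", "b", "c"]  # hypothetical arm identifiers

    for t in range(100):
        context = {"hour": t % 24}                         # hypothetical context
        arm_id = next(policy.pull(arm_ids, context=context))
        reward = random.random()                           # stand-in for real feedback
        # The reward is forwarded via reward_args and updates the pulled arm's regression
        policy.update(arm_id, context, reward)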