Skip to content

Multi-armed bandits

River has a bandit module. It contains several multi-armed bandit policies, bandit environments, and utilities to benchmark policies on bandit problems.

Bandit environments in River implement the Gym interface. You can thus load them with gym.make. Note that Gym is intended for reinforcement learning algorithms, while bandit policies are the simplest form of reinforcement learning. Bandit policies learn by receiving a reward after each step, while reinforcement learning algorithms have to learn from feedback that may arrive at the end of a (long) sequence of steps.

import gymnasium as gym

for k in gym.envs.registry:
    if k.startswith('river_bandits'):
        print(k)

River's bandit module offers the bandit.evaluate function to benchmark several policies on a given environment. It takes as input a list of bandit policies, a bandit environment (the problem to solve), and a reward object.

import gymnasium as gym
from river import bandit
import pandas as pd
from tqdm import tqdm
from river import stats

policies=[
    bandit.EpsilonGreedy(epsilon=0.1),
    bandit.EpsilonGreedy(epsilon=0.01),
    bandit.EpsilonGreedy(epsilon=0),
]

env = gym.make(
    'river_bandits/KArmedTestbed-v0',
    max_episode_steps=1000
)

trace = bandit.evaluate(
    policies=policies,
    env=env,
    reward_stat=stats.Mean(),
    n_episodes=(n_episodes := 2000),
)

The bandit.evaluate function returns a generator containing the results at each step of the benchmark. This can be wrapped with a pandas.DataFrame to gather all the results.

trace_df = pd.DataFrame(tqdm(
    trace, position=0, total=(
        n_episodes *
        len(policies) *
        env._max_episode_steps
    )
))
trace_df.sample(5, random_state=42)
0%|          | 0/6000000 [00:00<?, ?it/s]
0%|          | 16296/6000000 [00:00<00:36, 162952.39it/s]
1%|          | 33279/6000000 [00:00<00:35, 166990.80it/s]
1%|          | 50382/6000000 [00:00<00:35, 168832.07it/s]
1%|          | 67266/6000000 [00:00<00:35, 168776.01it/s]
1%|▏         | 84144/6000000 [00:00<00:35, 167627.15it/s]
2%|▏         | 101250/6000000 [00:00<00:34, 168787.02it/s]
2%|▏         | 118131/6000000 [00:00<00:34, 168394.84it/s]
2%|▏         | 134972/6000000 [00:00<00:44, 130684.45it/s]
3%|▎         | 151764/6000000 [00:00<00:41, 140368.53it/s]
3%|▎         | 168453/6000000 [00:01<00:39, 147551.69it/s]
3%|▎         | 185728/6000000 [00:01<00:37, 154582.30it/s]
3%|▎         | 202484/6000000 [00:01<00:36, 158278.27it/s]
4%|▎         | 219375/6000000 [00:01<00:35, 161353.13it/s]
4%|▍         | 236492/6000000 [00:01<00:35, 164222.62it/s]
4%|▍         | 253171/6000000 [00:01<00:34, 164977.25it/s]
5%|▍         | 270004/6000000 [00:01<00:34, 165945.01it/s]
5%|▍         | 286955/6000000 [00:01<00:34, 167004.23it/s]
5%|▌         | 303743/6000000 [00:01<00:34, 166791.03it/s]
5%|▌         | 320483/6000000 [00:01<00:34, 166858.77it/s]
6%|▌         | 337212/6000000 [00:02<00:33, 166733.36it/s]
6%|▌         | 354086/6000000 [00:02<00:33, 167329.46it/s]
6%|▌         | 371173/6000000 [00:02<00:33, 168387.57it/s]
6%|▋         | 388027/6000000 [00:02<00:33, 168308.80it/s]
7%|▋         | 405001/6000000 [00:02<00:33, 168164.17it/s]
7%|▋         | 422237/6000000 [00:02<00:32, 169416.21it/s]
7%|▋         | 439185/6000000 [00:02<00:32, 169119.39it/s]
8%|▊         | 456102/6000000 [00:02<00:32, 168213.86it/s]
8%|▊         | 473359/6000000 [00:02<00:32, 169509.59it/s]
8%|▊         | 490314/6000000 [00:03<00:32, 168883.14it/s]
8%|▊         | 507205/6000000 [00:03<00:32, 168451.02it/s]
9%|▊         | 524652/6000000 [00:03<00:32, 170243.58it/s]
9%|▉         | 541679/6000000 [00:03<00:32, 169711.36it/s]
9%|▉         | 558652/6000000 [00:03<00:32, 169560.95it/s]
10%|▉         | 575976/6000000 [00:03<00:31, 170658.63it/s]
10%|▉         | 593044/6000000 [00:03<00:31, 170214.96it/s]
10%|█         | 610067/6000000 [00:03<00:31, 169269.79it/s]
10%|█         | 627001/6000000 [00:03<00:31, 169038.31it/s]
11%|█         | 643906/6000000 [00:03<00:32, 163915.88it/s]
11%|█         | 660586/6000000 [00:04<00:32, 164757.03it/s]
11%|█▏        | 677456/6000000 [00:04<00:32, 165916.34it/s]
12%|█▏        | 694251/6000000 [00:04<00:31, 166517.85it/s]
12%|█▏        | 711011/6000000 [00:04<00:31, 166838.48it/s]
12%|█▏        | 727706/6000000 [00:04<00:32, 162137.07it/s]
12%|█▏        | 744455/6000000 [00:04<00:32, 163701.61it/s]
13%|█▎        | 761211/6000000 [00:04<00:31, 164837.43it/s]
13%|█▎        | 777717/6000000 [00:04<00:31, 164765.60it/s]
13%|█▎        | 794672/6000000 [00:04<00:31, 166185.22it/s]
14%|█▎        | 811303/6000000 [00:04<00:31, 166098.28it/s]
14%|█▍        | 828001/6000000 [00:05<00:31, 165956.65it/s]
14%|█▍        | 844911/6000000 [00:05<00:30, 166890.75it/s]
14%|█▍        | 861605/6000000 [00:05<00:30, 166630.15it/s]
15%|█▍        | 878515/6000000 [00:05<00:30, 167365.05it/s]
15%|█▍        | 895254/6000000 [00:05<00:30, 167019.89it/s]
15%|█▌        | 911958/6000000 [00:05<00:30, 166600.64it/s]
15%|█▌        | 928620/6000000 [00:05<00:30, 166018.96it/s]
16%|█▌        | 945224/6000000 [00:05<00:30, 165944.18it/s]
16%|█▌        | 962422/6000000 [00:05<00:30, 167746.50it/s]
16%|█▋        | 979198/6000000 [00:05<00:30, 167189.09it/s]
17%|█▋        | 996141/6000000 [00:06<00:29, 167856.93it/s]
17%|█▋        | 1013266/6000000 [00:06<00:29, 168868.53it/s]
17%|█▋        | 1030154/6000000 [00:06<00:29, 167926.39it/s]
17%|█▋        | 1047214/6000000 [00:06<00:29, 168723.08it/s]
18%|█▊        | 1064181/6000000 [00:06<00:29, 169004.32it/s]
18%|█▊        | 1081083/6000000 [00:06<00:29, 168582.03it/s]
18%|█▊        | 1098001/6000000 [00:06<00:29, 168586.37it/s]
19%|█▊        | 1115119/6000000 [00:06<00:28, 169360.99it/s]
19%|█▉        | 1132056/6000000 [00:06<00:28, 169046.95it/s]
19%|█▉        | 1149003/6000000 [00:06<00:28, 169127.35it/s]
19%|█▉        | 1166323/6000000 [00:07<00:28, 170343.54it/s]
20%|█▉        | 1183358/6000000 [00:07<00:28, 169822.95it/s]
20%|██        | 1200341/6000000 [00:07<00:28, 169310.66it/s]
20%|██        | 1217483/6000000 [00:07<00:28, 169939.51it/s]
21%|██        | 1234478/6000000 [00:07<00:28, 168641.96it/s]
21%|██        | 1251345/6000000 [00:07<00:28, 168441.58it/s]
21%|██        | 1268191/6000000 [00:07<00:28, 168354.73it/s]
21%|██▏       | 1285028/6000000 [00:07<00:28, 168017.49it/s]
22%|██▏       | 1302001/6000000 [00:07<00:27, 168424.20it/s]
22%|██▏       | 1319097/6000000 [00:07<00:27, 169178.89it/s]
22%|██▏       | 1336016/6000000 [00:08<00:27, 167933.71it/s]
23%|██▎       | 1352826/6000000 [00:08<00:27, 167980.19it/s]
23%|██▎       | 1369626/6000000 [00:08<00:27, 167003.18it/s]
23%|██▎       | 1386329/6000000 [00:08<00:27, 166644.38it/s]
23%|██▎       | 1403453/6000000 [00:08<00:27, 168012.38it/s]
24%|██▎       | 1420257/6000000 [00:08<00:27, 167919.67it/s]
24%|██▍       | 1437051/6000000 [00:08<00:27, 167250.79it/s]
24%|██▍       | 1453953/6000000 [00:08<00:27, 167776.91it/s]
25%|██▍       | 1470732/6000000 [00:08<00:27, 166807.59it/s]
25%|██▍       | 1487486/6000000 [00:08<00:27, 167024.22it/s]
25%|██▌       | 1504190/6000000 [00:09<00:26, 166642.37it/s]
25%|██▌       | 1521001/6000000 [00:09<00:26, 166991.99it/s]
26%|██▌       | 1538194/6000000 [00:09<00:26, 168466.20it/s]
26%|██▌       | 1555042/6000000 [00:09<00:26, 168095.43it/s]
26%|██▌       | 1571892/6000000 [00:09<00:26, 168215.71it/s]
26%|██▋       | 1588715/6000000 [00:09<00:27, 162167.31it/s]
27%|██▋       | 1605225/6000000 [00:09<00:26, 163020.69it/s]
27%|██▋       | 1622272/6000000 [00:09<00:26, 165210.98it/s]
27%|██▋       | 1639020/6000000 [00:09<00:26, 165879.62it/s]
28%|██▊       | 1655977/6000000 [00:09<00:26, 166974.26it/s]
28%|██▊       | 1672691/6000000 [00:10<00:26, 166139.23it/s]
28%|██▊       | 1689317/6000000 [00:10<00:25, 166043.77it/s]
28%|██▊       | 1706140/6000000 [00:10<00:25, 166692.74it/s]
29%|██▊       | 1723103/6000000 [00:10<00:25, 167568.43it/s]
29%|██▉       | 1739982/6000000 [00:10<00:25, 167930.79it/s]
29%|██▉       | 1756779/6000000 [00:10<00:25, 167703.09it/s]
30%|██▉       | 1773552/6000000 [00:10<00:25, 166813.51it/s]
30%|██▉       | 1790431/6000000 [00:10<00:25, 167402.03it/s]
30%|███       | 1807174/6000000 [00:10<00:25, 166816.53it/s]
30%|███       | 1823928/6000000 [00:10<00:25, 167031.66it/s]
31%|███       | 1840633/6000000 [00:11<00:25, 166277.74it/s]
31%|███       | 1857626/6000000 [00:11<00:24, 167365.15it/s]
31%|███       | 1874745/6000000 [00:11<00:24, 168505.98it/s]
32%|███▏      | 1891598/6000000 [00:11<00:24, 168358.79it/s]
32%|███▏      | 1908435/6000000 [00:11<00:24, 168082.45it/s]
32%|███▏      | 1925482/6000000 [00:11<00:24, 168795.29it/s]
32%|███▏      | 1942363/6000000 [00:11<00:24, 168304.58it/s]
33%|███▎      | 1959195/6000000 [00:11<00:24, 168033.78it/s]
33%|███▎      | 1976508/6000000 [00:11<00:23, 169555.23it/s]
33%|███▎      | 1993465/6000000 [00:11<00:23, 169478.93it/s]
34%|███▎      | 2010414/6000000 [00:12<00:23, 168733.56it/s]
34%|███▍      | 2027495/6000000 [00:12<00:23, 169351.88it/s]
34%|███▍      | 2044432/6000000 [00:12<00:23, 168129.53it/s]
34%|███▍      | 2061248/6000000 [00:12<00:23, 167116.32it/s]
35%|███▍      | 2077962/6000000 [00:12<00:23, 166447.64it/s]
35%|███▍      | 2094609/6000000 [00:12<00:23, 166189.17it/s]
35%|███▌      | 2111507/6000000 [00:12<00:23, 167018.93it/s]
35%|███▌      | 2128211/6000000 [00:12<00:23, 166427.03it/s]
36%|███▌      | 2144997/6000000 [00:12<00:23, 166851.77it/s]
36%|███▌      | 2161803/6000000 [00:12<00:22, 167211.84it/s]
36%|███▋      | 2178526/6000000 [00:13<00:22, 167135.33it/s]
37%|███▋      | 2195297/6000000 [00:13<00:22, 167306.63it/s]
37%|███▋      | 2212029/6000000 [00:13<00:22, 166462.67it/s]
37%|███▋      | 2228811/6000000 [00:13<00:22, 166865.94it/s]
37%|███▋      | 2245598/6000000 [00:13<00:22, 167163.69it/s]
38%|███▊      | 2262329/6000000 [00:13<00:22, 167204.19it/s]
38%|███▊      | 2279401/6000000 [00:13<00:22, 168254.09it/s]
38%|███▊      | 2296227/6000000 [00:13<00:22, 168015.85it/s]
39%|███▊      | 2313030/6000000 [00:13<00:21, 167889.24it/s]
39%|███▉      | 2329950/6000000 [00:13<00:21, 168279.76it/s]
39%|███▉      | 2346779/6000000 [00:14<00:21, 167274.38it/s]
39%|███▉      | 2363805/6000000 [00:14<00:21, 168164.59it/s]
40%|███▉      | 2380623/6000000 [00:14<00:21, 167351.76it/s]
40%|███▉      | 2397373/6000000 [00:14<00:21, 167394.35it/s]
40%|████      | 2414364/6000000 [00:14<00:21, 168143.80it/s]
41%|████      | 2431180/6000000 [00:14<00:21, 167509.69it/s]
41%|████      | 2447949/6000000 [00:14<00:21, 167560.95it/s]
41%|████      | 2464706/6000000 [00:14<00:21, 167242.75it/s]
41%|████▏     | 2481642/6000000 [00:14<00:20, 167874.81it/s]
42%|████▏     | 2498608/6000000 [00:14<00:20, 168408.08it/s]
42%|████▏     | 2515450/6000000 [00:15<00:20, 167618.74it/s]
42%|████▏     | 2532214/6000000 [00:15<00:20, 167273.44it/s]
42%|████▏     | 2549233/6000000 [00:15<00:20, 168141.50it/s]
43%|████▎     | 2566049/6000000 [00:15<00:20, 167267.04it/s]
43%|████▎     | 2583001/6000000 [00:15<00:20, 167535.26it/s]
43%|████▎     | 2599932/6000000 [00:15<00:20, 168063.54it/s]
44%|████▎     | 2616740/6000000 [00:15<00:20, 167517.70it/s]
44%|████▍     | 2633804/6000000 [00:15<00:19, 168446.95it/s]
44%|████▍     | 2650650/6000000 [00:15<00:19, 168305.29it/s]
44%|████▍     | 2667482/6000000 [00:15<00:19, 168194.25it/s]
45%|████▍     | 2684323/6000000 [00:16<00:19, 168256.33it/s]
45%|████▌     | 2701150/6000000 [00:16<00:19, 168124.96it/s]
45%|████▌     | 2718037/6000000 [00:16<00:19, 168345.58it/s]
46%|████▌     | 2735142/6000000 [00:16<00:19, 169154.30it/s]
46%|████▌     | 2752058/6000000 [00:16<00:19, 168568.95it/s]
46%|████▌     | 2769001/6000000 [00:16<00:19, 168298.02it/s]
46%|████▋     | 2785981/6000000 [00:16<00:19, 168745.38it/s]
47%|████▋     | 2802857/6000000 [00:16<00:19, 168008.70it/s]
47%|████▋     | 2820001/6000000 [00:16<00:18, 168402.38it/s]
47%|████▋     | 2836930/6000000 [00:17<00:18, 168664.29it/s]
48%|████▊     | 2853797/6000000 [00:17<00:18, 167613.19it/s]
48%|████▊     | 2870820/6000000 [00:17<00:18, 168390.63it/s]
48%|████▊     | 2887712/6000000 [00:17<00:18, 168545.90it/s]
48%|████▊     | 2904568/6000000 [00:17<00:18, 168425.82it/s]
49%|████▊     | 2921689/6000000 [00:17<00:18, 169257.93it/s]
49%|████▉     | 2938616/6000000 [00:17<00:18, 168718.76it/s]
49%|████▉     | 2955489/6000000 [00:17<00:18, 168464.08it/s]
50%|████▉     | 2972662/6000000 [00:17<00:17, 169438.46it/s]
50%|████▉     | 2989607/6000000 [00:17<00:17, 168562.80it/s]
50%|█████     | 3006465/6000000 [00:18<00:17, 168549.28it/s]
50%|█████     | 3023412/6000000 [00:18<00:17, 168823.38it/s]
51%|█████     | 3040296/6000000 [00:18<00:17, 168036.61it/s]
51%|█████     | 3057101/6000000 [00:18<00:17, 167618.57it/s]
51%|█████     | 3074167/6000000 [00:18<00:17, 168524.92it/s]
52%|█████▏    | 3091021/6000000 [00:18<00:17, 167787.27it/s]
52%|█████▏    | 3108003/6000000 [00:18<00:17, 168364.90it/s]
52%|█████▏    | 3124896/6000000 [00:18<00:17, 168531.62it/s]
52%|█████▏    | 3141751/6000000 [00:18<00:16, 168281.29it/s]
53%|█████▎    | 3158703/6000000 [00:18<00:16, 168648.07it/s]
53%|█████▎    | 3175569/6000000 [00:19<00:16, 168641.22it/s]
53%|█████▎    | 3192434/6000000 [00:19<00:16, 168446.38it/s]
53%|█████▎    | 3209551/6000000 [00:19<00:16, 169260.13it/s]
54%|█████▍    | 3226492/6000000 [00:19<00:16, 169303.99it/s]
54%|█████▍    | 3243423/6000000 [00:19<00:16, 168594.32it/s]
54%|█████▍    | 3260614/6000000 [00:19<00:16, 169583.79it/s]
55%|█████▍    | 3277574/6000000 [00:19<00:16, 169189.87it/s]
55%|█████▍    | 3294494/6000000 [00:19<00:16, 168503.75it/s]
55%|█████▌    | 3311377/6000000 [00:19<00:15, 168599.12it/s]
55%|█████▌    | 3328305/6000000 [00:19<00:15, 168800.64it/s]
56%|█████▌    | 3345186/6000000 [00:20<00:15, 168198.36it/s]
56%|█████▌    | 3362198/6000000 [00:20<00:15, 168769.40it/s]
56%|█████▋    | 3379076/6000000 [00:20<00:15, 168363.83it/s]
57%|█████▋    | 3396039/6000000 [00:20<00:15, 168740.11it/s]
57%|█████▋    | 3413219/6000000 [00:20<00:15, 169653.36it/s]
57%|█████▋    | 3430185/6000000 [00:20<00:15, 168998.34it/s]
57%|█████▋    | 3447086/6000000 [00:20<00:15, 168895.76it/s]
58%|█████▊    | 3464094/6000000 [00:20<00:14, 169246.66it/s]
58%|█████▊    | 3481020/6000000 [00:20<00:14, 168376.21it/s]
58%|█████▊    | 3497960/6000000 [00:20<00:14, 168679.15it/s]
59%|█████▊    | 3514855/6000000 [00:21<00:14, 168742.96it/s]
59%|█████▉    | 3531731/6000000 [00:21<00:14, 168227.99it/s]
59%|█████▉    | 3548649/6000000 [00:21<00:14, 168510.58it/s]
59%|█████▉    | 3565501/6000000 [00:21<00:14, 167897.09it/s]
60%|█████▉    | 3582292/6000000 [00:21<00:14, 167336.72it/s]
60%|█████▉    | 3599574/6000000 [00:21<00:14, 168971.09it/s]
60%|██████    | 3616473/6000000 [00:21<00:14, 168355.09it/s]
61%|██████    | 3633463/6000000 [00:21<00:14, 168814.96it/s]
61%|██████    | 3650601/6000000 [00:21<00:13, 169580.33it/s]
61%|██████    | 3667579/6000000 [00:21<00:13, 169636.73it/s]
61%|██████▏   | 3684544/6000000 [00:22<00:13, 169240.45it/s]
62%|██████▏   | 3701813/6000000 [00:22<00:13, 170271.59it/s]
62%|██████▏   | 3718874/6000000 [00:22<00:13, 170371.01it/s]
62%|██████▏   | 3735912/6000000 [00:22<00:13, 170029.23it/s]
63%|██████▎   | 3753001/6000000 [00:22<00:13, 169550.26it/s]
63%|██████▎   | 3770133/6000000 [00:22<00:13, 170077.50it/s]
63%|██████▎   | 3787142/6000000 [00:22<00:13, 169359.91it/s]
63%|██████▎   | 3804079/6000000 [00:22<00:13, 168598.51it/s]
64%|██████▎   | 3821286/6000000 [00:22<00:12, 169631.53it/s]
64%|██████▍   | 3838251/6000000 [00:22<00:12, 168794.96it/s]
64%|██████▍   | 3855133/6000000 [00:23<00:12, 168557.01it/s]
65%|██████▍   | 3872012/6000000 [00:23<00:12, 168624.99it/s]
65%|██████▍   | 3888876/6000000 [00:23<00:12, 168225.93it/s]
65%|██████▌   | 3906001/6000000 [00:23<00:12, 168957.89it/s]
65%|██████▌   | 3923158/6000000 [00:23<00:12, 169735.86it/s]
66%|██████▌   | 3940133/6000000 [00:23<00:12, 169045.60it/s]
66%|██████▌   | 3957056/6000000 [00:23<00:12, 169099.74it/s]
66%|██████▌   | 3974318/6000000 [00:23<00:11, 170151.23it/s]
67%|██████▋   | 3991334/6000000 [00:23<00:11, 170112.63it/s]
67%|██████▋   | 4008346/6000000 [00:23<00:11, 169626.95it/s]
67%|██████▋   | 4025478/6000000 [00:24<00:11, 170126.58it/s]
67%|██████▋   | 4042492/6000000 [00:24<00:11, 169504.45it/s]
68%|██████▊   | 4059454/6000000 [00:24<00:11, 169536.72it/s]
68%|██████▊   | 4076598/6000000 [00:24<00:11, 170105.34it/s]
68%|██████▊   | 4093610/6000000 [00:24<00:11, 169041.72it/s]
69%|██████▊   | 4110516/6000000 [00:24<00:11, 168278.54it/s]
69%|██████▉   | 4127720/6000000 [00:24<00:11, 169396.28it/s]
69%|██████▉   | 4144662/6000000 [00:24<00:10, 169092.49it/s]
69%|██████▉   | 4161573/6000000 [00:24<00:10, 168800.76it/s]
70%|██████▉   | 4178629/6000000 [00:24<00:10, 169323.24it/s]
70%|██████▉   | 4195563/6000000 [00:25<00:10, 169041.36it/s]
70%|███████   | 4212757/6000000 [00:25<00:10, 169905.68it/s]
70%|███████   | 4229826/6000000 [00:25<00:10, 170138.14it/s]
71%|███████   | 4246841/6000000 [00:25<00:10, 169736.46it/s]
71%|███████   | 4263816/6000000 [00:25<00:10, 169357.06it/s]
71%|███████▏  | 4280861/6000000 [00:25<00:10, 169681.03it/s]
72%|███████▏  | 4297830/6000000 [00:25<00:10, 169630.88it/s]
72%|███████▏  | 4314935/6000000 [00:25<00:09, 170053.02it/s]
72%|███████▏  | 4332001/6000000 [00:25<00:09, 169939.59it/s]
72%|███████▏  | 4349074/6000000 [00:25<00:09, 170173.57it/s]
73%|███████▎  | 4366092/6000000 [00:26<00:09, 170010.18it/s]
73%|███████▎  | 4383094/6000000 [00:26<00:09, 169277.60it/s]
73%|███████▎  | 4400401/6000000 [00:26<00:09, 170409.18it/s]
74%|███████▎  | 4417443/6000000 [00:26<00:09, 169963.28it/s]
74%|███████▍  | 4434441/6000000 [00:26<00:09, 169580.63it/s]
74%|███████▍  | 4451506/6000000 [00:26<00:09, 169898.40it/s]
74%|███████▍  | 4468497/6000000 [00:26<00:09, 169628.07it/s]
75%|███████▍  | 4485495/6000000 [00:26<00:08, 169730.81it/s]
75%|███████▌  | 4502605/6000000 [00:26<00:08, 170140.06it/s]
75%|███████▌  | 4519620/6000000 [00:26<00:08, 169331.60it/s]
76%|███████▌  | 4536562/6000000 [00:27<00:08, 169355.88it/s]
76%|███████▌  | 4553499/6000000 [00:27<00:08, 169244.42it/s]
76%|███████▌  | 4570424/6000000 [00:27<00:08, 168553.94it/s]
76%|███████▋  | 4587281/6000000 [00:27<00:08, 167823.58it/s]
77%|███████▋  | 4604276/6000000 [00:27<00:08, 168456.29it/s]
77%|███████▋  | 4621123/6000000 [00:27<00:08, 168199.82it/s]
77%|███████▋  | 4638019/6000000 [00:27<00:08, 168424.08it/s]
78%|███████▊  | 4655297/6000000 [00:27<00:07, 169725.57it/s]
78%|███████▊  | 4672271/6000000 [00:27<00:07, 169727.15it/s]
78%|███████▊  | 4689273/6000000 [00:27<00:07, 169811.90it/s]
78%|███████▊  | 4706570/6000000 [00:28<00:07, 170756.43it/s]
79%|███████▊  | 4723647/6000000 [00:28<00:07, 169603.50it/s]
79%|███████▉  | 4740610/6000000 [00:28<00:07, 169384.53it/s]
79%|███████▉  | 4757753/6000000 [00:28<00:07, 169993.08it/s]
80%|███████▉  | 4774844/6000000 [00:28<00:07, 170264.03it/s]
80%|███████▉  | 4791872/6000000 [00:28<00:07, 169933.43it/s]
80%|████████  | 4809001/6000000 [00:28<00:07, 169639.29it/s]
80%|████████  | 4826209/6000000 [00:28<00:06, 170365.97it/s]
81%|████████  | 4843247/6000000 [00:28<00:06, 167143.96it/s]
81%|████████  | 4860049/6000000 [00:28<00:06, 167399.14it/s]
81%|████████▏ | 4877095/6000000 [00:29<00:06, 168304.52it/s]
82%|████████▏ | 4894075/6000000 [00:29<00:06, 168746.72it/s]
82%|████████▏ | 4911001/6000000 [00:29<00:06, 168289.27it/s]
82%|████████▏ | 4928128/6000000 [00:29<00:06, 169167.73it/s]
82%|████████▏ | 4945049/6000000 [00:29<00:06, 168978.63it/s]
83%|████████▎ | 4962001/6000000 [00:29<00:06, 168702.49it/s]
83%|████████▎ | 4979126/6000000 [00:29<00:06, 169461.82it/s]
83%|████████▎ | 4996074/6000000 [00:29<00:05, 169277.71it/s]
84%|████████▎ | 5013224/6000000 [00:29<00:05, 169941.44it/s]
84%|████████▍ | 5030300/6000000 [00:29<00:05, 170185.92it/s]
84%|████████▍ | 5047320/6000000 [00:30<00:05, 169387.05it/s]
84%|████████▍ | 5064260/6000000 [00:30<00:05, 169210.09it/s]
85%|████████▍ | 5081351/6000000 [00:30<00:05, 169715.88it/s]
85%|████████▍ | 5098324/6000000 [00:30<00:05, 169118.64it/s]
85%|████████▌ | 5115237/6000000 [00:30<00:05, 168598.35it/s]
86%|████████▌ | 5132348/6000000 [00:30<00:05, 169345.19it/s]
86%|████████▌ | 5149284/6000000 [00:30<00:05, 169018.92it/s]
86%|████████▌ | 5166252/6000000 [00:30<00:04, 169215.79it/s]
86%|████████▋ | 5183211/6000000 [00:30<00:04, 169326.22it/s]
87%|████████▋ | 5200262/6000000 [00:30<00:04, 169679.02it/s]
87%|████████▋ | 5217231/6000000 [00:31<00:04, 168720.42it/s]
87%|████████▋ | 5234770/6000000 [00:31<00:04, 170710.14it/s]
88%|████████▊ | 5251843/6000000 [00:31<00:04, 169506.35it/s]
88%|████████▊ | 5268797/6000000 [00:31<00:04, 169333.20it/s]
88%|████████▊ | 5286001/6000000 [00:31<00:04, 169659.64it/s]
88%|████████▊ | 5302997/6000000 [00:31<00:04, 169745.74it/s]
89%|████████▊ | 5319973/6000000 [00:31<00:04, 168709.08it/s]
89%|████████▉ | 5337001/6000000 [00:31<00:03, 168936.58it/s]
89%|████████▉ | 5353896/6000000 [00:31<00:03, 168874.84it/s]
90%|████████▉ | 5370785/6000000 [00:31<00:03, 168505.22it/s]
90%|████████▉ | 5387637/6000000 [00:32<00:03, 168413.53it/s]
90%|█████████ | 5404479/6000000 [00:32<00:03, 167671.15it/s]
90%|█████████ | 5421277/6000000 [00:32<00:03, 167761.18it/s]
91%|█████████ | 5438281/6000000 [00:32<00:03, 168439.37it/s]
91%|█████████ | 5455126/6000000 [00:32<00:03, 168151.04it/s]
91%|█████████ | 5472001/6000000 [00:32<00:03, 167912.03it/s]
91%|█████████▏| 5489230/6000000 [00:32<00:03, 169218.33it/s]
92%|█████████▏| 5506340/6000000 [00:32<00:02, 169779.00it/s]
92%|█████████▏| 5523319/6000000 [00:32<00:02, 169508.87it/s]
92%|█████████▏| 5540418/6000000 [00:32<00:02, 169949.00it/s]
93%|█████████▎| 5557414/6000000 [00:33<00:02, 169358.86it/s]
93%|█████████▎| 5574351/6000000 [00:33<00:02, 168892.37it/s]
93%|█████████▎| 5591467/6000000 [00:33<00:02, 169567.46it/s]
93%|█████████▎| 5608425/6000000 [00:33<00:02, 169138.21it/s]
94%|█████████▍| 5625340/6000000 [00:33<00:02, 168738.84it/s]
94%|█████████▍| 5642403/6000000 [00:33<00:02, 169301.53it/s]
94%|█████████▍| 5659334/6000000 [00:33<00:02, 168051.85it/s]
95%|█████████▍| 5676142/6000000 [00:33<00:01, 166949.43it/s]
95%|█████████▍| 5693519/6000000 [00:33<00:01, 168974.39it/s]
95%|█████████▌| 5710421/6000000 [00:33<00:01, 168533.71it/s]
95%|█████████▌| 5727277/6000000 [00:34<00:01, 168035.29it/s]
96%|█████████▌| 5744280/6000000 [00:34<00:01, 168628.84it/s]
96%|█████████▌| 5761145/6000000 [00:34<00:01, 168313.51it/s]
96%|█████████▋| 5778045/6000000 [00:34<00:01, 168515.30it/s]
97%|█████████▋| 5795343/6000000 [00:34<00:01, 169847.20it/s]
97%|█████████▋| 5812329/6000000 [00:34<00:01, 169131.52it/s]
97%|█████████▋| 5829244/6000000 [00:34<00:01, 168620.62it/s]
97%|█████████▋| 5846651/6000000 [00:34<00:00, 170246.00it/s]
98%|█████████▊| 5863678/6000000 [00:34<00:00, 169470.34it/s]
98%|█████████▊| 5880627/6000000 [00:35<00:00, 168392.14it/s]
98%|█████████▊| 5897812/6000000 [00:35<00:00, 169419.56it/s]
99%|█████████▊| 5914757/6000000 [00:35<00:00, 168877.72it/s]
99%|█████████▉| 5931647/6000000 [00:35<00:00, 168294.34it/s]
99%|█████████▉| 5948824/6000000 [00:35<00:00, 169327.63it/s]
99%|█████████▉| 5965759/6000000 [00:35<00:00, 167832.36it/s]
100%|█████████▉| 5982546/6000000 [00:35<00:00, 167831.70it/s]
100%|█████████▉| 5999766/6000000 [00:35<00:00, 169132.44it/s]
100%|██████████| 6000000/6000000 [00:35<00:00, 168001.05it/s]

episode step policy_idx arm reward reward_stat
1324896 441 632 0 0 0.394166 1.123978
3566176 1188 725 1 0 1.663107 0.665520
1109043 369 681 0 4 -1.580458 1.168326
4286042 1428 680 2 0 0.703739 1.782960
5395174 1798 391 1 0 -0.390197 0.734739

It is then straightforward to plot the average reward each policy obtains at each step, by averaging over episodes.

policy_names = {
    0: 'ε = 0.1',
    1: 'ε = 0.01',
    2: 'ε = 0 (greedy)'
}

(
    trace_df
    .assign(policy=trace_df.policy_idx.map(policy_names))
    .groupby(['step', 'policy'])
    ['reward'].mean()
    .unstack()
    .plot()
)
<Axes: xlabel='step'>

png

Controlling the evaluation loop

The bandit.evaluate function is useful for benchmarking. But in practice, you'll want to have control over your bandit policy. Indeed you'll want the freedom to pull arms (with the pull method) and update the policy (with the update method) at your discretion.

As an example, the following is a possible reimplementation of the bandit.evaluate function. Here we'll be measuring the rate at which each policy selects the optimal arm.

Note how the pull and update methods are used.

import copy

policies=[
    bandit.EpsilonGreedy(epsilon=0.1),
    bandit.EpsilonGreedy(epsilon=0.01),
    bandit.EpsilonGreedy(epsilon=0),
]

env = gym.make(
    'river_bandits/KArmedTestbed-v0',
    max_episode_steps=1000
)
n_episodes = 2000

trace = []

with tqdm(total=len(policies) * n_episodes * env._max_episode_steps, position=0) as progress:
    for policy in policies:
        for episode in range(n_episodes):
            episode_policy = policy.clone()
            episode_env = copy.deepcopy(env)
            episode_env.reset()
            step = 0
            while True:
                action = episode_policy.pull(range(episode_env.action_space.n))
                observation, reward, terminated, truncated, info = episode_env.step(action)
                best_action = observation
                episode_policy.update(action, reward)

                trace.append({
                    "episode": episode,
                    "step": step,
                    "policy": f"ε = {policy.epsilon}",
                    "is_action_optimal": action == best_action
                })
                step += 1
                progress.update()

                if terminated or truncated:
                    break

trace_df = pd.DataFrame(trace)
0%|          | 0/6000000 [00:00<?, ?it/s]
0%|          | 16720/6000000 [00:00<00:35, 167192.98it/s]
1%|          | 34491/6000000 [00:00<00:34, 173374.59it/s]
1%|          | 52001/6000000 [00:00<00:34, 174090.30it/s]
1%|          | 69804/6000000 [00:00<00:33, 175643.98it/s]
1%|▏         | 87463/6000000 [00:00<00:33, 175982.70it/s]
2%|▏         | 105217/6000000 [00:00<00:33, 176510.03it/s]
2%|▏         | 122979/6000000 [00:00<00:33, 176870.72it/s]
2%|▏         | 140737/6000000 [00:00<00:33, 177094.01it/s]
3%|▎         | 158447/6000000 [00:00<00:33, 176203.20it/s]
3%|▎         | 176202/6000000 [00:01<00:32, 176616.28it/s]
3%|▎         | 194173/6000000 [00:01<00:32, 177559.94it/s]
4%|▎         | 212001/6000000 [00:01<00:32, 177657.36it/s]
4%|▍         | 229817/6000000 [00:01<00:32, 177808.40it/s]
4%|▍         | 247599/6000000 [00:01<00:32, 177348.63it/s]
4%|▍         | 265335/6000000 [00:01<00:32, 177186.23it/s]
5%|▍         | 283178/6000000 [00:01<00:32, 177559.21it/s]
5%|▌         | 301017/6000000 [00:01<00:32, 177807.37it/s]
5%|▌         | 318799/6000000 [00:01<00:31, 177550.39it/s]
6%|▌         | 336581/6000000 [00:01<00:31, 177629.36it/s]
6%|▌         | 354433/6000000 [00:02<00:31, 177894.68it/s]
6%|▌         | 372223/6000000 [00:02<00:31, 177738.24it/s]
7%|▋         | 390007/6000000 [00:02<00:31, 177767.27it/s]
7%|▋         | 407889/6000000 [00:02<00:31, 178080.65it/s]
7%|▋         | 425717/6000000 [00:02<00:31, 178138.19it/s]
7%|▋         | 443531/6000000 [00:02<00:31, 177879.95it/s]
8%|▊         | 461480/6000000 [00:02<00:31, 178361.65it/s]
8%|▊         | 479347/6000000 [00:02<00:30, 178452.60it/s]
8%|▊         | 497193/6000000 [00:02<00:30, 178414.31it/s]
9%|▊         | 515108/6000000 [00:02<00:30, 178634.17it/s]
9%|▉         | 533045/6000000 [00:03<00:30, 178853.22it/s]
9%|▉         | 551002/6000000 [00:03<00:30, 179033.68it/s]
9%|▉         | 568992/6000000 [00:03<00:30, 179291.23it/s]
10%|▉         | 586922/6000000 [00:03<00:30, 179206.64it/s]
10%|█         | 604843/6000000 [00:03<00:30, 178160.25it/s]
10%|█         | 622778/6000000 [00:03<00:30, 178514.87it/s]
11%|█         | 640848/6000000 [00:03<00:29, 179167.65it/s]
11%|█         | 658766/6000000 [00:03<00:29, 179086.15it/s]
11%|█▏        | 676676/6000000 [00:03<00:29, 179052.00it/s]
12%|█▏        | 694582/6000000 [00:03<00:29, 178784.13it/s]
12%|█▏        | 712461/6000000 [00:04<00:29, 178450.06it/s]
12%|█▏        | 730307/6000000 [00:04<00:29, 178190.75it/s]
12%|█▏        | 748135/6000000 [00:04<00:29, 178215.84it/s]
13%|█▎        | 766015/6000000 [00:04<00:29, 178388.99it/s]
13%|█▎        | 783855/6000000 [00:04<00:29, 177625.97it/s]
13%|█▎        | 801619/6000000 [00:04<00:29, 176420.08it/s]
14%|█▎        | 819263/6000000 [00:04<00:29, 175543.98it/s]
14%|█▍        | 836820/6000000 [00:04<00:29, 174525.07it/s]
14%|█▍        | 854275/6000000 [00:04<00:29, 173749.76it/s]
15%|█▍        | 871652/6000000 [00:04<00:29, 173400.73it/s]
15%|█▍        | 888993/6000000 [00:05<00:29, 173049.79it/s]
15%|█▌        | 906418/6000000 [00:05<00:29, 173406.46it/s]
15%|█▌        | 923760/6000000 [00:05<00:29, 172673.67it/s]
16%|█▌        | 941083/6000000 [00:05<00:29, 172836.93it/s]
16%|█▌        | 958417/6000000 [00:05<00:29, 172984.73it/s]
16%|█▋        | 975716/6000000 [00:05<00:29, 172865.09it/s]
17%|█▋        | 993003/6000000 [00:05<00:29, 172639.79it/s]
17%|█▋        | 1010268/6000000 [00:05<00:28, 172566.96it/s]
17%|█▋        | 1027525/6000000 [00:05<00:28, 172310.93it/s]
17%|█▋        | 1044757/6000000 [00:05<00:28, 172157.85it/s]
18%|█▊        | 1061973/6000000 [00:06<00:28, 171874.05it/s]
18%|█▊        | 1079161/6000000 [00:06<00:28, 171494.77it/s]
18%|█▊        | 1096432/6000000 [00:06<00:28, 171855.14it/s]
19%|█▊        | 1113618/6000000 [00:06<00:28, 171721.76it/s]
19%|█▉        | 1130894/6000000 [00:06<00:28, 172031.08it/s]
19%|█▉        | 1148209/6000000 [00:06<00:28, 172363.80it/s]
19%|█▉        | 1165446/6000000 [00:06<00:28, 172132.66it/s]
20%|█▉        | 1182686/6000000 [00:06<00:27, 172212.16it/s]
20%|█▉        | 1199908/6000000 [00:06<00:27, 172168.44it/s]
20%|██        | 1217233/6000000 [00:06<00:27, 172491.75it/s]
21%|██        | 1234550/6000000 [00:07<00:27, 172694.75it/s]
21%|██        | 1251956/6000000 [00:07<00:27, 173103.05it/s]
21%|██        | 1269267/6000000 [00:07<00:27, 172726.42it/s]
21%|██▏       | 1286613/6000000 [00:07<00:27, 172943.68it/s]
22%|██▏       | 1303908/6000000 [00:07<00:27, 172715.39it/s]
22%|██▏       | 1321181/6000000 [00:07<00:27, 172718.09it/s]
22%|██▏       | 1338453/6000000 [00:07<00:27, 172530.76it/s]
23%|██▎       | 1355707/6000000 [00:07<00:26, 172510.05it/s]
23%|██▎       | 1373001/6000000 [00:07<00:26, 172487.24it/s]
23%|██▎       | 1390418/6000000 [00:07<00:26, 172991.24it/s]
23%|██▎       | 1407826/6000000 [00:08<00:26, 173316.96it/s]
24%|██▍       | 1425182/6000000 [00:08<00:26, 173389.49it/s]
24%|██▍       | 1442522/6000000 [00:08<00:26, 173164.05it/s]
24%|██▍       | 1459839/6000000 [00:08<00:26, 172841.41it/s]
25%|██▍       | 1477124/6000000 [00:08<00:26, 172268.78it/s]
25%|██▍       | 1494352/6000000 [00:08<00:26, 172048.59it/s]
25%|██▌       | 1511558/6000000 [00:08<00:26, 171698.42it/s]
25%|██▌       | 1528729/6000000 [00:08<00:26, 171687.86it/s]
26%|██▌       | 1545898/6000000 [00:08<00:25, 171637.50it/s]
26%|██▌       | 1563065/6000000 [00:08<00:25, 171646.55it/s]
26%|██▋       | 1580230/6000000 [00:09<00:25, 171316.68it/s]
27%|██▋       | 1597362/6000000 [00:09<00:25, 170791.64it/s]
27%|██▋       | 1614498/6000000 [00:09<00:25, 170958.62it/s]
27%|██▋       | 1631661/6000000 [00:09<00:25, 171157.46it/s]
27%|██▋       | 1648778/6000000 [00:09<00:25, 170705.97it/s]
28%|██▊       | 1666001/6000000 [00:09<00:25, 171035.80it/s]
28%|██▊       | 1683172/6000000 [00:09<00:25, 171234.01it/s]
28%|██▊       | 1700296/6000000 [00:09<00:25, 171177.20it/s]
29%|██▊       | 1717414/6000000 [00:09<00:25, 171002.24it/s]
29%|██▉       | 1734620/6000000 [00:09<00:24, 171317.93it/s]
29%|██▉       | 1751778/6000000 [00:10<00:24, 171396.13it/s]
29%|██▉       | 1769115/6000000 [00:10<00:24, 171985.35it/s]
30%|██▉       | 1786314/6000000 [00:10<00:24, 171829.82it/s]
30%|███       | 1803498/6000000 [00:10<00:24, 171735.25it/s]
30%|███       | 1820672/6000000 [00:10<00:24, 171563.41it/s]
31%|███       | 1837933/6000000 [00:10<00:24, 171875.24it/s]
31%|███       | 1855121/6000000 [00:10<00:24, 171600.97it/s]
31%|███       | 1872282/6000000 [00:10<00:24, 171336.82it/s]
31%|███▏      | 1889416/6000000 [00:10<00:23, 171292.11it/s]
32%|███▏      | 1906546/6000000 [00:10<00:23, 171074.80it/s]
32%|███▏      | 1923739/6000000 [00:11<00:23, 171328.16it/s]
32%|███▏      | 1941000/6000000 [00:11<00:23, 171710.69it/s]
33%|███▎      | 1958345/6000000 [00:11<00:23, 172230.73it/s]
33%|███▎      | 1975782/6000000 [00:11<00:23, 172871.08it/s]
33%|███▎      | 1993070/6000000 [00:11<00:23, 172137.93it/s]
34%|███▎      | 2010285/6000000 [00:11<00:23, 170541.11it/s]
34%|███▍      | 2027343/6000000 [00:11<00:23, 169728.64it/s]
34%|███▍      | 2044319/6000000 [00:11<00:23, 167980.56it/s]
34%|███▍      | 2061122/6000000 [00:11<00:23, 166729.42it/s]
35%|███▍      | 2077799/6000000 [00:11<00:23, 166083.10it/s]
35%|███▍      | 2094410/6000000 [00:12<00:23, 165430.91it/s]
35%|███▌      | 2110955/6000000 [00:12<00:23, 165082.73it/s]
35%|███▌      | 2127464/6000000 [00:12<00:23, 164710.79it/s]
36%|███▌      | 2143936/6000000 [00:12<00:23, 164323.70it/s]
36%|███▌      | 2160369/6000000 [00:12<00:23, 163786.12it/s]
36%|███▋      | 2176748/6000000 [00:12<00:23, 163652.12it/s]
37%|███▋      | 2193359/6000000 [00:12<00:23, 164385.01it/s]
37%|███▋      | 2209872/6000000 [00:12<00:23, 164606.68it/s]
37%|███▋      | 2226355/6000000 [00:12<00:22, 164671.06it/s]
37%|███▋      | 2242823/6000000 [00:12<00:23, 162989.79it/s]
38%|███▊      | 2259170/6000000 [00:13<00:22, 163132.19it/s]
38%|███▊      | 2275658/6000000 [00:13<00:22, 163652.01it/s]
38%|███▊      | 2292026/6000000 [00:13<00:22, 163282.59it/s]
38%|███▊      | 2308424/6000000 [00:13<00:22, 163489.03it/s]
39%|███▊      | 2324775/6000000 [00:13<00:22, 163112.29it/s]
39%|███▉      | 2341097/6000000 [00:13<00:22, 163143.36it/s]
39%|███▉      | 2357413/6000000 [00:13<00:22, 162938.95it/s]
40%|███▉      | 2373739/6000000 [00:13<00:22, 163033.41it/s]
40%|███▉      | 2390107/6000000 [00:13<00:22, 163225.89it/s]
40%|████      | 2406686/6000000 [00:13<00:21, 163993.63it/s]
40%|████      | 2423086/6000000 [00:14<00:21, 163699.76it/s]
41%|████      | 2439479/6000000 [00:14<00:21, 163768.39it/s]
41%|████      | 2455894/6000000 [00:14<00:21, 163880.07it/s]
41%|████      | 2472283/6000000 [00:14<00:21, 163856.87it/s]
41%|████▏     | 2488669/6000000 [00:14<00:21, 163432.57it/s]
42%|████▏     | 2505013/6000000 [00:14<00:21, 163424.70it/s]
42%|████▏     | 2521389/6000000 [00:14<00:21, 163523.06it/s]
42%|████▏     | 2537742/6000000 [00:14<00:21, 163334.99it/s]
43%|████▎     | 2554076/6000000 [00:14<00:21, 163199.26it/s]
43%|████▎     | 2570397/6000000 [00:14<00:21, 163097.60it/s]
43%|████▎     | 2586755/6000000 [00:15<00:20, 163240.61it/s]
43%|████▎     | 2603080/6000000 [00:15<00:20, 161991.05it/s]
44%|████▎     | 2619433/6000000 [00:15<00:20, 162449.34it/s]
44%|████▍     | 2635802/6000000 [00:15<00:20, 162819.73it/s]
44%|████▍     | 2652086/6000000 [00:15<00:20, 162629.39it/s]
44%|████▍     | 2668580/6000000 [00:15<00:20, 163319.74it/s]
45%|████▍     | 2685115/6000000 [00:15<00:20, 163927.36it/s]
45%|████▌     | 2701672/6000000 [00:15<00:20, 164418.77it/s]
45%|████▌     | 2718115/6000000 [00:15<00:20, 163045.05it/s]
46%|████▌     | 2734519/6000000 [00:15<00:19, 163341.15it/s]
46%|████▌     | 2750999/6000000 [00:16<00:19, 163776.01it/s]
46%|████▌     | 2767592/6000000 [00:16<00:19, 164418.51it/s]
46%|████▋     | 2784142/6000000 [00:16<00:19, 164741.56it/s]
47%|████▋     | 2800618/6000000 [00:16<00:19, 164703.96it/s]
47%|████▋     | 2817090/6000000 [00:16<00:19, 163847.52it/s]
47%|████▋     | 2833477/6000000 [00:16<00:19, 163848.21it/s]
48%|████▊     | 2850017/6000000 [00:16<00:19, 164311.35it/s]
48%|████▊     | 2866452/6000000 [00:16<00:19, 164321.33it/s]
48%|████▊     | 2882885/6000000 [00:16<00:19, 163760.42it/s]
48%|████▊     | 2899262/6000000 [00:16<00:18, 163472.40it/s]
49%|████▊     | 2915610/6000000 [00:17<00:18, 162929.03it/s]
49%|████▉     | 2931928/6000000 [00:17<00:18, 163003.88it/s]
49%|████▉     | 2948229/6000000 [00:17<00:18, 162878.98it/s]
49%|████▉     | 2964518/6000000 [00:17<00:18, 162587.58it/s]
50%|████▉     | 2980849/6000000 [00:17<00:18, 162802.01it/s]
50%|████▉     | 2997130/6000000 [00:17<00:18, 162615.65it/s]
50%|█████     | 3013436/6000000 [00:17<00:18, 162748.33it/s]
50%|█████     | 3029798/6000000 [00:17<00:18, 163006.98it/s]
51%|█████     | 3046192/6000000 [00:17<00:18, 163285.54it/s]
51%|█████     | 3062521/6000000 [00:17<00:18, 163121.86it/s]
51%|█████▏    | 3078890/6000000 [00:18<00:17, 163290.84it/s]
52%|█████▏    | 3095220/6000000 [00:18<00:17, 163274.61it/s]
52%|█████▏    | 3111593/6000000 [00:18<00:17, 163409.66it/s]
52%|█████▏    | 3127935/6000000 [00:18<00:17, 163278.81it/s]
52%|█████▏    | 3144288/6000000 [00:18<00:17, 163352.39it/s]
53%|█████▎    | 3160692/6000000 [00:18<00:17, 163557.30it/s]
53%|█████▎    | 3177049/6000000 [00:18<00:17, 163559.52it/s]
53%|█████▎    | 3193405/6000000 [00:18<00:17, 163241.64it/s]
53%|█████▎    | 3209730/6000000 [00:18<00:17, 162584.95it/s]
54%|█████▍    | 3226212/6000000 [00:18<00:16, 163250.31it/s]
54%|█████▍    | 3242550/6000000 [00:19<00:16, 163288.81it/s]
54%|█████▍    | 3258906/6000000 [00:19<00:16, 163367.95it/s]
55%|█████▍    | 3275244/6000000 [00:19<00:16, 162959.68it/s]
55%|█████▍    | 3291541/6000000 [00:19<00:16, 162853.85it/s]
55%|█████▌    | 3307919/6000000 [00:19<00:16, 163128.52it/s]
55%|█████▌    | 3324400/6000000 [00:19<00:16, 163630.65it/s]
56%|█████▌    | 3340764/6000000 [00:19<00:16, 163425.33it/s]
56%|█████▌    | 3357107/6000000 [00:19<00:16, 162755.30it/s]
56%|█████▌    | 3373445/6000000 [00:19<00:16, 162940.37it/s]
56%|█████▋    | 3389821/6000000 [00:19<00:15, 163183.15it/s]
57%|█████▋    | 3406183/6000000 [00:20<00:15, 163313.64it/s]
57%|█████▋    | 3422527/6000000 [00:20<00:15, 163351.19it/s]
57%|█████▋    | 3438863/6000000 [00:20<00:15, 163026.77it/s]
58%|█████▊    | 3455166/6000000 [00:20<00:15, 162763.49it/s]
58%|█████▊    | 3471443/6000000 [00:20<00:15, 162344.38it/s]
58%|█████▊    | 3487791/6000000 [00:20<00:15, 162682.14it/s]
58%|█████▊    | 3504093/6000000 [00:20<00:15, 162781.77it/s]
59%|█████▊    | 3520702/6000000 [00:20<00:15, 163772.04it/s]
59%|█████▉    | 3537204/6000000 [00:20<00:15, 164143.42it/s]
59%|█████▉    | 3553752/6000000 [00:20<00:14, 164543.14it/s]
60%|█████▉    | 3570207/6000000 [00:21<00:14, 163710.77it/s]
60%|█████▉    | 3586580/6000000 [00:21<00:14, 163664.79it/s]
60%|██████    | 3602948/6000000 [00:21<00:14, 163408.14it/s]
60%|██████    | 3619395/6000000 [00:21<00:14, 163725.01it/s]
61%|██████    | 3635768/6000000 [00:21<00:14, 163588.56it/s]
61%|██████    | 3652128/6000000 [00:21<00:14, 163349.54it/s]
61%|██████    | 3668464/6000000 [00:21<00:14, 163226.84it/s]
61%|██████▏   | 3684831/6000000 [00:21<00:14, 163359.06it/s]
62%|██████▏   | 3701168/6000000 [00:21<00:14, 163257.46it/s]
62%|██████▏   | 3717494/6000000 [00:21<00:13, 163098.82it/s]
62%|██████▏   | 3733804/6000000 [00:22<00:13, 163047.65it/s]
63%|██████▎   | 3750123/6000000 [00:22<00:13, 163089.06it/s]
63%|██████▎   | 3766432/6000000 [00:22<00:13, 163070.74it/s]
63%|██████▎   | 3782809/6000000 [00:22<00:13, 163279.58it/s]
63%|██████▎   | 3799137/6000000 [00:22<00:13, 163193.89it/s]
64%|██████▎   | 3815676/6000000 [00:22<00:13, 163850.39it/s]
64%|██████▍   | 3832062/6000000 [00:22<00:13, 163846.99it/s]
64%|██████▍   | 3848447/6000000 [00:22<00:13, 163810.70it/s]
64%|██████▍   | 3865001/6000000 [00:22<00:13, 164205.10it/s]
65%|██████▍   | 3881533/6000000 [00:22<00:12, 164537.81it/s]
65%|██████▍   | 3898042/6000000 [00:23<00:12, 164702.12it/s]
65%|██████▌   | 3914513/6000000 [00:23<00:12, 163059.58it/s]
66%|██████▌   | 3930953/6000000 [00:23<00:12, 163457.08it/s]
66%|██████▌   | 3947370/6000000 [00:23<00:12, 163664.43it/s]
66%|██████▌   | 3963793/6000000 [00:23<00:12, 163832.36it/s]
66%|██████▋   | 3980178/6000000 [00:23<00:12, 163238.27it/s]
67%|██████▋   | 3996631/6000000 [00:23<00:12, 163621.24it/s]
67%|██████▋   | 4013296/6000000 [00:23<00:12, 164526.43it/s]
67%|██████▋   | 4029970/6000000 [00:23<00:11, 165187.48it/s]
67%|██████▋   | 4046743/6000000 [00:23<00:11, 165946.52it/s]
68%|██████▊   | 4063493/6000000 [00:24<00:11, 166410.45it/s]
68%|██████▊   | 4080426/6000000 [00:24<00:11, 167279.85it/s]
68%|██████▊   | 4097155/6000000 [00:24<00:11, 167104.47it/s]
69%|██████▊   | 4113866/6000000 [00:24<00:11, 166776.24it/s]
69%|██████▉   | 4130544/6000000 [00:24<00:11, 166509.44it/s]
69%|██████▉   | 4147383/6000000 [00:24<00:11, 167071.56it/s]
69%|██████▉   | 4164241/6000000 [00:24<00:10, 167522.34it/s]
70%|██████▉   | 4181001/6000000 [00:24<00:10, 167538.24it/s]
70%|██████▉   | 4198001/6000000 [00:24<00:10, 168136.27it/s]
70%|███████   | 4214862/6000000 [00:24<00:10, 168277.18it/s]
71%|███████   | 4231690/6000000 [00:25<00:10, 167422.01it/s]
71%|███████   | 4248434/6000000 [00:25<00:10, 167383.56it/s]
71%|███████   | 4265230/6000000 [00:25<00:10, 167553.49it/s]
71%|███████▏  | 4282001/6000000 [00:25<00:10, 167413.88it/s]
72%|███████▏  | 4298743/6000000 [00:25<00:10, 166981.19it/s]
72%|███████▏  | 4315454/6000000 [00:25<00:10, 167016.89it/s]
72%|███████▏  | 4332156/6000000 [00:25<00:09, 166863.21it/s]
72%|███████▏  | 4348843/6000000 [00:25<00:09, 166646.84it/s]
73%|███████▎  | 4365529/6000000 [00:25<00:09, 166708.65it/s]
73%|███████▎  | 4382298/6000000 [00:25<00:09, 166999.95it/s]
73%|███████▎  | 4399120/6000000 [00:26<00:09, 167364.47it/s]
74%|███████▎  | 4416187/6000000 [00:26<00:09, 168354.45it/s]
74%|███████▍  | 4433023/6000000 [00:26<00:09, 168096.02it/s]
74%|███████▍  | 4450001/6000000 [00:26<00:09, 168472.12it/s]
74%|███████▍  | 4466849/6000000 [00:26<00:09, 168227.65it/s]
75%|███████▍  | 4483672/6000000 [00:26<00:09, 167985.14it/s]
75%|███████▌  | 4500479/6000000 [00:26<00:08, 168007.22it/s]
75%|███████▌  | 4517280/6000000 [00:26<00:08, 167524.04it/s]
76%|███████▌  | 4534033/6000000 [00:26<00:08, 166905.33it/s]
76%|███████▌  | 4551001/6000000 [00:27<00:08, 167656.39it/s]
76%|███████▌  | 4568001/6000000 [00:27<00:08, 168117.42it/s]
76%|███████▋  | 4584814/6000000 [00:27<00:08, 167670.86it/s]
77%|███████▋  | 4601701/6000000 [00:27<00:08, 168026.91it/s]
77%|███████▋  | 4618505/6000000 [00:27<00:08, 167348.44it/s]
77%|███████▋  | 4635241/6000000 [00:27<00:08, 167037.52it/s]
78%|███████▊  | 4652139/6000000 [00:27<00:08, 167617.14it/s]
78%|███████▊  | 4668910/6000000 [00:27<00:07, 167644.11it/s]
78%|███████▊  | 4685675/6000000 [00:27<00:07, 167086.16it/s]
78%|███████▊  | 4702385/6000000 [00:27<00:07, 167081.53it/s]
79%|███████▊  | 4719094/6000000 [00:28<00:07, 167049.80it/s]
79%|███████▉  | 4735882/6000000 [00:28<00:07, 167295.63it/s]
79%|███████▉  | 4752612/6000000 [00:28<00:07, 167275.01it/s]
79%|███████▉  | 4769340/6000000 [00:28<00:07, 166914.63it/s]
80%|███████▉  | 4786127/6000000 [00:28<00:07, 167200.26it/s]
80%|████████  | 4802926/6000000 [00:28<00:07, 167434.10it/s]
80%|████████  | 4819670/6000000 [00:28<00:07, 167240.68it/s]
81%|████████  | 4836509/6000000 [00:28<00:06, 167583.53it/s]
81%|████████  | 4853268/6000000 [00:28<00:06, 166689.37it/s]
81%|████████  | 4870034/6000000 [00:28<00:06, 166978.69it/s]
81%|████████▏ | 4887001/6000000 [00:29<00:06, 167686.47it/s]
82%|████████▏ | 4903790/6000000 [00:29<00:06, 167744.91it/s]
82%|████████▏ | 4920609/6000000 [00:29<00:06, 167877.99it/s]
82%|████████▏ | 4937398/6000000 [00:29<00:06, 167392.47it/s]
83%|████████▎ | 4954138/6000000 [00:29<00:06, 167042.99it/s]
83%|████████▎ | 4971088/6000000 [00:29<00:06, 167776.58it/s]
83%|████████▎ | 4987941/6000000 [00:29<00:06, 168000.77it/s]
83%|████████▎ | 5004742/6000000 [00:29<00:05, 167714.27it/s]
84%|████████▎ | 5021651/6000000 [00:29<00:05, 168123.16it/s]
84%|████████▍ | 5038464/6000000 [00:29<00:05, 167928.49it/s]
84%|████████▍ | 5055408/6000000 [00:30<00:05, 168380.71it/s]
85%|████████▍ | 5072247/6000000 [00:30<00:05, 168063.64it/s]
85%|████████▍ | 5089054/6000000 [00:30<00:05, 167309.61it/s]
85%|████████▌ | 5105898/6000000 [00:30<00:05, 167645.48it/s]
85%|████████▌ | 5122990/6000000 [00:30<00:05, 168622.79it/s]
86%|████████▌ | 5139854/6000000 [00:30<00:05, 168097.69it/s]
86%|████████▌ | 5156665/6000000 [00:30<00:05, 167003.73it/s]
86%|████████▌ | 5173577/6000000 [00:30<00:04, 167632.61it/s]
87%|████████▋ | 5190639/6000000 [00:30<00:04, 168524.06it/s]
87%|████████▋ | 5207494/6000000 [00:30<00:04, 167934.63it/s]
87%|████████▋ | 5224371/6000000 [00:31<00:04, 168182.36it/s]
87%|████████▋ | 5241197/6000000 [00:31<00:04, 168203.22it/s]
88%|████████▊ | 5258224/6000000 [00:31<00:04, 168820.45it/s]
88%|████████▊ | 5275144/6000000 [00:31<00:04, 168932.48it/s]
88%|████████▊ | 5292038/6000000 [00:31<00:04, 168306.91it/s]
88%|████████▊ | 5309113/6000000 [00:31<00:04, 169034.94it/s]
89%|████████▉ | 5326018/6000000 [00:31<00:03, 169014.77it/s]
89%|████████▉ | 5342921/6000000 [00:31<00:03, 168759.13it/s]
89%|████████▉ | 5359798/6000000 [00:31<00:03, 168443.18it/s]
90%|████████▉ | 5376643/6000000 [00:31<00:03, 167815.18it/s]
90%|████████▉ | 5393566/6000000 [00:32<00:03, 168237.20it/s]
90%|█████████ | 5410505/6000000 [00:32<00:03, 168579.31it/s]
90%|█████████ | 5427364/6000000 [00:32<00:03, 168280.18it/s]
91%|█████████ | 5444193/6000000 [00:32<00:03, 167682.82it/s]
91%|█████████ | 5461010/6000000 [00:32<00:03, 167827.89it/s]
91%|█████████▏| 5477959/6000000 [00:32<00:03, 168324.15it/s]
92%|█████████▏| 5494792/6000000 [00:32<00:03, 168290.52it/s]
92%|█████████▏| 5511678/6000000 [00:32<00:02, 168458.95it/s]
92%|█████████▏| 5528525/6000000 [00:32<00:02, 168197.36it/s]
92%|█████████▏| 5545345/6000000 [00:32<00:02, 168127.15it/s]
93%|█████████▎| 5562465/6000000 [00:33<00:02, 169045.83it/s]
93%|█████████▎| 5579370/6000000 [00:33<00:02, 168850.87it/s]
93%|█████████▎| 5596256/6000000 [00:33<00:02, 168659.74it/s]
94%|█████████▎| 5613290/6000000 [00:33<00:02, 169161.92it/s]
94%|█████████▍| 5630207/6000000 [00:33<00:02, 168825.90it/s]
94%|█████████▍| 5647090/6000000 [00:33<00:02, 168412.09it/s]
94%|█████████▍| 5664040/6000000 [00:33<00:01, 168736.09it/s]
95%|█████████▍| 5680914/6000000 [00:33<00:01, 168459.27it/s]
95%|█████████▍| 5697802/6000000 [00:33<00:01, 168582.42it/s]
95%|█████████▌| 5714692/6000000 [00:33<00:01, 168677.43it/s]
96%|█████████▌| 5731560/6000000 [00:34<00:01, 168037.92it/s]
96%|█████████▌| 5748627/6000000 [00:34<00:01, 168822.56it/s]
96%|█████████▌| 5765533/6000000 [00:34<00:01, 168891.01it/s]
96%|█████████▋| 5782423/6000000 [00:34<00:01, 168825.29it/s]
97%|█████████▋| 5799306/6000000 [00:34<00:01, 168609.28it/s]
97%|█████████▋| 5816168/6000000 [00:34<00:01, 168452.01it/s]
97%|█████████▋| 5833014/6000000 [00:34<00:00, 168035.01it/s]
97%|█████████▋| 5849925/6000000 [00:34<00:00, 168354.25it/s]
98%|█████████▊| 5866762/6000000 [00:34<00:00, 168358.65it/s]
98%|█████████▊| 5883812/6000000 [00:34<00:00, 168998.05it/s]
98%|█████████▊| 5900713/6000000 [00:35<00:00, 168653.56it/s]
99%|█████████▊| 5917579/6000000 [00:35<00:00, 168570.72it/s]
99%|█████████▉| 5934437/6000000 [00:35<00:00, 167690.78it/s]
99%|█████████▉| 5951416/6000000 [00:35<00:00, 168315.29it/s]
99%|█████████▉| 5968397/6000000 [00:35<00:00, 168760.13it/s]
100%|█████████▉| 5985282/6000000 [00:35<00:00, 168786.71it/s]
100%|██████████| 6000000/6000000 [00:35<00:00, 168450.74it/s]

colors = {
    'ε = 0.1': 'tab:blue',
    'ε = 0.01': 'tab:red',
    'ε = 0': 'tab:green'
}

(
    trace_df
    .groupby(['step', 'policy'])
    ['is_action_optimal'].mean()
    .unstack()
    .plot()
)
<Axes: xlabel='step'>

png

Handling drift

The environment used above is a toy situation used for introducing bandits. It is stationary, meaning that the expected reward of each arm does not change over time.

In practice, arms are dynamic, and their performance can vary over time. A simple example of this is the Candy Cane Contest that was hosted on Kaggle in 2020. The expected reward of each arm diminishes each time it is pulled.

The way bandit policies in River deal with drift depends on the method. For the bandit.EpsilonGreedy policy, it makes sense to use a rolling average as the reward object. What this means is that the empirical reward the policy calculates for each arm is a rolling average, rather than a global one.

from river import proba, utils

policies=[
    bandit.EpsilonGreedy(
        epsilon=0.1,
        seed=42
    ),
    bandit.EpsilonGreedy(
        epsilon=0.3,
        reward_obj=utils.Rolling(stats.Mean(), window_size=50),
        seed=42
    ),
    bandit.ThompsonSampling(
        reward_obj=proba.Beta(),
        seed=42
    )
]

env = gym.make('river_bandits/CandyCaneContest-v0')

trace = bandit.evaluate(
    policies=policies,
    env=env,
    n_episodes=(n_episodes := 30),
    seed=42
)

trace_df = pd.DataFrame(tqdm(
    trace, position=0, total=(
        n_episodes *
        len(policies) *
        env._max_episode_steps
    )
))
0%|          | 0/180000 [00:00<?, ?it/s]
1%|          | 1263/180000 [00:00<00:14, 12629.38it/s]
1%|▏         | 2574/180000 [00:00<00:13, 12905.75it/s]
2%|▏         | 3882/180000 [00:00<00:13, 12976.88it/s]
3%|▎         | 5181/180000 [00:00<00:13, 12981.99it/s]
4%|▎         | 6480/180000 [00:00<00:13, 12636.45it/s]
4%|▍         | 7746/180000 [00:00<00:13, 12618.72it/s]
5%|▌         | 9054/180000 [00:00<00:13, 12762.76it/s]
6%|▌         | 10354/180000 [00:00<00:13, 12836.28it/s]
6%|▋         | 11655/180000 [00:00<00:13, 12887.08it/s]
7%|▋         | 12945/180000 [00:01<00:13, 12739.14it/s]
8%|▊         | 14220/180000 [00:01<00:13, 12549.15it/s]
9%|▊         | 15476/180000 [00:01<00:13, 12411.40it/s]
9%|▉         | 16718/180000 [00:01<00:13, 12351.01it/s]
10%|▉         | 17954/180000 [00:01<00:13, 12318.92it/s]
11%|█         | 19221/180000 [00:01<00:12, 12420.60it/s]
11%|█▏        | 20526/180000 [00:01<00:12, 12607.55it/s]
12%|█▏        | 21834/180000 [00:01<00:12, 12745.60it/s]
13%|█▎        | 23145/180000 [00:01<00:12, 12854.07it/s]
14%|█▎        | 24431/180000 [00:01<00:12, 12741.50it/s]
14%|█▍        | 25706/180000 [00:02<00:12, 12601.49it/s]
15%|█▍        | 26967/180000 [00:02<00:12, 12475.73it/s]
16%|█▌        | 28254/180000 [00:02<00:12, 12589.84it/s]
16%|█▋        | 29541/180000 [00:02<00:11, 12672.25it/s]
17%|█▋        | 30809/180000 [00:02<00:11, 12592.08it/s]
18%|█▊        | 32130/180000 [00:02<00:11, 12771.23it/s]
19%|█▊        | 33429/180000 [00:02<00:11, 12832.96it/s]
19%|█▉        | 34731/180000 [00:02<00:11, 12886.45it/s]
20%|██        | 36020/180000 [00:02<00:11, 12578.84it/s]
21%|██        | 37413/180000 [00:02<00:10, 12975.27it/s]
22%|██▏       | 38735/180000 [00:03<00:10, 13047.08it/s]
22%|██▏       | 40052/180000 [00:03<00:10, 13083.05it/s]
23%|██▎       | 41364/180000 [00:03<00:10, 13092.77it/s]
24%|██▎       | 42675/180000 [00:03<00:10, 12754.10it/s]
24%|██▍       | 43968/180000 [00:03<00:10, 12799.56it/s]
25%|██▌       | 45272/180000 [00:03<00:10, 12869.50it/s]
26%|██▌       | 46563/180000 [00:03<00:10, 12880.72it/s]
27%|██▋       | 47853/180000 [00:03<00:10, 12676.84it/s]
27%|██▋       | 49122/180000 [00:03<00:10, 12410.48it/s]
28%|██▊       | 50365/180000 [00:03<00:10, 12335.25it/s]
29%|██▊       | 51600/180000 [00:04<00:10, 12281.09it/s]
29%|██▉       | 52829/180000 [00:04<00:10, 12258.04it/s]
30%|███       | 54056/180000 [00:04<00:10, 11643.28it/s]
31%|███       | 55425/180000 [00:04<00:10, 12224.92it/s]
31%|███▏      | 56655/180000 [00:04<00:10, 12187.88it/s]
32%|███▏      | 57882/180000 [00:04<00:10, 12210.72it/s]
33%|███▎      | 59178/180000 [00:04<00:09, 12428.12it/s]
34%|███▎      | 60424/180000 [00:04<00:09, 12415.58it/s]
34%|███▍      | 61734/180000 [00:04<00:09, 12613.28it/s]
35%|███▌      | 63039/180000 [00:04<00:09, 12742.00it/s]
36%|███▌      | 64326/180000 [00:05<00:09, 12776.81it/s]
36%|███▋      | 65605/180000 [00:05<00:09, 12635.23it/s]
37%|███▋      | 66870/180000 [00:05<00:09, 12518.54it/s]
38%|███▊      | 68123/180000 [00:05<00:09, 12351.64it/s]
39%|███▊      | 69417/180000 [00:05<00:08, 12521.54it/s]
39%|███▉      | 70671/180000 [00:05<00:08, 12425.94it/s]
40%|███▉      | 71968/180000 [00:05<00:08, 12585.61it/s]
41%|████      | 73230/180000 [00:05<00:08, 12592.85it/s]
41%|████▏     | 74536/180000 [00:05<00:08, 12730.82it/s]
42%|████▏     | 75831/180000 [00:06<00:08, 12795.70it/s]
43%|████▎     | 77134/180000 [00:06<00:07, 12865.17it/s]
44%|████▎     | 78421/180000 [00:06<00:07, 12745.69it/s]
44%|████▍     | 79729/180000 [00:06<00:07, 12844.10it/s]
45%|████▌     | 81027/180000 [00:06<00:07, 12880.50it/s]
46%|████▌     | 82323/180000 [00:06<00:07, 12901.07it/s]
46%|████▋     | 83618/180000 [00:06<00:07, 12914.85it/s]
47%|████▋     | 84910/180000 [00:06<00:07, 12840.38it/s]
48%|████▊     | 86210/180000 [00:06<00:07, 12887.88it/s]
49%|████▊     | 87499/180000 [00:06<00:07, 12880.84it/s]
49%|████▉     | 88788/180000 [00:07<00:07, 12621.89it/s]
50%|█████     | 90052/180000 [00:07<00:07, 12299.41it/s]
51%|█████     | 91413/180000 [00:07<00:06, 12680.75it/s]
51%|█████▏    | 92684/180000 [00:07<00:06, 12576.71it/s]
52%|█████▏    | 93987/180000 [00:07<00:06, 12708.17it/s]
53%|█████▎    | 95292/180000 [00:07<00:06, 12803.43it/s]
54%|█████▎    | 96574/180000 [00:07<00:06, 12733.26it/s]
54%|█████▍    | 97882/180000 [00:07<00:06, 12835.03it/s]
55%|█████▌    | 99167/180000 [00:07<00:06, 12676.16it/s]
56%|█████▌    | 100436/180000 [00:07<00:06, 12525.36it/s]
57%|█████▋    | 101724/180000 [00:08<00:06, 12627.92it/s]
57%|█████▋    | 102988/180000 [00:08<00:06, 12451.28it/s]
58%|█████▊    | 104235/180000 [00:08<00:06, 12337.56it/s]
59%|█████▊    | 105533/180000 [00:08<00:05, 12525.62it/s]
59%|█████▉    | 106830/180000 [00:08<00:05, 12655.88it/s]
60%|██████    | 108097/180000 [00:08<00:05, 12373.68it/s]
61%|██████    | 109434/180000 [00:08<00:05, 12661.79it/s]
62%|██████▏   | 110742/180000 [00:08<00:05, 12779.24it/s]
62%|██████▏   | 112053/180000 [00:08<00:05, 12873.80it/s]
63%|██████▎   | 113367/180000 [00:08<00:05, 12947.76it/s]
64%|██████▎   | 114663/180000 [00:09<00:05, 12830.82it/s]
64%|██████▍   | 115985/180000 [00:09<00:04, 12946.10it/s]
65%|██████▌   | 117281/180000 [00:09<00:04, 12766.94it/s]
66%|██████▌   | 118590/180000 [00:09<00:04, 12860.39it/s]
67%|██████▋   | 119901/180000 [00:09<00:04, 12930.81it/s]
67%|██████▋   | 121195/180000 [00:09<00:04, 12871.28it/s]
68%|██████▊   | 122505/180000 [00:09<00:04, 12936.43it/s]
69%|██████▉   | 123800/180000 [00:09<00:04, 12754.25it/s]
69%|██████▉   | 125077/180000 [00:09<00:04, 12621.78it/s]
70%|███████   | 126340/180000 [00:09<00:04, 12441.73it/s]
71%|███████   | 127681/180000 [00:10<00:04, 12724.69it/s]
72%|███████▏  | 128955/180000 [00:10<00:04, 12578.56it/s]
72%|███████▏  | 130259/180000 [00:10<00:03, 12713.59it/s]
73%|███████▎  | 131532/180000 [00:10<00:03, 12431.64it/s]
74%|███████▍  | 132795/180000 [00:10<00:03, 12484.02it/s]
74%|███████▍  | 134067/180000 [00:10<00:03, 12550.34it/s]
75%|███████▌  | 135360/180000 [00:10<00:03, 12659.45it/s]
76%|███████▌  | 136627/180000 [00:10<00:03, 12552.63it/s]
77%|███████▋  | 137926/180000 [00:10<00:03, 12681.73it/s]
77%|███████▋  | 139195/180000 [00:10<00:03, 12659.81it/s]
78%|███████▊  | 140508/180000 [00:11<00:03, 12796.99it/s]
79%|███████▉  | 141789/180000 [00:11<00:03, 12691.92it/s]
79%|███████▉  | 143059/180000 [00:11<00:02, 12555.76it/s]
80%|████████  | 144316/180000 [00:11<00:02, 12484.29it/s]
81%|████████  | 145658/180000 [00:11<00:02, 12760.62it/s]
82%|████████▏ | 146977/180000 [00:11<00:02, 12887.08it/s]
82%|████████▏ | 148267/180000 [00:11<00:02, 12776.07it/s]
83%|████████▎ | 149546/180000 [00:11<00:02, 12661.36it/s]
84%|████████▍ | 150813/180000 [00:11<00:02, 12463.29it/s]
84%|████████▍ | 152061/180000 [00:12<00:02, 12459.32it/s]
85%|████████▌ | 153384/180000 [00:12<00:02, 12685.83it/s]
86%|████████▌ | 154695/180000 [00:12<00:01, 12807.62it/s]
87%|████████▋ | 155997/180000 [00:12<00:01, 12868.28it/s]
87%|████████▋ | 157285/180000 [00:12<00:01, 12604.31it/s]
88%|████████▊ | 158568/180000 [00:12<00:01, 12664.58it/s]
89%|████████▉ | 159844/180000 [00:12<00:01, 12692.34it/s]
90%|████████▉ | 161123/180000 [00:12<00:01, 12720.44it/s]
90%|█████████ | 162396/180000 [00:12<00:01, 12581.93it/s]
91%|█████████ | 163718/180000 [00:12<00:01, 12770.70it/s]
92%|█████████▏| 165022/180000 [00:13<00:01, 12849.43it/s]
92%|█████████▏| 166321/180000 [00:13<00:01, 12891.24it/s]
93%|█████████▎| 167611/180000 [00:13<00:00, 12779.08it/s]
94%|█████████▍| 168890/180000 [00:13<00:00, 12612.08it/s]
95%|█████████▍| 170208/180000 [00:13<00:00, 12776.56it/s]
95%|█████████▌| 171489/180000 [00:13<00:00, 12781.84it/s]
96%|█████████▌| 172788/180000 [00:13<00:00, 12842.05it/s]
97%|█████████▋| 174073/180000 [00:13<00:00, 12584.60it/s]
97%|█████████▋| 175435/180000 [00:13<00:00, 12888.92it/s]
98%|█████████▊| 176751/180000 [00:13<00:00, 12965.83it/s]
99%|█████████▉| 178056/180000 [00:14<00:00, 12987.97it/s]
100%|█████████▉| 179372/180000 [00:14<00:00, 13039.05it/s]
100%|██████████| 180000/180000 [00:14<00:00, 12680.89it/s]

We can compare the performance of each policy by checking the average reward at the end of each episode.

(
    trace_df
    .groupby(['policy_idx', 'episode'])
    .last()
    .groupby('policy_idx')
    .reward_stat.mean()
)
policy_idx
0    736.1
1    817.0
2    854.0
Name: reward_stat, dtype: float64

We see that using a rolling average gives a boost to the epsilon greedy strategy. However, we see that the bandit.ThompsonSampling policy performs even better, even though no particular care was given to drift. A natural next step would thus be to see how it could be improved to handle drift. For instance, its dist parameter could be wrapped with a utils.Rolling:

policy = bandit.ThompsonSampling(
    reward_obj=utils.Rolling(proba.Beta(), window_size=50),
    seed=42
)

Bandits can be used for several tasks. They can be used for content personalization, as well as online model selection (see model_selection.BanditRegressor). The policies in River are therefore designed to be flexible, so that they can be used in conjunction with other River modules. For instance, the reward_obj in bandit.EpsilonGreedy can be a metric, a probability distribution, or a statistic. This works because objects in River adher to a coherent get/update interface.