Multi-armed bandits¶
River has a bandit module. It contains several multi-armed bandit policies, bandit environments, and utilities to benchmark policies on bandit problems.
Bandit environments in River implement the Gym interface. You can thus load them with gym.make. Note that Gym is intended for reinforcement learning algorithms, while bandit policies are the simplest form of reinforcement learning. Bandit policies learn by receiving a reward after each step, while reinforcement learning algorithms have to learn from feedback that may arrive at the end of a (long) sequence of steps.
import gymnasium as gym
for k in gym.envs.registry:
if k.startswith('river_bandits'):
print(k)
River's bandit module offers the bandit.evaluate function to benchmark several policies on a given environment. It takes as input a list of bandit policies, a bandit environment (the problem to solve), and a reward object.
import gymnasium as gym
from river import bandit
import pandas as pd
from tqdm import tqdm
from river import stats
policies=[
bandit.EpsilonGreedy(epsilon=0.1),
bandit.EpsilonGreedy(epsilon=0.01),
bandit.EpsilonGreedy(epsilon=0),
]
env = gym.make(
'river_bandits/KArmedTestbed-v0',
max_episode_steps=1000
)
trace = bandit.evaluate(
policies=policies,
env=env,
reward_stat=stats.Mean(),
n_episodes=(n_episodes := 2000),
)
The bandit.evaluate function returns a generator containing the results at each step of the benchmark. This can be wrapped with a pandas.DataFrame to gather all the results.
trace_df = pd.DataFrame(tqdm(
trace, position=0, total=(
n_episodes *
len(policies) *
env._max_episode_steps
)
))
trace_df.sample(5, random_state=42)
0%| | 0/6000000 [00:00<?, ?it/s]
0%| | 16179/6000000 [00:00<00:36, 161782.83it/s]
1%| | 32888/6000000 [00:00<00:36, 164898.02it/s]
1%| | 49378/6000000 [00:00<00:36, 164714.48it/s]
1%| | 65907/6000000 [00:00<00:35, 164939.42it/s]
1%|▏ | 82401/6000000 [00:00<00:36, 164018.43it/s]
2%|▏ | 98978/6000000 [00:00<00:35, 164610.18it/s]
2%|▏ | 115440/6000000 [00:00<00:35, 164228.78it/s]
2%|▏ | 132001/6000000 [00:00<00:35, 164363.31it/s]
2%|▏ | 148438/6000000 [00:00<00:43, 135057.33it/s]
3%|▎ | 164837/6000000 [00:01<00:40, 142802.07it/s]
3%|▎ | 181035/6000000 [00:01<00:39, 148106.13it/s]
3%|▎ | 197492/6000000 [00:01<00:37, 152766.69it/s]
4%|▎ | 213680/6000000 [00:01<00:37, 155391.39it/s]
4%|▍ | 230066/6000000 [00:01<00:36, 157858.23it/s]
4%|▍ | 246408/6000000 [00:01<00:36, 159490.51it/s]
4%|▍ | 262893/6000000 [00:01<00:35, 161073.49it/s]
5%|▍ | 279112/6000000 [00:01<00:35, 159267.40it/s]
5%|▍ | 295120/6000000 [00:01<00:36, 155773.50it/s]
5%|▌ | 311470/6000000 [00:01<00:35, 158025.56it/s]
5%|▌ | 327838/6000000 [00:02<00:35, 159687.43it/s]
6%|▌ | 344346/6000000 [00:02<00:35, 161281.75it/s]
6%|▌ | 360506/6000000 [00:02<00:35, 161103.85it/s]
6%|▋ | 376894/6000000 [00:02<00:34, 161926.86it/s]
7%|▋ | 393103/6000000 [00:02<00:34, 161840.21it/s]
7%|▋ | 409455/6000000 [00:02<00:34, 162339.71it/s]
7%|▋ | 425802/6000000 [00:02<00:34, 162673.47it/s]
7%|▋ | 442300/6000000 [00:02<00:34, 163362.06it/s]
8%|▊ | 458981/6000000 [00:02<00:33, 164392.50it/s]
8%|▊ | 475424/6000000 [00:02<00:33, 163326.85it/s]
8%|▊ | 491903/6000000 [00:03<00:33, 163761.27it/s]
8%|▊ | 508282/6000000 [00:03<00:33, 163217.36it/s]
9%|▊ | 524798/6000000 [00:03<00:33, 163795.76it/s]
9%|▉ | 541180/6000000 [00:03<00:33, 163534.97it/s]
9%|▉ | 557887/6000000 [00:03<00:33, 164590.76it/s]
10%|▉ | 574348/6000000 [00:03<00:33, 163890.33it/s]
10%|▉ | 590829/6000000 [00:03<00:32, 164162.97it/s]
10%|█ | 607247/6000000 [00:03<00:32, 164123.89it/s]
10%|█ | 623661/6000000 [00:03<00:32, 164037.59it/s]
11%|█ | 640066/6000000 [00:03<00:32, 163238.75it/s]
11%|█ | 656450/6000000 [00:04<00:32, 163416.54it/s]
11%|█ | 672793/6000000 [00:04<00:32, 162965.74it/s]
11%|█▏ | 689337/6000000 [00:04<00:32, 163703.18it/s]
12%|█▏ | 705709/6000000 [00:04<00:32, 162901.42it/s]
12%|█▏ | 722415/6000000 [00:04<00:32, 164139.36it/s]
12%|█▏ | 738831/6000000 [00:04<00:32, 163626.59it/s]
13%|█▎ | 755425/6000000 [00:04<00:31, 164316.51it/s]
13%|█▎ | 771858/6000000 [00:04<00:31, 163578.71it/s]
13%|█▎ | 788367/6000000 [00:04<00:31, 164027.15it/s]
13%|█▎ | 804793/6000000 [00:04<00:31, 164095.02it/s]
14%|█▎ | 821204/6000000 [00:05<00:31, 164070.06it/s]
14%|█▍ | 837612/6000000 [00:05<00:31, 163391.52it/s]
14%|█▍ | 854215/6000000 [00:05<00:31, 164177.68it/s]
15%|█▍ | 870634/6000000 [00:05<00:31, 163982.86it/s]
15%|█▍ | 887158/6000000 [00:05<00:31, 164357.63it/s]
15%|█▌ | 903595/6000000 [00:05<00:31, 163801.95it/s]
15%|█▌ | 920067/6000000 [00:05<00:30, 164074.24it/s]
16%|█▌ | 936476/6000000 [00:05<00:30, 163741.92it/s]
16%|█▌ | 953153/6000000 [00:05<00:30, 164646.30it/s]
16%|█▌ | 969619/6000000 [00:06<00:30, 164178.67it/s]
16%|█▋ | 986394/6000000 [00:06<00:30, 165243.15it/s]
17%|█▋ | 1002953/6000000 [00:06<00:30, 165345.80it/s]
17%|█▋ | 1019582/6000000 [00:06<00:30, 165625.05it/s]
17%|█▋ | 1036145/6000000 [00:06<00:30, 164725.58it/s]
18%|█▊ | 1052916/6000000 [00:06<00:29, 165613.91it/s]
18%|█▊ | 1069479/6000000 [00:06<00:30, 164056.50it/s]
18%|█▊ | 1086001/6000000 [00:06<00:29, 163845.91it/s]
18%|█▊ | 1102389/6000000 [00:06<00:30, 161009.19it/s]
19%|█▊ | 1118501/6000000 [00:06<00:31, 154584.00it/s]
19%|█▉ | 1135050/6000000 [00:07<00:30, 157720.50it/s]
19%|█▉ | 1151486/6000000 [00:07<00:30, 159652.02it/s]
19%|█▉ | 1167896/6000000 [00:07<00:30, 160956.21it/s]
20%|█▉ | 1184490/6000000 [00:07<00:29, 162427.33it/s]
20%|██ | 1200978/6000000 [00:07<00:29, 163152.67it/s]
20%|██ | 1217451/6000000 [00:07<00:29, 163619.62it/s]
21%|██ | 1233826/6000000 [00:07<00:29, 163527.78it/s]
21%|██ | 1250327/6000000 [00:07<00:28, 163968.34it/s]
21%|██ | 1266731/6000000 [00:07<00:28, 163462.41it/s]
21%|██▏ | 1283266/6000000 [00:07<00:28, 164025.38it/s]
22%|██▏ | 1299673/6000000 [00:08<00:28, 163543.82it/s]
22%|██▏ | 1316447/6000000 [00:08<00:28, 164796.48it/s]
22%|██▏ | 1332930/6000000 [00:08<00:28, 164301.20it/s]
22%|██▏ | 1349424/6000000 [00:08<00:28, 164491.26it/s]
23%|██▎ | 1365875/6000000 [00:08<00:28, 164040.67it/s]
23%|██▎ | 1382705/6000000 [00:08<00:27, 165312.45it/s]
23%|██▎ | 1399238/6000000 [00:08<00:27, 164685.46it/s]
24%|██▎ | 1415867/6000000 [00:08<00:27, 165161.81it/s]
24%|██▍ | 1432385/6000000 [00:08<00:27, 164811.04it/s]
24%|██▍ | 1448943/6000000 [00:08<00:27, 165039.61it/s]
24%|██▍ | 1465448/6000000 [00:09<00:27, 164541.10it/s]
25%|██▍ | 1482001/6000000 [00:09<00:27, 164331.48it/s]
25%|██▍ | 1498799/6000000 [00:09<00:27, 165418.32it/s]
25%|██▌ | 1515342/6000000 [00:09<00:27, 164323.93it/s]
26%|██▌ | 1532013/6000000 [00:09<00:27, 165031.68it/s]
26%|██▌ | 1548519/6000000 [00:09<00:27, 164511.44it/s]
26%|██▌ | 1565157/6000000 [00:09<00:26, 165067.55it/s]
26%|██▋ | 1581666/6000000 [00:09<00:26, 164763.62it/s]
27%|██▋ | 1598253/6000000 [00:09<00:26, 165093.50it/s]
27%|██▋ | 1614764/6000000 [00:09<00:26, 164446.64it/s]
27%|██▋ | 1631312/6000000 [00:10<00:26, 164752.74it/s]
27%|██▋ | 1647789/6000000 [00:10<00:26, 164533.70it/s]
28%|██▊ | 1664313/6000000 [00:10<00:26, 164742.00it/s]
28%|██▊ | 1680788/6000000 [00:10<00:26, 163921.66it/s]
28%|██▊ | 1697531/6000000 [00:10<00:26, 164967.16it/s]
29%|██▊ | 1714030/6000000 [00:10<00:26, 164152.20it/s]
29%|██▉ | 1730731/6000000 [00:10<00:25, 165003.73it/s]
29%|██▉ | 1747233/6000000 [00:10<00:25, 164901.05it/s]
29%|██▉ | 1763804/6000000 [00:10<00:25, 165141.90it/s]
30%|██▉ | 1780320/6000000 [00:10<00:25, 164650.98it/s]
30%|██▉ | 1796786/6000000 [00:11<00:25, 164394.73it/s]
30%|███ | 1813227/6000000 [00:11<00:25, 163692.60it/s]
30%|███ | 1829737/6000000 [00:11<00:25, 164110.62it/s]
31%|███ | 1846149/6000000 [00:11<00:25, 163525.15it/s]
31%|███ | 1862772/6000000 [00:11<00:25, 164329.67it/s]
31%|███▏ | 1879206/6000000 [00:11<00:25, 163739.72it/s]
32%|███▏ | 1895682/6000000 [00:11<00:25, 164042.76it/s]
32%|███▏ | 1912088/6000000 [00:11<00:25, 162385.06it/s]
32%|███▏ | 1928440/6000000 [00:11<00:25, 162720.69it/s]
32%|███▏ | 1944716/6000000 [00:11<00:25, 162082.00it/s]
33%|███▎ | 1960927/6000000 [00:12<00:25, 161117.88it/s]
33%|███▎ | 1977116/6000000 [00:12<00:24, 161345.76it/s]
33%|███▎ | 1993407/6000000 [00:12<00:24, 161809.42it/s]
33%|███▎ | 2009884/6000000 [00:12<00:24, 162690.12it/s]
34%|███▍ | 2026155/6000000 [00:12<00:24, 162272.36it/s]
34%|███▍ | 2042531/6000000 [00:12<00:24, 162715.35it/s]
34%|███▍ | 2058804/6000000 [00:12<00:24, 162179.17it/s]
35%|███▍ | 2075046/6000000 [00:12<00:24, 162248.49it/s]
35%|███▍ | 2091281/6000000 [00:12<00:24, 162276.24it/s]
35%|███▌ | 2107682/6000000 [00:12<00:23, 162792.96it/s]
35%|███▌ | 2124001/6000000 [00:13<00:23, 162853.44it/s]
36%|███▌ | 2140448/6000000 [00:13<00:23, 163334.67it/s]
36%|███▌ | 2157118/6000000 [00:13<00:23, 164342.43it/s]
36%|███▌ | 2173681/6000000 [00:13<00:23, 164725.31it/s]
37%|███▋ | 2190154/6000000 [00:13<00:23, 164280.00it/s]
37%|███▋ | 2206609/6000000 [00:13<00:23, 164358.29it/s]
37%|███▋ | 2223046/6000000 [00:13<00:22, 164283.59it/s]
37%|███▋ | 2239696/6000000 [00:13<00:22, 164944.42it/s]
38%|███▊ | 2256191/6000000 [00:13<00:22, 164380.25it/s]
38%|███▊ | 2272630/6000000 [00:13<00:22, 164145.74it/s]
38%|███▊ | 2289045/6000000 [00:14<00:22, 163988.28it/s]
38%|███▊ | 2305950/6000000 [00:14<00:22, 165499.35it/s]
39%|███▊ | 2322501/6000000 [00:14<00:22, 164644.68it/s]
39%|███▉ | 2338967/6000000 [00:14<00:22, 164463.68it/s]
39%|███▉ | 2355415/6000000 [00:14<00:22, 164185.06it/s]
40%|███▉ | 2371973/6000000 [00:14<00:22, 164600.76it/s]
40%|███▉ | 2388434/6000000 [00:14<00:22, 163998.23it/s]
40%|████ | 2405030/6000000 [00:14<00:21, 164580.20it/s]
40%|████ | 2421489/6000000 [00:14<00:21, 164195.95it/s]
41%|████ | 2438309/6000000 [00:14<00:21, 165389.88it/s]
41%|████ | 2454849/6000000 [00:15<00:21, 164986.96it/s]
41%|████ | 2471634/6000000 [00:15<00:21, 165840.05it/s]
41%|████▏ | 2488219/6000000 [00:15<00:21, 165410.82it/s]
42%|████▏ | 2504868/6000000 [00:15<00:21, 165730.26it/s]
42%|████▏ | 2521442/6000000 [00:15<00:21, 165226.70it/s]
42%|████▏ | 2538003/6000000 [00:15<00:20, 165325.62it/s]
43%|████▎ | 2554826/6000000 [00:15<00:20, 166193.58it/s]
43%|████▎ | 2571485/6000000 [00:15<00:20, 166309.11it/s]
43%|████▎ | 2588326/6000000 [00:15<00:20, 166937.47it/s]
43%|████▎ | 2605021/6000000 [00:15<00:20, 165812.73it/s]
44%|████▎ | 2621885/6000000 [00:16<00:20, 166655.05it/s]
44%|████▍ | 2638553/6000000 [00:16<00:20, 166238.86it/s]
44%|████▍ | 2655179/6000000 [00:16<00:20, 165406.38it/s]
45%|████▍ | 2671722/6000000 [00:16<00:20, 165319.03it/s]
45%|████▍ | 2688255/6000000 [00:16<00:20, 164694.50it/s]
45%|████▌ | 2704761/6000000 [00:16<00:19, 164801.29it/s]
45%|████▌ | 2721242/6000000 [00:16<00:19, 164481.12it/s]
46%|████▌ | 2737691/6000000 [00:16<00:19, 164388.88it/s]
46%|████▌ | 2754185/6000000 [00:16<00:19, 164551.34it/s]
46%|████▌ | 2770864/6000000 [00:16<00:19, 165219.21it/s]
46%|████▋ | 2787387/6000000 [00:17<00:19, 164489.85it/s]
47%|████▋ | 2803837/6000000 [00:17<00:19, 164081.72it/s]
47%|████▋ | 2820321/6000000 [00:17<00:19, 164305.12it/s]
47%|████▋ | 2837034/6000000 [00:17<00:19, 165147.31it/s]
48%|████▊ | 2853683/6000000 [00:17<00:19, 165546.97it/s]
48%|████▊ | 2870239/6000000 [00:17<00:18, 165207.86it/s]
48%|████▊ | 2886761/6000000 [00:17<00:18, 164394.25it/s]
48%|████▊ | 2903483/6000000 [00:17<00:18, 165234.59it/s]
49%|████▊ | 2920008/6000000 [00:17<00:18, 165102.77it/s]
49%|████▉ | 2936576/6000000 [00:17<00:18, 165272.60it/s]
49%|████▉ | 2953104/6000000 [00:18<00:18, 164510.69it/s]
49%|████▉ | 2969926/6000000 [00:18<00:18, 165615.94it/s]
50%|████▉ | 2986489/6000000 [00:18<00:18, 164788.29it/s]
50%|█████ | 3003002/6000000 [00:18<00:18, 164878.52it/s]
50%|█████ | 3019796/6000000 [00:18<00:17, 165790.22it/s]
51%|█████ | 3036377/6000000 [00:18<00:17, 165427.75it/s]
51%|█████ | 3053211/6000000 [00:18<00:17, 166297.60it/s]
51%|█████ | 3069842/6000000 [00:18<00:17, 165671.61it/s]
51%|█████▏ | 3086411/6000000 [00:18<00:17, 165618.53it/s]
52%|█████▏ | 3102974/6000000 [00:18<00:17, 164588.11it/s]
52%|█████▏ | 3119556/6000000 [00:19<00:17, 164952.31it/s]
52%|█████▏ | 3136053/6000000 [00:19<00:17, 164642.83it/s]
53%|█████▎ | 3152773/6000000 [00:19<00:17, 165405.28it/s]
53%|█████▎ | 3169315/6000000 [00:19<00:17, 165091.71it/s]
53%|█████▎ | 3186001/6000000 [00:19<00:17, 165379.18it/s]
53%|█████▎ | 3202540/6000000 [00:19<00:16, 165154.70it/s]
54%|█████▎ | 3219056/6000000 [00:19<00:16, 164736.78it/s]
54%|█████▍ | 3235696/6000000 [00:19<00:16, 165232.17it/s]
54%|█████▍ | 3252220/6000000 [00:19<00:16, 164895.65it/s]
54%|█████▍ | 3269129/6000000 [00:19<00:16, 166148.53it/s]
55%|█████▍ | 3285745/6000000 [00:20<00:16, 164950.50it/s]
55%|█████▌ | 3302413/6000000 [00:20<00:16, 165464.58it/s]
55%|█████▌ | 3318962/6000000 [00:20<00:16, 165075.86it/s]
56%|█████▌ | 3335555/6000000 [00:20<00:16, 165327.80it/s]
56%|█████▌ | 3352089/6000000 [00:20<00:16, 164325.32it/s]
56%|█████▌ | 3368524/6000000 [00:20<00:16, 164246.84it/s]
56%|█████▋ | 3384950/6000000 [00:20<00:15, 164240.53it/s]
57%|█████▋ | 3401421/6000000 [00:20<00:15, 164379.15it/s]
57%|█████▋ | 3417860/6000000 [00:20<00:15, 163906.28it/s]
57%|█████▋ | 3434532/6000000 [00:21<00:15, 164745.94it/s]
58%|█████▊ | 3451013/6000000 [00:21<00:15, 164764.21it/s]
58%|█████▊ | 3467619/6000000 [00:21<00:15, 165149.79it/s]
58%|█████▊ | 3484135/6000000 [00:21<00:15, 164897.18it/s]
58%|█████▊ | 3500626/6000000 [00:21<00:15, 164687.82it/s]
59%|█████▊ | 3517096/6000000 [00:21<00:15, 164195.29it/s]
59%|█████▉ | 3533785/6000000 [00:21<00:14, 164997.56it/s]
59%|█████▉ | 3550286/6000000 [00:21<00:14, 164397.24it/s]
59%|█████▉ | 3566971/6000000 [00:21<00:14, 165128.11it/s]
60%|█████▉ | 3583485/6000000 [00:21<00:14, 164496.92it/s]
60%|██████ | 3600001/6000000 [00:22<00:14, 164165.01it/s]
60%|██████ | 3616597/6000000 [00:22<00:14, 164698.87it/s]
61%|██████ | 3633068/6000000 [00:22<00:14, 164557.34it/s]
61%|██████ | 3649525/6000000 [00:22<00:14, 164482.27it/s]
61%|██████ | 3666001/6000000 [00:22<00:14, 164294.11it/s]
61%|██████▏ | 3682707/6000000 [00:22<00:14, 165118.32it/s]
62%|██████▏ | 3699220/6000000 [00:22<00:13, 165089.23it/s]
62%|██████▏ | 3715807/6000000 [00:22<00:13, 165320.17it/s]
62%|██████▏ | 3732340/6000000 [00:22<00:13, 165033.59it/s]
62%|██████▏ | 3749153/6000000 [00:22<00:13, 165957.47it/s]
63%|██████▎ | 3765750/6000000 [00:23<00:13, 165377.07it/s]
63%|██████▎ | 3782377/6000000 [00:23<00:13, 165642.76it/s]
63%|██████▎ | 3798942/6000000 [00:23<00:13, 164976.96it/s]
64%|██████▎ | 3815617/6000000 [00:23<00:13, 165504.15it/s]
64%|██████▍ | 3832169/6000000 [00:23<00:13, 164699.05it/s]
64%|██████▍ | 3848718/6000000 [00:23<00:13, 164932.57it/s]
64%|██████▍ | 3865213/6000000 [00:23<00:12, 164435.62it/s]
65%|██████▍ | 3881845/6000000 [00:23<00:12, 164996.44it/s]
65%|██████▍ | 3898346/6000000 [00:23<00:12, 163858.40it/s]
65%|██████▌ | 3914740/6000000 [00:23<00:12, 163857.38it/s]
66%|██████▌ | 3931128/6000000 [00:24<00:13, 158284.11it/s]
66%|██████▌ | 3947354/6000000 [00:24<00:12, 159442.13it/s]
66%|██████▌ | 3963493/6000000 [00:24<00:12, 160011.86it/s]
66%|██████▋ | 3979646/6000000 [00:24<00:12, 160459.14it/s]
67%|██████▋ | 3996001/6000000 [00:24<00:12, 161215.65it/s]
67%|██████▋ | 4012712/6000000 [00:24<00:12, 162969.92it/s]
67%|██████▋ | 4029077/6000000 [00:24<00:12, 163171.06it/s]
67%|██████▋ | 4045863/6000000 [00:24<00:11, 164570.35it/s]
68%|██████▊ | 4062326/6000000 [00:24<00:11, 164260.72it/s]
68%|██████▊ | 4078937/6000000 [00:24<00:11, 164812.12it/s]
68%|██████▊ | 4095422/6000000 [00:25<00:11, 164252.78it/s]
69%|██████▊ | 4112070/6000000 [00:25<00:11, 164916.43it/s]
69%|██████▉ | 4128564/6000000 [00:25<00:11, 164412.72it/s]
69%|██████▉ | 4145251/6000000 [00:25<00:11, 165145.37it/s]
69%|██████▉ | 4161767/6000000 [00:25<00:11, 164818.32it/s]
70%|██████▉ | 4178847/6000000 [00:25<00:10, 166604.20it/s]
70%|██████▉ | 4195509/6000000 [00:25<00:10, 165680.27it/s]
70%|███████ | 4212079/6000000 [00:25<00:10, 165150.22it/s]
70%|███████ | 4228925/6000000 [00:25<00:10, 166135.05it/s]
71%|███████ | 4245541/6000000 [00:25<00:10, 164953.39it/s]
71%|███████ | 4262134/6000000 [00:26<00:10, 165241.95it/s]
71%|███████▏ | 4278661/6000000 [00:26<00:10, 164350.68it/s]
72%|███████▏ | 4295240/6000000 [00:26<00:10, 164778.35it/s]
72%|███████▏ | 4311720/6000000 [00:26<00:10, 163944.83it/s]
72%|███████▏ | 4328128/6000000 [00:26<00:10, 163982.21it/s]
72%|███████▏ | 4344528/6000000 [00:26<00:10, 163932.28it/s]
73%|███████▎ | 4361277/6000000 [00:26<00:09, 164992.80it/s]
73%|███████▎ | 4377778/6000000 [00:26<00:09, 164297.87it/s]
73%|███████▎ | 4394526/6000000 [00:26<00:09, 165246.13it/s]
74%|███████▎ | 4411052/6000000 [00:26<00:09, 164203.36it/s]
74%|███████▍ | 4427475/6000000 [00:27<00:09, 164071.72it/s]
74%|███████▍ | 4443984/6000000 [00:27<00:09, 164372.56it/s]
74%|███████▍ | 4460613/6000000 [00:27<00:09, 164943.54it/s]
75%|███████▍ | 4477133/6000000 [00:27<00:09, 165017.61it/s]
75%|███████▍ | 4493692/6000000 [00:27<00:09, 165187.52it/s]
75%|███████▌ | 4510212/6000000 [00:27<00:09, 164227.49it/s]
75%|███████▌ | 4526836/6000000 [00:27<00:08, 164824.98it/s]
76%|███████▌ | 4543320/6000000 [00:27<00:08, 162188.48it/s]
76%|███████▌ | 4559973/6000000 [00:27<00:08, 163471.49it/s]
76%|███████▋ | 4576329/6000000 [00:27<00:08, 162355.38it/s]
77%|███████▋ | 4592928/6000000 [00:28<00:08, 163432.52it/s]
77%|███████▋ | 4609278/6000000 [00:28<00:08, 162988.88it/s]
77%|███████▋ | 4625768/6000000 [00:28<00:08, 163556.67it/s]
77%|███████▋ | 4642287/6000000 [00:28<00:08, 164041.02it/s]
78%|███████▊ | 4658694/6000000 [00:28<00:08, 164028.22it/s]
78%|███████▊ | 4675099/6000000 [00:28<00:08, 163299.23it/s]
78%|███████▊ | 4691787/6000000 [00:28<00:07, 164366.53it/s]
78%|███████▊ | 4708226/6000000 [00:28<00:07, 163796.55it/s]
79%|███████▊ | 4724781/6000000 [00:28<00:07, 164317.46it/s]
79%|███████▉ | 4741215/6000000 [00:28<00:07, 164148.86it/s]
79%|███████▉ | 4757757/6000000 [00:29<00:07, 164526.34it/s]
80%|███████▉ | 4774211/6000000 [00:29<00:07, 163807.44it/s]
80%|███████▉ | 4790869/6000000 [00:29<00:07, 164634.33it/s]
80%|████████ | 4807334/6000000 [00:29<00:07, 163737.19it/s]
80%|████████ | 4824001/6000000 [00:29<00:07, 164319.67it/s]
81%|████████ | 4840725/6000000 [00:29<00:07, 165190.14it/s]
81%|████████ | 4857246/6000000 [00:29<00:06, 164599.90it/s]
81%|████████ | 4873752/6000000 [00:29<00:06, 164735.97it/s]
82%|████████▏ | 4890227/6000000 [00:29<00:06, 164095.18it/s]
82%|████████▏ | 4906638/6000000 [00:29<00:06, 163785.03it/s]
82%|████████▏ | 4923018/6000000 [00:30<00:06, 163310.73it/s]
82%|████████▏ | 4939522/6000000 [00:30<00:06, 163825.61it/s]
83%|████████▎ | 4955978/6000000 [00:30<00:06, 164041.95it/s]
83%|████████▎ | 4972383/6000000 [00:30<00:06, 162782.30it/s]
83%|████████▎ | 4988955/6000000 [00:30<00:06, 163654.87it/s]
83%|████████▎ | 5005323/6000000 [00:30<00:06, 163046.71it/s]
84%|████████▎ | 5021812/6000000 [00:30<00:05, 163595.13it/s]
84%|████████▍ | 5038174/6000000 [00:30<00:05, 162887.84it/s]
84%|████████▍ | 5054705/6000000 [00:30<00:05, 163608.22it/s]
85%|████████▍ | 5071068/6000000 [00:30<00:05, 162682.92it/s]
85%|████████▍ | 5087492/6000000 [00:31<00:05, 163143.34it/s]
85%|████████▌ | 5103808/6000000 [00:31<00:05, 162666.51it/s]
85%|████████▌ | 5120076/6000000 [00:31<00:05, 162558.30it/s]
86%|████████▌ | 5136333/6000000 [00:31<00:05, 162223.00it/s]
86%|████████▌ | 5152556/6000000 [00:31<00:05, 162108.21it/s]
86%|████████▌ | 5168876/6000000 [00:31<00:05, 162431.01it/s]
86%|████████▋ | 5185120/6000000 [00:31<00:05, 161830.97it/s]
87%|████████▋ | 5201564/6000000 [00:31<00:04, 162606.69it/s]
87%|████████▋ | 5217826/6000000 [00:31<00:04, 162265.35it/s]
87%|████████▋ | 5234370/6000000 [00:31<00:04, 163213.26it/s]
88%|████████▊ | 5250693/6000000 [00:32<00:04, 162610.08it/s]
88%|████████▊ | 5267430/6000000 [00:32<00:04, 164028.65it/s]
88%|████████▊ | 5283835/6000000 [00:32<00:04, 163574.84it/s]
88%|████████▊ | 5300244/6000000 [00:32<00:04, 163726.35it/s]
89%|████████▊ | 5316674/6000000 [00:32<00:04, 163896.00it/s]
89%|████████▉ | 5333334/6000000 [00:32<00:04, 164702.65it/s]
89%|████████▉ | 5349805/6000000 [00:32<00:03, 164363.76it/s]
89%|████████▉ | 5366305/6000000 [00:32<00:03, 164551.04it/s]
90%|████████▉ | 5382761/6000000 [00:32<00:03, 162992.20it/s]
90%|████████▉ | 5399216/6000000 [00:32<00:03, 163454.75it/s]
90%|█████████ | 5415565/6000000 [00:33<00:03, 162818.61it/s]
91%|█████████ | 5431896/6000000 [00:33<00:03, 162963.01it/s]
91%|█████████ | 5448194/6000000 [00:33<00:03, 162634.79it/s]
91%|█████████ | 5464610/6000000 [00:33<00:03, 163088.97it/s]
91%|█████████▏| 5480963/6000000 [00:33<00:03, 163217.87it/s]
92%|█████████▏| 5497370/6000000 [00:33<00:03, 163470.04it/s]
92%|█████████▏| 5513974/6000000 [00:33<00:02, 164238.29it/s]
92%|█████████▏| 5530399/6000000 [00:33<00:02, 163668.30it/s]
92%|█████████▏| 5546781/6000000 [00:33<00:02, 163712.19it/s]
93%|█████████▎| 5563153/6000000 [00:33<00:02, 163443.86it/s]
93%|█████████▎| 5579711/6000000 [00:34<00:02, 164080.06it/s]
93%|█████████▎| 5596120/6000000 [00:34<00:02, 163185.51it/s]
94%|█████████▎| 5612684/6000000 [00:34<00:02, 163916.44it/s]
94%|█████████▍| 5629077/6000000 [00:34<00:02, 163692.37it/s]
94%|█████████▍| 5645505/6000000 [00:34<00:02, 163864.93it/s]
94%|█████████▍| 5661893/6000000 [00:34<00:02, 163178.86it/s]
95%|█████████▍| 5678233/6000000 [00:34<00:01, 163242.03it/s]
95%|█████████▍| 5694558/6000000 [00:34<00:01, 162586.34it/s]
95%|█████████▌| 5711200/6000000 [00:34<00:01, 163727.32it/s]
95%|█████████▌| 5727574/6000000 [00:34<00:01, 163096.52it/s]
96%|█████████▌| 5743885/6000000 [00:35<00:01, 162861.88it/s]
96%|█████████▌| 5760386/6000000 [00:35<00:01, 163501.01it/s]
96%|█████████▋| 5776737/6000000 [00:35<00:01, 162961.33it/s]
97%|█████████▋| 5793034/6000000 [00:35<00:01, 162396.63it/s]
97%|█████████▋| 5809574/6000000 [00:35<00:01, 163291.17it/s]
97%|█████████▋| 5825905/6000000 [00:35<00:01, 163289.05it/s]
97%|█████████▋| 5842235/6000000 [00:35<00:00, 163185.41it/s]
98%|█████████▊| 5858764/6000000 [00:35<00:00, 163814.12it/s]
98%|█████████▊| 5875317/6000000 [00:35<00:00, 164326.64it/s]
98%|█████████▊| 5891892/6000000 [00:35<00:00, 164751.30it/s]
98%|█████████▊| 5908368/6000000 [00:36<00:00, 163825.30it/s]
99%|█████████▊| 5924980/6000000 [00:36<00:00, 164509.10it/s]
99%|█████████▉| 5941433/6000000 [00:36<00:00, 163920.88it/s]
99%|█████████▉| 5957896/6000000 [00:36<00:00, 164130.08it/s]
100%|█████████▉| 5974310/6000000 [00:36<00:00, 163952.40it/s]
100%|█████████▉| 5990864/6000000 [00:36<00:00, 164424.43it/s]
100%|██████████| 6000000/6000000 [00:36<00:00, 163670.06it/s]
| episode | step | policy_idx | arm | reward | reward_stat | |
|---|---|---|---|---|---|---|
| 1324896 | 441 | 632 | 0 | 1 | 2.849711 | 1.126770 |
| 3566176 | 1188 | 725 | 1 | 0 | 1.953658 | 0.419003 |
| 1109043 | 369 | 681 | 0 | 0 | 3.786615 | 2.355404 |
| 4286042 | 1428 | 680 | 2 | 1 | 3.569374 | 1.674921 |
| 5395174 | 1798 | 391 | 1 | 7 | 1.069541 | 1.593863 |
It is then straightforward to plot the average reward each policy obtains at each step, by averaging over episodes.
policy_names = {
0: 'ε = 0.1',
1: 'ε = 0.01',
2: 'ε = 0 (greedy)'
}
(
trace_df
.assign(policy=trace_df.policy_idx.map(policy_names))
.groupby(['step', 'policy'])
['reward'].mean()
.unstack()
.plot()
)
[1m<[0m[1;95mAxes:[0m[39m [0m[33mxlabel[0m[39m=[0m[32m'step'[0m[1m>[0m

Controlling the evaluation loop¶
The bandit.evaluate function is useful for benchmarking. But in practice, you'll want to have control over your bandit policy. Indeed you'll want the freedom to pull arms (with the pull method) and update the policy (with the update method) at your discretion.
As an example, the following is a possible reimplementation of the bandit.evaluate function. Here we'll be measuring the rate at which each policy selects the optimal arm.
Note how the pull and update methods are used.
import copy
policies=[
bandit.EpsilonGreedy(epsilon=0.1),
bandit.EpsilonGreedy(epsilon=0.01),
bandit.EpsilonGreedy(epsilon=0),
]
env = gym.make(
'river_bandits/KArmedTestbed-v0',
max_episode_steps=1000
)
n_episodes = 2000
trace = []
with tqdm(total=len(policies) * n_episodes * env._max_episode_steps, position=0) as progress:
for policy in policies:
for episode in range(n_episodes):
episode_policy = policy.clone()
episode_env = copy.deepcopy(env)
episode_env.reset()
step = 0
while True:
action = episode_policy.pull(range(episode_env.action_space.n))
observation, reward, terminated, truncated, info = episode_env.step(action)
best_action = observation
episode_policy.update(action, reward)
trace.append({
"episode": episode,
"step": step,
"policy": f"ε = {policy.epsilon}",
"is_action_optimal": action == best_action
})
step += 1
progress.update()
if terminated or truncated:
break
trace_df = pd.DataFrame(trace)
0%| | 0/6000000 [00:00<?, ?it/s]
0%| | 16259/6000000 [00:00<00:36, 162581.63it/s]
1%| | 33245/6000000 [00:00<00:35, 166861.39it/s]
1%| | 50127/6000000 [00:00<00:35, 167754.51it/s]
1%| | 67266/6000000 [00:00<00:35, 169188.85it/s]
1%|▏ | 84432/6000000 [00:00<00:34, 170077.96it/s]
2%|▏ | 101440/6000000 [00:00<00:34, 170009.25it/s]
2%|▏ | 118476/6000000 [00:00<00:34, 170123.81it/s]
2%|▏ | 135713/6000000 [00:00<00:34, 170838.02it/s]
3%|▎ | 152797/6000000 [00:00<00:34, 170628.63it/s]
3%|▎ | 169860/6000000 [00:01<00:34, 170349.13it/s]
3%|▎ | 186896/6000000 [00:01<00:34, 170204.48it/s]
3%|▎ | 203917/6000000 [00:01<00:34, 169073.75it/s]
4%|▎ | 220878/6000000 [00:01<00:34, 169234.80it/s]
4%|▍ | 237803/6000000 [00:01<00:35, 164379.85it/s]
4%|▍ | 254806/6000000 [00:01<00:34, 166041.23it/s]
5%|▍ | 271845/6000000 [00:01<00:34, 167327.89it/s]
5%|▍ | 288981/6000000 [00:01<00:33, 168525.33it/s]
5%|▌ | 305903/6000000 [00:01<00:33, 168729.92it/s]
5%|▌ | 323001/6000000 [00:01<00:33, 169306.59it/s]
6%|▌ | 340061/6000000 [00:02<00:33, 169690.62it/s]
6%|▌ | 357283/6000000 [00:02<00:33, 170447.10it/s]
6%|▌ | 374336/6000000 [00:02<00:33, 170470.76it/s]
7%|▋ | 391503/6000000 [00:02<00:32, 170829.89it/s]
7%|▋ | 408651/6000000 [00:02<00:32, 171023.46it/s]
7%|▋ | 425776/6000000 [00:02<00:32, 171089.72it/s]
7%|▋ | 442886/6000000 [00:02<00:32, 170960.60it/s]
8%|▊ | 460001/6000000 [00:02<00:32, 170879.43it/s]
8%|▊ | 477157/6000000 [00:02<00:32, 171082.43it/s]
8%|▊ | 494421/6000000 [00:02<00:32, 171547.53it/s]
9%|▊ | 511577/6000000 [00:03<00:32, 171109.76it/s]
9%|▉ | 528714/6000000 [00:03<00:31, 171185.87it/s]
9%|▉ | 545833/6000000 [00:03<00:31, 170987.75it/s]
9%|▉ | 562933/6000000 [00:03<00:31, 170650.23it/s]
10%|▉ | 580012/6000000 [00:03<00:31, 170689.18it/s]
10%|▉ | 597082/6000000 [00:03<00:31, 170653.60it/s]
10%|█ | 614157/6000000 [00:03<00:31, 170681.92it/s]
11%|█ | 631321/6000000 [00:03<00:31, 170968.35it/s]
11%|█ | 648418/6000000 [00:03<00:31, 170955.70it/s]
11%|█ | 665514/6000000 [00:03<00:31, 170740.39it/s]
11%|█▏ | 682630/6000000 [00:04<00:31, 170863.84it/s]
12%|█▏ | 699717/6000000 [00:04<00:31, 166660.34it/s]
12%|█▏ | 716709/6000000 [00:04<00:31, 167619.51it/s]
12%|█▏ | 733734/6000000 [00:04<00:31, 168398.12it/s]
13%|█▎ | 750588/6000000 [00:04<00:31, 167657.25it/s]
13%|█▎ | 767364/6000000 [00:04<00:31, 167264.07it/s]
13%|█▎ | 784097/6000000 [00:04<00:31, 166703.69it/s]
13%|█▎ | 800843/6000000 [00:04<00:31, 166927.55it/s]
14%|█▎ | 817540/6000000 [00:04<00:31, 166184.34it/s]
14%|█▍ | 834201/6000000 [00:04<00:31, 166310.45it/s]
14%|█▍ | 850915/6000000 [00:05<00:30, 166556.67it/s]
14%|█▍ | 867573/6000000 [00:05<00:30, 166288.98it/s]
15%|█▍ | 884203/6000000 [00:05<00:30, 166146.92it/s]
15%|█▌ | 900819/6000000 [00:05<00:30, 165173.31it/s]
15%|█▌ | 917524/6000000 [00:05<00:30, 165730.20it/s]
16%|█▌ | 934151/6000000 [00:05<00:30, 165891.34it/s]
16%|█▌ | 950753/6000000 [00:05<00:30, 165929.28it/s]
16%|█▌ | 967347/6000000 [00:05<00:30, 165828.50it/s]
16%|█▋ | 983998/6000000 [00:05<00:30, 166032.36it/s]
17%|█▋ | 1000693/6000000 [00:05<00:30, 166305.18it/s]
17%|█▋ | 1017324/6000000 [00:06<00:29, 166214.39it/s]
17%|█▋ | 1033984/6000000 [00:06<00:29, 166327.97it/s]
18%|█▊ | 1050617/6000000 [00:06<00:29, 166223.61it/s]
18%|█▊ | 1067240/6000000 [00:06<00:29, 165692.55it/s]
18%|█▊ | 1083973/6000000 [00:06<00:29, 166181.33it/s]
18%|█▊ | 1100658/6000000 [00:06<00:29, 166381.09it/s]
19%|█▊ | 1117303/6000000 [00:06<00:29, 166400.83it/s]
19%|█▉ | 1133950/6000000 [00:06<00:29, 166421.49it/s]
19%|█▉ | 1150644/6000000 [00:06<00:29, 166574.36it/s]
19%|█▉ | 1167302/6000000 [00:06<00:29, 166418.94it/s]
20%|█▉ | 1184001/6000000 [00:07<00:28, 166491.75it/s]
20%|██ | 1200702/6000000 [00:07<00:28, 166644.36it/s]
20%|██ | 1217367/6000000 [00:07<00:28, 166249.13it/s]
21%|██ | 1233993/6000000 [00:07<00:28, 166036.18it/s]
21%|██ | 1250597/6000000 [00:07<00:28, 165781.29it/s]
21%|██ | 1267220/6000000 [00:07<00:28, 165914.62it/s]
21%|██▏ | 1283813/6000000 [00:07<00:28, 165918.68it/s]
22%|██▏ | 1300441/6000000 [00:07<00:28, 166026.15it/s]
22%|██▏ | 1317044/6000000 [00:07<00:28, 165432.39it/s]
22%|██▏ | 1333874/6000000 [00:07<00:28, 166289.38it/s]
23%|██▎ | 1350515/6000000 [00:08<00:27, 166323.99it/s]
23%|██▎ | 1367213/6000000 [00:08<00:27, 166518.51it/s]
23%|██▎ | 1383984/6000000 [00:08<00:27, 166874.53it/s]
23%|██▎ | 1400676/6000000 [00:08<00:27, 166887.14it/s]
24%|██▎ | 1417365/6000000 [00:08<00:27, 166820.55it/s]
24%|██▍ | 1434048/6000000 [00:08<00:27, 166753.70it/s]
24%|██▍ | 1450760/6000000 [00:08<00:27, 166861.05it/s]
24%|██▍ | 1467447/6000000 [00:08<00:27, 166661.48it/s]
25%|██▍ | 1484114/6000000 [00:08<00:27, 166607.43it/s]
25%|██▌ | 1500863/6000000 [00:08<00:26, 166871.36it/s]
25%|██▌ | 1517551/6000000 [00:09<00:26, 166682.68it/s]
26%|██▌ | 1534220/6000000 [00:09<00:26, 166600.43it/s]
26%|██▌ | 1550896/6000000 [00:09<00:26, 166646.14it/s]
26%|██▌ | 1567571/6000000 [00:09<00:26, 166675.17it/s]
26%|██▋ | 1584267/6000000 [00:09<00:26, 166759.30it/s]
27%|██▋ | 1600973/6000000 [00:09<00:26, 166849.16it/s]
27%|██▋ | 1617658/6000000 [00:09<00:26, 166347.64it/s]
27%|██▋ | 1634294/6000000 [00:09<00:26, 166270.47it/s]
28%|██▊ | 1651001/6000000 [00:09<00:26, 166378.43it/s]
28%|██▊ | 1667640/6000000 [00:09<00:26, 166356.52it/s]
28%|██▊ | 1684413/6000000 [00:10<00:25, 166766.11it/s]
28%|██▊ | 1701090/6000000 [00:10<00:25, 166497.03it/s]
29%|██▊ | 1717740/6000000 [00:10<00:25, 166349.40it/s]
29%|██▉ | 1734376/6000000 [00:10<00:25, 166200.40it/s]
29%|██▉ | 1751047/6000000 [00:10<00:25, 166350.41it/s]
29%|██▉ | 1767838/6000000 [00:10<00:25, 166817.32it/s]
30%|██▉ | 1784520/6000000 [00:10<00:25, 166421.23it/s]
30%|███ | 1801163/6000000 [00:10<00:25, 166101.39it/s]
30%|███ | 1817774/6000000 [00:10<00:25, 166093.35it/s]
31%|███ | 1834402/6000000 [00:10<00:25, 166147.38it/s]
31%|███ | 1851017/6000000 [00:11<00:24, 166115.59it/s]
31%|███ | 1867632/6000000 [00:11<00:24, 166122.86it/s]
31%|███▏ | 1884353/6000000 [00:11<00:24, 166448.65it/s]
32%|███▏ | 1901001/6000000 [00:11<00:24, 166416.94it/s]
32%|███▏ | 1917643/6000000 [00:11<00:24, 165842.35it/s]
32%|███▏ | 1934228/6000000 [00:11<00:24, 165695.78it/s]
33%|███▎ | 1950887/6000000 [00:11<00:24, 165962.80it/s]
33%|███▎ | 1967484/6000000 [00:11<00:24, 165780.95it/s]
33%|███▎ | 1984063/6000000 [00:11<00:24, 165660.85it/s]
33%|███▎ | 2000802/6000000 [00:11<00:24, 166177.80it/s]
34%|███▎ | 2017420/6000000 [00:12<00:24, 163941.45it/s]
34%|███▍ | 2033821/6000000 [00:12<00:24, 162760.52it/s]
34%|███▍ | 2050103/6000000 [00:12<00:24, 161067.18it/s]
34%|███▍ | 2066216/6000000 [00:12<00:24, 160547.16it/s]
35%|███▍ | 2082314/6000000 [00:12<00:24, 160672.68it/s]
35%|███▍ | 2098385/6000000 [00:12<00:24, 160169.15it/s]
35%|███▌ | 2114450/6000000 [00:12<00:24, 160310.22it/s]
36%|███▌ | 2130483/6000000 [00:12<00:24, 159816.94it/s]
36%|███▌ | 2146466/6000000 [00:12<00:24, 158792.34it/s]
36%|███▌ | 2162347/6000000 [00:12<00:24, 158659.44it/s]
36%|███▋ | 2178321/6000000 [00:13<00:24, 158981.15it/s]
37%|███▋ | 2194221/6000000 [00:13<00:23, 158945.99it/s]
37%|███▋ | 2210117/6000000 [00:13<00:23, 158504.82it/s]
37%|███▋ | 2226001/6000000 [00:13<00:23, 158578.43it/s]
37%|███▋ | 2242001/6000000 [00:13<00:23, 158845.69it/s]
38%|███▊ | 2257894/6000000 [00:13<00:23, 158868.46it/s]
38%|███▊ | 2273808/6000000 [00:13<00:23, 158947.41it/s]
38%|███▊ | 2289703/6000000 [00:13<00:23, 158824.14it/s]
38%|███▊ | 2305586/6000000 [00:13<00:23, 158735.48it/s]
39%|███▊ | 2321460/6000000 [00:13<00:23, 158634.80it/s]
39%|███▉ | 2337611/6000000 [00:14<00:22, 159495.88it/s]
39%|███▉ | 2353561/6000000 [00:14<00:22, 159346.65it/s]
39%|███▉ | 2369496/6000000 [00:14<00:22, 159305.71it/s]
40%|███▉ | 2385427/6000000 [00:14<00:22, 158918.07it/s]
40%|████ | 2401320/6000000 [00:14<00:22, 158607.56it/s]
40%|████ | 2417181/6000000 [00:14<00:22, 158479.52it/s]
41%|████ | 2433192/6000000 [00:14<00:22, 158966.56it/s]
41%|████ | 2449089/6000000 [00:14<00:22, 158889.14it/s]
41%|████ | 2465018/6000000 [00:14<00:22, 159008.90it/s]
41%|████▏ | 2481113/6000000 [00:14<00:22, 159589.66it/s]
42%|████▏ | 2497113/6000000 [00:15<00:21, 159710.31it/s]
42%|████▏ | 2513119/6000000 [00:15<00:21, 159814.63it/s]
42%|████▏ | 2529101/6000000 [00:15<00:21, 159683.62it/s]
42%|████▏ | 2545070/6000000 [00:15<00:21, 159429.57it/s]
43%|████▎ | 2561014/6000000 [00:15<00:21, 159381.99it/s]
43%|████▎ | 2577047/6000000 [00:15<00:21, 159665.34it/s]
43%|████▎ | 2593014/6000000 [00:15<00:21, 159581.95it/s]
43%|████▎ | 2608973/6000000 [00:15<00:21, 159427.90it/s]
44%|████▎ | 2624920/6000000 [00:15<00:21, 159439.99it/s]
44%|████▍ | 2640865/6000000 [00:15<00:21, 159309.85it/s]
44%|████▍ | 2656851/6000000 [00:16<00:20, 159472.45it/s]
45%|████▍ | 2672799/6000000 [00:16<00:21, 158085.03it/s]
45%|████▍ | 2688738/6000000 [00:16<00:20, 158473.36it/s]
45%|████▌ | 2704635/6000000 [00:16<00:20, 158619.20it/s]
45%|████▌ | 2720499/6000000 [00:16<00:20, 158395.01it/s]
46%|████▌ | 2736340/6000000 [00:16<00:20, 158246.91it/s]
46%|████▌ | 2752404/6000000 [00:16<00:20, 158960.56it/s]
46%|████▌ | 2768301/6000000 [00:16<00:20, 158514.50it/s]
46%|████▋ | 2784194/6000000 [00:16<00:20, 158638.65it/s]
47%|████▋ | 2800152/6000000 [00:16<00:20, 158919.96it/s]
47%|████▋ | 2816045/6000000 [00:17<00:20, 158817.66it/s]
47%|████▋ | 2831972/6000000 [00:17<00:19, 158951.48it/s]
47%|████▋ | 2847868/6000000 [00:17<00:19, 158883.25it/s]
48%|████▊ | 2863848/6000000 [00:17<00:19, 159155.96it/s]
48%|████▊ | 2879764/6000000 [00:17<00:19, 158985.65it/s]
48%|████▊ | 2895663/6000000 [00:17<00:19, 158028.74it/s]
49%|████▊ | 2911468/6000000 [00:17<00:20, 152799.10it/s]
49%|████▉ | 2927001/6000000 [00:17<00:20, 153506.41it/s]
49%|████▉ | 2942979/6000000 [00:17<00:19, 155350.14it/s]
49%|████▉ | 2958964/6000000 [00:17<00:19, 156680.79it/s]
50%|████▉ | 2974813/6000000 [00:18<00:19, 157218.33it/s]
50%|████▉ | 2990780/6000000 [00:18<00:19, 157947.73it/s]
50%|█████ | 3006691/6000000 [00:18<00:18, 158292.34it/s]
50%|█████ | 3022542/6000000 [00:18<00:18, 158354.65it/s]
51%|█████ | 3038383/6000000 [00:18<00:18, 158021.47it/s]
51%|█████ | 3054189/6000000 [00:18<00:18, 156908.48it/s]
51%|█████ | 3070001/6000000 [00:18<00:18, 157107.15it/s]
51%|█████▏ | 3085715/6000000 [00:18<00:18, 156939.88it/s]
52%|█████▏ | 3101663/6000000 [00:18<00:18, 157697.04it/s]
52%|█████▏ | 3117435/6000000 [00:18<00:18, 157627.66it/s]
52%|█████▏ | 3133410/6000000 [00:19<00:18, 158261.43it/s]
52%|█████▏ | 3149238/6000000 [00:19<00:18, 157818.74it/s]
53%|█████▎ | 3165163/6000000 [00:19<00:17, 158244.37it/s]
53%|█████▎ | 3181187/6000000 [00:19<00:17, 158840.57it/s]
53%|█████▎ | 3197180/6000000 [00:19<00:17, 159166.45it/s]
54%|█████▎ | 3213098/6000000 [00:19<00:17, 159003.68it/s]
54%|█████▍ | 3228999/6000000 [00:19<00:17, 158871.17it/s]
54%|█████▍ | 3244887/6000000 [00:19<00:17, 158530.30it/s]
54%|█████▍ | 3260924/6000000 [00:19<00:17, 159080.66it/s]
55%|█████▍ | 3276833/6000000 [00:19<00:17, 158946.09it/s]
55%|█████▍ | 3292728/6000000 [00:20<00:17, 158652.15it/s]
55%|█████▌ | 3308622/6000000 [00:20<00:16, 158730.17it/s]
55%|█████▌ | 3324623/6000000 [00:20<00:16, 159104.24it/s]
56%|█████▌ | 3340534/6000000 [00:20<00:16, 158841.20it/s]
56%|█████▌ | 3356419/6000000 [00:20<00:16, 158582.65it/s]
56%|█████▌ | 3372344/6000000 [00:20<00:16, 158781.46it/s]
56%|█████▋ | 3388223/6000000 [00:20<00:16, 158547.95it/s]
57%|█████▋ | 3404078/6000000 [00:20<00:16, 158353.80it/s]
57%|█████▋ | 3420087/6000000 [00:20<00:16, 158872.04it/s]
57%|█████▋ | 3435989/6000000 [00:20<00:16, 158913.79it/s]
58%|█████▊ | 3451881/6000000 [00:21<00:16, 158597.89it/s]
58%|█████▊ | 3467742/6000000 [00:21<00:15, 158395.27it/s]
58%|█████▊ | 3483629/6000000 [00:21<00:15, 158536.70it/s]
58%|█████▊ | 3499483/6000000 [00:21<00:15, 158353.89it/s]
59%|█████▊ | 3515460/6000000 [00:21<00:15, 158775.86it/s]
59%|█████▉ | 3531338/6000000 [00:21<00:15, 158104.13it/s]
59%|█████▉ | 3547238/6000000 [00:21<00:15, 158370.29it/s]
59%|█████▉ | 3563186/6000000 [00:21<00:15, 158699.71it/s]
60%|█████▉ | 3579234/6000000 [00:21<00:15, 159231.54it/s]
60%|█████▉ | 3595217/6000000 [00:21<00:15, 159395.85it/s]
60%|██████ | 3611157/6000000 [00:22<00:15, 159169.57it/s]
60%|██████ | 3627112/6000000 [00:22<00:14, 159281.95it/s]
61%|██████ | 3643139/6000000 [00:22<00:14, 159577.71it/s]
61%|██████ | 3659097/6000000 [00:22<00:14, 158833.48it/s]
61%|██████▏ | 3675001/6000000 [00:22<00:14, 158782.00it/s]
62%|██████▏ | 3691001/6000000 [00:22<00:14, 159131.41it/s]
62%|██████▏ | 3706999/6000000 [00:22<00:14, 159384.76it/s]
62%|██████▏ | 3722938/6000000 [00:22<00:14, 158613.45it/s]
62%|██████▏ | 3738877/6000000 [00:22<00:14, 158844.28it/s]
63%|██████▎ | 3754841/6000000 [00:23<00:14, 159079.56it/s]
63%|██████▎ | 3770750/6000000 [00:23<00:14, 159049.13it/s]
63%|██████▎ | 3786656/6000000 [00:23<00:13, 158957.89it/s]
63%|██████▎ | 3802686/6000000 [00:23<00:13, 159358.78it/s]
64%|██████▎ | 3818662/6000000 [00:23<00:13, 159477.34it/s]
64%|██████▍ | 3834610/6000000 [00:23<00:13, 158837.71it/s]
64%|██████▍ | 3850495/6000000 [00:23<00:13, 158735.52it/s]
64%|██████▍ | 3866370/6000000 [00:23<00:13, 158611.05it/s]
65%|██████▍ | 3882289/6000000 [00:23<00:13, 158784.22it/s]
65%|██████▍ | 3898168/6000000 [00:23<00:13, 158387.60it/s]
65%|██████▌ | 3914008/6000000 [00:24<00:13, 157947.24it/s]
66%|██████▌ | 3930001/6000000 [00:24<00:13, 158396.51it/s]
66%|██████▌ | 3946034/6000000 [00:24<00:12, 158972.54it/s]
66%|██████▌ | 3961932/6000000 [00:24<00:12, 158844.60it/s]
66%|██████▋ | 3977817/6000000 [00:24<00:12, 158556.71it/s]
67%|██████▋ | 3993706/6000000 [00:24<00:12, 158656.12it/s]
67%|██████▋ | 4009771/6000000 [00:24<00:12, 159251.87it/s]
67%|██████▋ | 4025954/6000000 [00:24<00:12, 160024.14it/s]
67%|██████▋ | 4042102/6000000 [00:24<00:12, 160460.17it/s]
68%|██████▊ | 4058234/6000000 [00:24<00:12, 160716.60it/s]
68%|██████▊ | 4074538/6000000 [00:25<00:11, 161412.75it/s]
68%|██████▊ | 4090777/6000000 [00:25<00:11, 161703.67it/s]
68%|██████▊ | 4107023/6000000 [00:25<00:11, 161928.71it/s]
69%|██████▊ | 4123445/6000000 [00:25<00:11, 162615.43it/s]
69%|██████▉ | 4139796/6000000 [00:25<00:11, 162882.53it/s]
69%|██████▉ | 4156085/6000000 [00:25<00:11, 162548.85it/s]
70%|██████▉ | 4172351/6000000 [00:25<00:11, 162581.61it/s]
70%|██████▉ | 4188610/6000000 [00:25<00:11, 162055.25it/s]
70%|███████ | 4204910/6000000 [00:25<00:11, 162336.25it/s]
70%|███████ | 4221145/6000000 [00:25<00:10, 161891.97it/s]
71%|███████ | 4237348/6000000 [00:26<00:10, 161932.82it/s]
71%|███████ | 4253542/6000000 [00:26<00:10, 161725.11it/s]
71%|███████ | 4269820/6000000 [00:26<00:10, 162040.46it/s]
71%|███████▏ | 4286025/6000000 [00:26<00:10, 161608.26it/s]
72%|███████▏ | 4302435/6000000 [00:26<00:10, 162351.62it/s]
72%|███████▏ | 4318707/6000000 [00:26<00:10, 162461.59it/s]
72%|███████▏ | 4334954/6000000 [00:26<00:10, 162186.87it/s]
73%|███████▎ | 4351174/6000000 [00:26<00:10, 162189.69it/s]
73%|███████▎ | 4367394/6000000 [00:26<00:10, 161724.28it/s]
73%|███████▎ | 4383926/6000000 [00:26<00:09, 162797.56it/s]
73%|███████▎ | 4400207/6000000 [00:27<00:09, 162760.91it/s]
74%|███████▎ | 4416591/6000000 [00:27<00:09, 163081.33it/s]
74%|███████▍ | 4432900/6000000 [00:27<00:09, 162547.15it/s]
74%|███████▍ | 4449156/6000000 [00:27<00:09, 162087.84it/s]
74%|███████▍ | 4465560/6000000 [00:27<00:09, 162669.10it/s]
75%|███████▍ | 4481828/6000000 [00:27<00:09, 162547.10it/s]
75%|███████▍ | 4498084/6000000 [00:27<00:09, 162083.06it/s]
75%|███████▌ | 4514395/6000000 [00:27<00:09, 162388.96it/s]
76%|███████▌ | 4530782/6000000 [00:27<00:09, 162829.80it/s]
76%|███████▌ | 4547066/6000000 [00:27<00:08, 162639.12it/s]
76%|███████▌ | 4563331/6000000 [00:28<00:08, 162173.42it/s]
76%|███████▋ | 4579672/6000000 [00:28<00:08, 162540.58it/s]
77%|███████▋ | 4595927/6000000 [00:28<00:08, 162222.96it/s]
77%|███████▋ | 4612150/6000000 [00:28<00:08, 162176.54it/s]
77%|███████▋ | 4628368/6000000 [00:28<00:08, 161755.19it/s]
77%|███████▋ | 4644681/6000000 [00:28<00:08, 162163.92it/s]
78%|███████▊ | 4660898/6000000 [00:28<00:08, 162053.52it/s]
78%|███████▊ | 4677217/6000000 [00:28<00:08, 162392.94it/s]
78%|███████▊ | 4693457/6000000 [00:28<00:08, 161538.31it/s]
78%|███████▊ | 4709669/6000000 [00:28<00:07, 161711.80it/s]
79%|███████▉ | 4726001/6000000 [00:29<00:07, 162017.02it/s]
79%|███████▉ | 4742204/6000000 [00:29<00:07, 161846.81it/s]
79%|███████▉ | 4758390/6000000 [00:29<00:07, 161683.83it/s]
80%|███████▉ | 4774693/6000000 [00:29<00:07, 162084.30it/s]
80%|███████▉ | 4790987/6000000 [00:29<00:07, 162338.24it/s]
80%|████████ | 4807222/6000000 [00:29<00:07, 161984.64it/s]
80%|████████ | 4823451/6000000 [00:29<00:07, 162072.71it/s]
81%|████████ | 4839659/6000000 [00:29<00:07, 161928.80it/s]
81%|████████ | 4855935/6000000 [00:29<00:07, 162175.43it/s]
81%|████████ | 4872153/6000000 [00:29<00:06, 161835.24it/s]
81%|████████▏ | 4888407/6000000 [00:30<00:06, 162045.40it/s]
82%|████████▏ | 4904801/6000000 [00:30<00:06, 162610.90it/s]
82%|████████▏ | 4921063/6000000 [00:30<00:06, 162346.80it/s]
82%|████████▏ | 4937315/6000000 [00:30<00:06, 162396.15it/s]
83%|████████▎ | 4953555/6000000 [00:30<00:06, 162089.74it/s]
83%|████████▎ | 4969765/6000000 [00:30<00:06, 162019.67it/s]
83%|████████▎ | 4986001/6000000 [00:30<00:06, 161938.96it/s]
83%|████████▎ | 5002297/6000000 [00:30<00:06, 162242.93it/s]
84%|████████▎ | 5018587/6000000 [00:30<00:06, 162437.36it/s]
84%|████████▍ | 5034952/6000000 [00:30<00:05, 162799.30it/s]
84%|████████▍ | 5051233/6000000 [00:31<00:05, 162553.12it/s]
84%|████████▍ | 5067489/6000000 [00:31<00:05, 162201.36it/s]
85%|████████▍ | 5083749/6000000 [00:31<00:05, 162319.41it/s]
85%|████████▌ | 5100100/6000000 [00:31<00:05, 162674.00it/s]
85%|████████▌ | 5116756/6000000 [00:31<00:05, 163836.80it/s]
86%|████████▌ | 5133183/6000000 [00:31<00:05, 163966.45it/s]
86%|████████▌ | 5149580/6000000 [00:31<00:05, 162857.25it/s]
86%|████████▌ | 5165868/6000000 [00:31<00:05, 162809.29it/s]
86%|████████▋ | 5182151/6000000 [00:31<00:05, 161722.17it/s]
87%|████████▋ | 5198326/6000000 [00:31<00:04, 160942.90it/s]
87%|████████▋ | 5214498/6000000 [00:32<00:04, 161171.44it/s]
87%|████████▋ | 5230982/6000000 [00:32<00:04, 162264.93it/s]
87%|████████▋ | 5247268/6000000 [00:32<00:04, 162441.58it/s]
88%|████████▊ | 5263583/6000000 [00:32<00:04, 162651.08it/s]
88%|████████▊ | 5279939/6000000 [00:32<00:04, 162922.88it/s]
88%|████████▊ | 5296246/6000000 [00:32<00:04, 162965.60it/s]
89%|████████▊ | 5312544/6000000 [00:32<00:04, 162882.21it/s]
89%|████████▉ | 5328863/6000000 [00:32<00:04, 162972.86it/s]
89%|████████▉ | 5345197/6000000 [00:32<00:04, 163080.40it/s]
89%|████████▉ | 5361543/6000000 [00:32<00:03, 163193.21it/s]
90%|████████▉ | 5377863/6000000 [00:33<00:03, 162874.36it/s]
90%|████████▉ | 5394249/6000000 [00:33<00:03, 163166.93it/s]
90%|█████████ | 5410566/6000000 [00:33<00:03, 162653.46it/s]
90%|█████████ | 5426863/6000000 [00:33<00:03, 162746.10it/s]
91%|█████████ | 5443138/6000000 [00:33<00:03, 162496.22it/s]
91%|█████████ | 5459546/6000000 [00:33<00:03, 162969.01it/s]
91%|█████████▏| 5475964/6000000 [00:33<00:03, 163331.35it/s]
92%|█████████▏| 5492298/6000000 [00:33<00:03, 161006.77it/s]
92%|█████████▏| 5508412/6000000 [00:33<00:03, 161045.21it/s]
92%|█████████▏| 5524522/6000000 [00:33<00:02, 161059.62it/s]
92%|█████████▏| 5540657/6000000 [00:34<00:02, 161144.71it/s]
93%|█████████▎| 5557025/6000000 [00:34<00:02, 161900.28it/s]
93%|█████████▎| 5573348/6000000 [00:34<00:02, 162296.44it/s]
93%|█████████▎| 5589580/6000000 [00:34<00:02, 162061.85it/s]
93%|█████████▎| 5605788/6000000 [00:34<00:02, 162009.64it/s]
94%|█████████▎| 5622037/6000000 [00:34<00:02, 162151.67it/s]
94%|█████████▍| 5638253/6000000 [00:34<00:02, 161916.30it/s]
94%|█████████▍| 5654510/6000000 [00:34<00:02, 162109.88it/s]
95%|█████████▍| 5670914/6000000 [00:34<00:02, 162686.30it/s]
95%|█████████▍| 5687251/6000000 [00:34<00:01, 162890.44it/s]
95%|█████████▌| 5703566/6000000 [00:35<00:01, 162966.55it/s]
95%|█████████▌| 5720001/6000000 [00:35<00:01, 163362.98it/s]
96%|█████████▌| 5736454/6000000 [00:35<00:01, 163709.59it/s]
96%|█████████▌| 5752826/6000000 [00:35<00:01, 163275.77it/s]
96%|█████████▌| 5769154/6000000 [00:35<00:01, 163264.62it/s]
96%|█████████▋| 5785640/6000000 [00:35<00:01, 163739.75it/s]
97%|█████████▋| 5802081/6000000 [00:35<00:01, 163940.19it/s]
97%|█████████▋| 5818662/6000000 [00:35<00:01, 164500.28it/s]
97%|█████████▋| 5835113/6000000 [00:35<00:01, 163970.72it/s]
98%|█████████▊| 5851511/6000000 [00:35<00:00, 163696.08it/s]
98%|█████████▊| 5867961/6000000 [00:36<00:00, 163935.76it/s]
98%|█████████▊| 5884355/6000000 [00:36<00:00, 163793.63it/s]
98%|█████████▊| 5900781/6000000 [00:36<00:00, 163931.42it/s]
99%|█████████▊| 5917175/6000000 [00:36<00:00, 163451.22it/s]
99%|█████████▉| 5933610/6000000 [00:36<00:00, 163718.55it/s]
99%|█████████▉| 5949983/6000000 [00:36<00:00, 163507.07it/s]
99%|█████████▉| 5966334/6000000 [00:36<00:00, 163203.37it/s]
100%|█████████▉| 5982655/6000000 [00:36<00:00, 162830.35it/s]
100%|█████████▉| 5998981/6000000 [00:36<00:00, 162958.70it/s]
100%|██████████| 6000000/6000000 [00:36<00:00, 162798.02it/s]
colors = {
'ε = 0.1': 'tab:blue',
'ε = 0.01': 'tab:red',
'ε = 0': 'tab:green'
}
(
trace_df
.groupby(['step', 'policy'])
['is_action_optimal'].mean()
.unstack()
.plot()
)
[1m<[0m[1;95mAxes:[0m[39m [0m[33mxlabel[0m[39m=[0m[32m'step'[0m[1m>[0m

Handling drift¶
The environment used above is a toy situation used for introducing bandits. It is stationary, meaning that the expected reward of each arm does not change over time.
In practice, arms are dynamic, and their performance can vary over time. A simple example of this is the Candy Cane Contest that was hosted on Kaggle in 2020. The expected reward of each arm diminishes each time it is pulled.
The way bandit policies in River deal with drift depends on the method. For the bandit.EpsilonGreedy policy, it makes sense to use a rolling average as the reward object. What this means is that the empirical reward the policy calculates for each arm is a rolling average, rather than a global one.
from river import proba, utils
policies=[
bandit.EpsilonGreedy(
epsilon=0.1,
seed=42
),
bandit.EpsilonGreedy(
epsilon=0.3,
reward_obj=utils.Rolling(stats.Mean(), window_size=50),
seed=42
),
bandit.ThompsonSampling(
reward_obj=proba.Beta(),
seed=42
)
]
env = gym.make('river_bandits/CandyCaneContest-v0')
trace = bandit.evaluate(
policies=policies,
env=env,
n_episodes=(n_episodes := 30),
seed=42
)
trace_df = pd.DataFrame(tqdm(
trace, position=0, total=(
n_episodes *
len(policies) *
env._max_episode_steps
)
))
0%| | 0/180000 [00:00<?, ?it/s]
1%| | 1266/180000 [00:00<00:14, 12639.73it/s]
1%|▏ | 2570/180000 [00:00<00:13, 12874.73it/s]
2%|▏ | 3861/180000 [00:00<00:13, 12884.86it/s]
3%|▎ | 5150/180000 [00:00<00:13, 12873.28it/s]
4%|▎ | 6438/180000 [00:00<00:13, 12734.80it/s]
4%|▍ | 7749/180000 [00:00<00:13, 12857.61it/s]
5%|▌ | 9036/180000 [00:00<00:13, 12769.75it/s]
6%|▌ | 10317/180000 [00:00<00:13, 12776.04it/s]
6%|▋ | 11613/180000 [00:00<00:13, 12830.60it/s]
7%|▋ | 12897/180000 [00:01<00:13, 12744.36it/s]
8%|▊ | 14172/180000 [00:01<00:13, 12695.25it/s]
9%|▊ | 15463/180000 [00:01<00:12, 12759.06it/s]
9%|▉ | 16740/180000 [00:01<00:12, 12587.72it/s]
10%|█ | 18001/180000 [00:01<00:12, 12525.91it/s]
11%|█ | 19254/180000 [00:01<00:12, 12480.13it/s]
11%|█▏ | 20562/180000 [00:01<00:12, 12655.21it/s]
12%|█▏ | 21864/180000 [00:01<00:12, 12760.56it/s]
13%|█▎ | 23147/180000 [00:01<00:12, 12780.92it/s]
14%|█▎ | 24426/180000 [00:01<00:12, 12511.23it/s]
14%|█▍ | 25679/180000 [00:02<00:12, 12513.36it/s]
15%|█▍ | 26932/180000 [00:02<00:12, 12352.55it/s]
16%|█▌ | 28176/180000 [00:02<00:12, 12376.77it/s]
16%|█▋ | 29462/180000 [00:02<00:12, 12519.66it/s]
17%|█▋ | 30715/180000 [00:02<00:11, 12519.57it/s]
18%|█▊ | 32015/180000 [00:02<00:11, 12662.54it/s]
18%|█▊ | 33282/180000 [00:02<00:11, 12489.58it/s]
19%|█▉ | 34560/180000 [00:02<00:11, 12572.60it/s]
20%|█▉ | 35835/180000 [00:02<00:11, 12619.25it/s]
21%|██ | 37098/180000 [00:02<00:11, 12514.31it/s]
21%|██▏ | 38359/180000 [00:03<00:11, 12542.14it/s]
22%|██▏ | 39663/180000 [00:03<00:11, 12690.33it/s]
23%|██▎ | 40971/180000 [00:03<00:10, 12804.27it/s]
23%|██▎ | 42252/180000 [00:03<00:10, 12657.28it/s]
24%|██▍ | 43572/180000 [00:03<00:10, 12814.86it/s]
25%|██▍ | 44869/180000 [00:03<00:10, 12860.59it/s]
26%|██▌ | 46164/180000 [00:03<00:10, 12884.76it/s]
26%|██▋ | 47457/180000 [00:03<00:10, 12895.89it/s]
27%|██▋ | 48747/180000 [00:03<00:10, 12622.43it/s]
28%|██▊ | 50011/180000 [00:03<00:10, 12600.08it/s]
28%|██▊ | 51292/180000 [00:04<00:10, 12661.14it/s]
29%|██▉ | 52581/180000 [00:04<00:10, 12728.15it/s]
30%|██▉ | 53867/180000 [00:04<00:09, 12767.35it/s]
31%|███ | 55145/180000 [00:04<00:09, 12602.79it/s]
31%|███▏ | 56418/180000 [00:04<00:09, 12637.35it/s]
32%|███▏ | 57699/180000 [00:04<00:09, 12686.87it/s]
33%|███▎ | 58985/180000 [00:04<00:09, 12737.89it/s]
33%|███▎ | 60260/180000 [00:04<00:09, 12520.11it/s]
34%|███▍ | 61557/180000 [00:04<00:09, 12649.75it/s]
35%|███▍ | 62852/180000 [00:04<00:09, 12738.61it/s]
36%|███▌ | 64127/180000 [00:05<00:09, 12577.90it/s]
36%|███▋ | 65386/180000 [00:05<00:09, 12377.12it/s]
37%|███▋ | 66625/180000 [00:05<00:09, 12159.81it/s]
38%|███▊ | 67911/180000 [00:05<00:09, 12359.60it/s]
38%|███▊ | 69198/180000 [00:05<00:08, 12508.89it/s]
39%|███▉ | 70494/180000 [00:05<00:08, 12636.71it/s]
40%|███▉ | 71759/180000 [00:05<00:08, 12485.26it/s]
41%|████ | 73009/180000 [00:05<00:08, 12414.68it/s]
41%|████▏ | 74280/180000 [00:05<00:08, 12496.60it/s]
42%|████▏ | 75561/180000 [00:05<00:08, 12584.49it/s]
43%|████▎ | 76823/180000 [00:06<00:08, 12594.79it/s]
43%|████▎ | 78083/180000 [00:06<00:08, 12402.32it/s]
44%|████▍ | 79420/180000 [00:06<00:07, 12688.05it/s]
45%|████▍ | 80695/180000 [00:06<00:07, 12706.27it/s]
46%|████▌ | 81981/180000 [00:06<00:07, 12746.43it/s]
46%|████▋ | 83259/180000 [00:06<00:07, 12753.81it/s]
47%|████▋ | 84535/180000 [00:06<00:07, 12555.16it/s]
48%|████▊ | 85792/180000 [00:06<00:07, 12452.73it/s]
48%|████▊ | 87038/180000 [00:06<00:07, 12315.21it/s]
49%|████▉ | 88271/180000 [00:07<00:07, 12306.49it/s]
50%|████▉ | 89503/180000 [00:07<00:07, 12218.17it/s]
50%|█████ | 90726/180000 [00:07<00:07, 12114.96it/s]
51%|█████ | 92011/180000 [00:07<00:07, 12331.22it/s]
52%|█████▏ | 93303/180000 [00:07<00:06, 12504.58it/s]
53%|█████▎ | 94608/180000 [00:07<00:06, 12662.22it/s]
53%|█████▎ | 95910/180000 [00:07<00:06, 12768.40it/s]
54%|█████▍ | 97188/180000 [00:07<00:06, 12641.33it/s]
55%|█████▍ | 98469/180000 [00:07<00:06, 12687.31it/s]
55%|█████▌ | 99753/180000 [00:07<00:06, 12726.64it/s]
56%|█████▌ | 101026/180000 [00:08<00:06, 12709.21it/s]
57%|█████▋ | 102298/180000 [00:08<00:06, 12512.16it/s]
58%|█████▊ | 103593/180000 [00:08<00:06, 12640.18it/s]
58%|█████▊ | 104871/180000 [00:08<00:05, 12678.96it/s]
59%|█████▉ | 106143/180000 [00:08<00:05, 12685.51it/s]
60%|█████▉ | 107427/180000 [00:08<00:05, 12725.80it/s]
60%|██████ | 108700/180000 [00:08<00:05, 12581.52it/s]
61%|██████ | 109977/180000 [00:08<00:05, 12635.79it/s]
62%|██████▏ | 111255/180000 [00:08<00:05, 12674.12it/s]
63%|██████▎ | 112554/180000 [00:08<00:05, 12764.21it/s]
63%|██████▎ | 113860/180000 [00:09<00:05, 12851.53it/s]
64%|██████▍ | 115146/180000 [00:09<00:05, 12748.35it/s]
65%|██████▍ | 116437/180000 [00:09<00:04, 12796.05it/s]
65%|██████▌ | 117732/180000 [00:09<00:04, 12836.03it/s]
66%|██████▌ | 119038/180000 [00:09<00:04, 12902.83it/s]
67%|██████▋ | 120329/180000 [00:09<00:04, 12556.70it/s]
68%|██████▊ | 121635/180000 [00:09<00:04, 12703.22it/s]
68%|██████▊ | 122937/180000 [00:09<00:04, 12793.53it/s]
69%|██████▉ | 124239/180000 [00:09<00:04, 12854.69it/s]
70%|██████▉ | 125540/180000 [00:09<00:04, 12899.99it/s]
70%|███████ | 126831/180000 [00:10<00:04, 12557.58it/s]
71%|███████ | 128090/180000 [00:10<00:04, 12535.35it/s]
72%|███████▏ | 129365/180000 [00:10<00:04, 12597.55it/s]
73%|███████▎ | 130639/180000 [00:10<00:03, 12638.40it/s]
73%|███████▎ | 131904/180000 [00:10<00:03, 12629.31it/s]
74%|███████▍ | 133168/180000 [00:10<00:03, 12565.46it/s]
75%|███████▍ | 134448/180000 [00:10<00:03, 12631.92it/s]
75%|███████▌ | 135735/180000 [00:10<00:03, 12696.63it/s]
76%|███████▌ | 137005/180000 [00:10<00:03, 12589.19it/s]
77%|███████▋ | 138265/180000 [00:10<00:03, 12465.14it/s]
78%|███████▊ | 139560/180000 [00:11<00:03, 12604.13it/s]
78%|███████▊ | 140856/180000 [00:11<00:03, 12705.17it/s]
79%|███████▉ | 142127/180000 [00:11<00:02, 12650.98it/s]
80%|███████▉ | 143394/180000 [00:11<00:02, 12655.77it/s]
80%|████████ | 144660/180000 [00:11<00:02, 12417.81it/s]
81%|████████ | 145953/180000 [00:11<00:02, 12565.64it/s]
82%|████████▏ | 147251/180000 [00:11<00:02, 12687.31it/s]
83%|████████▎ | 148566/180000 [00:11<00:02, 12818.86it/s]
83%|████████▎ | 149871/180000 [00:11<00:02, 12884.15it/s]
84%|████████▍ | 151160/180000 [00:11<00:02, 12782.11it/s]
85%|████████▍ | 152454/180000 [00:12<00:02, 12826.53it/s]
85%|████████▌ | 153738/180000 [00:12<00:02, 12673.63it/s]
86%|████████▌ | 155028/180000 [00:12<00:01, 12739.61it/s]
87%|████████▋ | 156303/180000 [00:12<00:01, 12568.13it/s]
88%|████████▊ | 157617/180000 [00:12<00:01, 12735.18it/s]
88%|████████▊ | 158892/180000 [00:12<00:01, 12706.77it/s]
89%|████████▉ | 160164/180000 [00:12<00:01, 12704.94it/s]
90%|████████▉ | 161435/180000 [00:12<00:01, 12558.23it/s]
90%|█████████ | 162692/180000 [00:12<00:01, 12480.19it/s]
91%|█████████ | 163956/180000 [00:12<00:01, 12527.02it/s]
92%|█████████▏| 165249/180000 [00:13<00:01, 12641.55it/s]
93%|█████████▎| 166527/180000 [00:13<00:01, 12680.79it/s]
93%|█████████▎| 167815/180000 [00:13<00:00, 12740.14it/s]
94%|█████████▍| 169090/180000 [00:13<00:00, 12678.82it/s]
95%|█████████▍| 170383/180000 [00:13<00:00, 12753.32it/s]
95%|█████████▌| 171663/180000 [00:13<00:00, 12761.39it/s]
96%|█████████▌| 172953/180000 [00:13<00:00, 12802.21it/s]
97%|█████████▋| 174234/180000 [00:13<00:00, 12589.53it/s]
98%|█████████▊| 175539/180000 [00:13<00:00, 12716.74it/s]
98%|█████████▊| 176837/180000 [00:13<00:00, 12794.05it/s]
99%|█████████▉| 178146/180000 [00:14<00:00, 12876.69it/s]
100%|█████████▉| 179457/180000 [00:14<00:00, 12940.38it/s]
100%|██████████| 180000/180000 [00:14<00:00, 12647.84it/s]
We can compare the performance of each policy by checking the average reward at the end of each episode.
(
trace_df
.groupby(['policy_idx', 'episode'])
.last()
.groupby('policy_idx')
.reward_stat.mean()
)
policy_idx
[1;36m0[0m [1;36m736.1[0m
[1;36m1[0m [1;36m817.0[0m
[1;36m2[0m [1;36m854.0[0m
Name: reward_stat, dtype: float64
We see that using a rolling average gives a boost to the epsilon greedy strategy. However, we see that the bandit.ThompsonSampling policy performs even better, even though no particular care was given to drift. A natural next step would thus be to see how it could be improved to handle drift. For instance, its dist parameter could be wrapped with a utils.Rolling:
policy = bandit.ThompsonSampling(
reward_obj=utils.Rolling(proba.Beta(), window_size=50),
seed=42
)
Bandits can be used for several tasks. They can be used for content personalization, as well as online model selection (see model_selection.BanditRegressor). The policies in River are therefore designed to be flexible, so that they can be used in conjunction with other River modules. For instance, the reward_obj in bandit.EpsilonGreedy can be a metric, a probability distribution, or a statistic. This works because objects in River adher to a coherent get/update interface.