Examples
This page provides examples of using the thompson package to solve multi-armed bandit problems.
Thompson Sampling Example
In this example, we use Thompson Sampling to determine which advertisement performs best. We load a dataset containing 10,000 samples and 10 different ads, where each sample represents a user’s response (1 for click, 0 for no click).
# Import library
import thompson as th
# Load example data
df = th.import_example()
# Apply Thompson sampling
results = th.thompson(df)
# Plot results
fig = th.plot(results)
The plot shows: - The log rewards for each ad (left panel) - The selection pattern over time (right panel)
|
Upper Confidence Bound (UCB) Example
In this example, we use the UCB algorithm to solve the same advertisement optimization problem. The UCB algorithm balances exploration and exploitation using confidence bounds.
# Import library
import thompson as th
# Load example data
df = th.import_example()
# Apply UCB algorithm
results = th.UCB(df)
# Plot results
fig = th.plot(results)
The plot shows: - The log number of selections and rewards for each ad (left panel) - The selection pattern over time (right panel)
|
Randomized Sampling Example
In this example, we use randomized sampling as a baseline for comparison. This method randomly selects ads without considering their past performance.
# Import library
import thompson as th
# Load example data
df = th.import_example()
# Apply randomized sampling
results = th.UCB_random(df)
# Plot results
fig = th.plot(results)
The plot shows: - The distribution of selections across ads (left panel) - The random selection pattern over time (right panel)
|
Comparing Methods
To compare the performance of different methods, you can run all three algorithms and compare their total rewards:
# Import library
import thompson as th
# Load example data
df = th.import_example()
# Run all methods
results_ts = th.thompson(df)
results_ucb = th.UCB(df)
results_rand = th.UCB_random(df)
# Compare total rewards
print(f"Thompson Sampling total reward: {results_ts['total_reward']}")
print(f"UCB total reward: {results_ucb['total_reward']}")
print(f"Randomized total reward: {results_rand['total_reward']}")
This comparison helps understand the relative performance of each method in terms of total rewards obtained.


