Examples

This page provides examples of using the thompson package to solve multi-armed bandit problems.

Thompson Sampling Example

In this example, we use Thompson Sampling to determine which advertisement performs best. We load a dataset containing 10,000 samples and 10 different ads, where each sample represents a user’s response (1 for click, 0 for no click).

# Import library
import thompson as th

# Load example data
df = th.import_example()

# Apply Thompson sampling
results = th.thompson(df)

# Plot results
fig = th.plot(results)

The plot shows: - The log rewards for each ad (left panel) - The selection pattern over time (right panel)

Thompson Sampling Results

Upper Confidence Bound (UCB) Example

In this example, we use the UCB algorithm to solve the same advertisement optimization problem. The UCB algorithm balances exploration and exploitation using confidence bounds.

# Import library
import thompson as th

# Load example data
df = th.import_example()

# Apply UCB algorithm
results = th.UCB(df)

# Plot results
fig = th.plot(results)

The plot shows: - The log number of selections and rewards for each ad (left panel) - The selection pattern over time (right panel)

UCB Algorithm Results

Randomized Sampling Example

In this example, we use randomized sampling as a baseline for comparison. This method randomly selects ads without considering their past performance.

# Import library
import thompson as th

# Load example data
df = th.import_example()

# Apply randomized sampling
results = th.UCB_random(df)

# Plot results
fig = th.plot(results)

The plot shows: - The distribution of selections across ads (left panel) - The random selection pattern over time (right panel)

Randomized Sampling Results

Comparing Methods

To compare the performance of different methods, you can run all three algorithms and compare their total rewards:

# Import library
import thompson as th

# Load example data
df = th.import_example()

# Run all methods
results_ts = th.thompson(df)
results_ucb = th.UCB(df)
results_rand = th.UCB_random(df)

# Compare total rewards
print(f"Thompson Sampling total reward: {results_ts['total_reward']}")
print(f"UCB total reward: {results_ucb['total_reward']}")
print(f"Randomized total reward: {results_rand['total_reward']}")

This comparison helps understand the relative performance of each method in terms of total rewards obtained.