Tutorials

This page provides tutorials for using the thompson package to solve multi-armed bandit problems.

Basic Usage

The input for thompson is a pandas DataFrame in which each row is a sample (trial) and each column is an arm. For demonstration purposes, we load an example ads dataset containing 10,000 samples and 10 ads.

# Import library
import thompson as th

# Load example data
df = th.import_example()

# Show data
print(df)

#             Ad 1  Ad 2  Ad 3  Ad 4  Ad 5  Ad 6  Ad 7  Ad 8  Ad 9  Ad 10
#       0        1     0     0     0     1     0     0     0     1      0
#       1        0     0     0     0     0     0     0     0     1      0
#       2        0     0     0     0     0     0     0     0     0      0
#       3        0     1     0     0     0     0     0     1     0      0
#       4        0     0     0     0     0     0     0     0     0      0
#          ...   ...   ...   ...   ...   ...   ...   ...   ...    ...
#       9995     0     0     1     0     0     0     0     1     0      0
#       9996     0     0     0     0     0     0     0     0     0      0
#       9997     0     0     0     0     0     0     0     0     0      0
#       9998     1     0     0     0     0     0     0     1     0      0
#       9999     0     1     0     0     0     0     0     0     0      0

Thompson Sampling

Thompson Sampling is a Bayesian approach to the multi-armed bandit problem. Here’s how to use it:

# Apply Thompson sampling
results = th.thompson(df)

# Plot the results
th.plot(results)

The output of thompson.thompson() is a dictionary with the following keys:

  • columns: Names of the columns (arms)

  • total_reward: Total rewards obtained

  • cols_selected: Vector describing which arm was selected for each trial

  • cols_rewards_0: Number of unsuccessful trials per arm

  • cols_rewards_1: Number of successful trials per arm

  • methodtype: Method that was used (‘thompson’)
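As a rough illustration of the idea, the sketch below implements Beta-Bernoulli Thompson sampling with only the Python standard library. It is not the package's implementation: the function name thompson_sketch, the list-of-rows input, the uniform Beta(1, 1) prior, and the simulated click rates are all assumptions made for this example. The returned keys mirror the ones listed above.

```python
import random


def thompson_sketch(rows, seed=0):
    """Replay a table of 0/1 rewards with Beta-Bernoulli Thompson sampling.

    Hypothetical helper for illustration; not part of the thompson package.
    """
    rng = random.Random(seed)
    n_arms = len(rows[0])
    successes = [0] * n_arms  # plays the role of cols_rewards_1
    failures = [0] * n_arms   # plays the role of cols_rewards_0
    selected = []
    total_reward = 0

    for row in rows:
        # Sample a plausible success rate for each arm from its posterior,
        # Beta(successes + 1, failures + 1), assuming a uniform prior.
        draws = [
            rng.betavariate(successes[arm] + 1, failures[arm] + 1)
            for arm in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=draws.__getitem__)
        reward = row[arm]
        if reward == 1:
            successes[arm] += 1
        else:
            failures[arm] += 1
        selected.append(arm)
        total_reward += reward

    return {
        "total_reward": total_reward,
        "cols_selected": selected,
        "cols_rewards_0": failures,
        "cols_rewards_1": successes,
    }


# Simulated data: three arms with assumed (made-up) success rates.
data_rng = random.Random(1)
rates = [0.05, 0.10, 0.30]
rows = [[int(data_rng.random() < p) for p in rates] for _ in range(2000)]

out = thompson_sketch(rows)
print("total reward:", out["total_reward"])
print("successes per arm:", out["cols_rewards_1"])
```

Because each arm is chosen in proportion to how often its sampled rate comes out on top, arms with little data still get explored, while arms with consistently low rewards are selected less and less often.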

Upper Confidence Bound (UCB)

The UCB algorithm balances exploration and exploitation using confidence bounds:

# Apply UCB algorithm
results = th.UCB(df)

# Plot the results
th.plot(results)

The output of thompson.UCB() includes:

  • columns: Names of the columns (arms)

  • total_reward: Total rewards obtained

  • cols_selected: Vector describing which arm was selected for each trial

  • sum_rewards: Sum of rewards obtained per arm

  • num_selections: Number of times each arm was selected

  • methodtype: Method that was used (‘UCB’)
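To make the confidence-bound idea concrete, here is a minimal sketch of the textbook UCB1 rule using only the standard library. It is an illustration under stated assumptions, not the package's code: the function name ucb_sketch and the list-of-rows input are hypothetical, and the exploration bonus sqrt(2 ln t / n) is the standard UCB1 form, which may differ from the term thompson.UCB() uses internally.

```python
import math


def ucb_sketch(rows):
    """Replay a table of 0/1 rewards with the textbook UCB1 rule.

    Hypothetical helper for illustration; not part of the thompson package.
    """
    n_arms = len(rows[0])
    num_selections = [0] * n_arms
    sum_rewards = [0] * n_arms
    selected = []
    total_reward = 0

    for t, row in enumerate(rows, start=1):

        def upper_bound(arm):
            # Untried arms get an infinite bound so each is played once.
            if num_selections[arm] == 0:
                return float("inf")
            mean = sum_rewards[arm] / num_selections[arm]
            bonus = math.sqrt(2 * math.log(t) / num_selections[arm])
            return mean + bonus

        # Play the arm with the highest upper confidence bound.
        arm = max(range(n_arms), key=upper_bound)
        reward = row[arm]
        num_selections[arm] += 1
        sum_rewards[arm] += reward
        selected.append(arm)
        total_reward += reward

    return {
        "total_reward": total_reward,
        "cols_selected": selected,
        "sum_rewards": sum_rewards,
        "num_selections": num_selections,
    }


# Toy data: arm 0 never pays out, arm 1 always does.
toy_rows = [[0, 1]] * 100
ucb_out = ucb_sketch(toy_rows)
print("selections per arm:", ucb_out["num_selections"])
print("total reward:", ucb_out["total_reward"])
```

The bonus term shrinks as an arm is selected more often, so the rule keeps returning to rarely tried arms only until their upper bound drops below that of the best-performing arm.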

Randomized Sampling

For comparison, you can use randomized sampling as a baseline:

# Apply randomized sampling
results = th.UCB_random(df)

# Plot the results
th.plot(results)

The output of thompson.UCB_random() includes:

  • columns: Names of the columns (arms)

  • total_reward: Total rewards obtained

  • cols_selected: Vector describing which arm was selected for each trial

  • methodtype: Method that was used (‘UCB_random’)
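For reference, a random baseline can be sketched in a few lines: pick an arm uniformly at random on every trial and record the reward. The function name random_sketch, the seed parameter, and the list-of-rows input are assumptions for this example; the package's own thompson.UCB_random() may differ in its details.

```python
import random


def random_sketch(rows, seed=0):
    """Replay a table of 0/1 rewards, choosing an arm uniformly at random.

    Hypothetical helper for illustration; not part of the thompson package.
    """
    rng = random.Random(seed)
    n_arms = len(rows[0])
    selected = []
    total_reward = 0

    for row in rows:
        # No learning: every arm is equally likely on every trial.
        arm = rng.randrange(n_arms)
        selected.append(arm)
        total_reward += row[arm]

    return {"total_reward": total_reward, "cols_selected": selected}


# Toy data: only the last of three arms ever pays out.
baseline_rows = [[0, 0, 1]] * 300
baseline_out = random_sketch(baseline_rows)
print("total reward:", baseline_out["total_reward"])
```

Since the baseline ignores past rewards, its total reward gives a floor against which the adaptive methods can be compared.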

Visualization

All three methods support visualization of results:

# Plot results with custom figure size
th.plot(results, width=15, height=10)

The plot function creates visualizations showing:

  • The performance of each arm

  • The selection pattern over time

  • Comparison between different methods