thompson’s documentation!
thompson is a Python package that implements three algorithms for solving the multi-armed bandit problem:
Thompson Sampling: A Bayesian approach that maintains probability distributions over the expected rewards of each arm and samples from these distributions to select the next arm to pull.
Upper Confidence Bound (UCB): A deterministic algorithm that selects arms based on their estimated rewards and the uncertainty in those estimates.
Randomized Sampling: A baseline method that randomly selects arms without considering their past performance.
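To make the three strategies concrete, here is a minimal standard-library sketch of each selection rule. It does not use the thompson package API: every function name below is hypothetical, and the specific choices (0/1 rewards, Beta(1, 1) priors for Thompson Sampling, the UCB1 bonus term) are assumptions made for illustration, so the package's own implementation may differ.

```python
import math
import random


def select_random(n_arms, rng):
    """Baseline: pick an arm uniformly at random, ignoring all history."""
    return rng.randrange(n_arms)


def select_ucb1(pulls, reward_sums, t):
    """UCB1 (assumed variant): highest mean reward plus an uncertainty bonus."""
    # Pull every arm once first so that each mean estimate is defined.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    scores = [
        reward_sums[a] / pulls[a] + math.sqrt(2.0 * math.log(t) / pulls[a])
        for a in range(len(pulls))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def select_thompson(successes, failures, rng):
    """Thompson Sampling for 0/1 rewards, assuming a Beta(1, 1) prior per arm."""
    # Draw one plausible success rate per arm from its posterior, then act greedily.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(policy, true_probs, n_rounds=2000, seed=0):
    """Simulate one policy on Bernoulli arms; return (pulls, wins) per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    pulls = [0] * k
    wins = [0] * k
    for t in range(1, n_rounds + 1):
        if policy == "random":
            arm = select_random(k, rng)
        elif policy == "ucb":
            arm = select_ucb1(pulls, wins, t)
        elif policy == "thompson":
            losses = [p - w for p, w in zip(pulls, wins)]
            arm = select_thompson(wins, losses, rng)
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        reward = 1 if rng.random() < true_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


if __name__ == "__main__":
    for name in ("random", "ucb", "thompson"):
        print(name, run(name, [0.2, 0.5, 0.8]))
```

In this sketch the random baseline spreads its pulls roughly evenly, while the other two rules shift pulls toward arms that have paid off, each handling uncertainty in its own way.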
The multi-armed bandit problem is a classic reinforcement learning problem that exemplifies the exploration-exploitation tradeoff. A fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain, while each choice’s properties are only partially known at the time of allocation.
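One common way to quantify "expected gain" is regret: the expected reward given up by not always pulling the best arm. The sketch below assumes Bernoulli (0/1) rewards; the class `BernoulliBandit` and the function `expected_regret` are hypothetical names for illustration and are not part of the thompson package.

```python
import random


class BernoulliBandit:
    """Hypothetical environment: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self._rng = random.Random(seed)

    def pull(self, arm):
        # The caller only ever sees this 0/1 outcome, never probs itself.
        return 1 if self._rng.random() < self.probs[arm] else 0


def expected_regret(probs, chosen_arms):
    """Expected reward lost relative to pulling the best arm every round."""
    best = max(probs)
    return sum(best - probs[arm] for arm in chosen_arms)


if __name__ == "__main__":
    bandit = BernoulliBandit([0.2, 0.5, 0.8])
    policy_rng = random.Random(1)
    # A policy that never learns: choose uniformly at random for 1000 rounds.
    chosen = [policy_rng.randrange(3) for _ in range(1000)]
    total_reward = sum(bandit.pull(arm) for arm in chosen)
    print("total reward:", total_reward)
    print("expected regret:", round(expected_regret(bandit.probs, chosen), 1))
```

A policy that never learns accumulates regret linearly in the number of rounds, which is the behavior that a bandit algorithm tries to improve on.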
Note
Your ❤️ is important for keeping this package maintained. You can support it in various ways; see the sponsor page. Report bugs and issues, and request features, on the GitHub page.
pip install thompson
Features
Thompson Sampling implementation with Bayesian updates
Upper Confidence Bound (UCB) algorithm with confidence intervals
Randomized sampling baseline for comparison
Visualization tools for results analysis
Example datasets included
Comprehensive logging system
Detailed documentation and examples
Content
Background
Installation
Tutorials
Examples