thompson’s documentation!

thompson is a Python package that implements three algorithms for solving the multi-armed bandit problem:

  1. Thompson Sampling: A Bayesian approach that maintains probability distributions over the expected rewards of each arm and samples from these distributions to select the next arm to pull.

  2. Upper Confidence Bound (UCB): A deterministic algorithm that selects arms based on their estimated rewards and the uncertainty in those estimates.

  3. Randomized Sampling: A baseline method that randomly selects arms without considering their past performance.
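The three selection rules above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the thompson package's API: rewards are assumed to be binary (0 or 1), Thompson Sampling is assumed to use a uniform Beta(1, 1) prior, the UCB rule shown is the common UCB1 variant, and all function names are hypothetical.

```python
import math
import random


def select_thompson(successes, failures, rng):
    """Draw one sample from each arm's Beta posterior and pick the largest draw."""
    # Beta(s + 1, f + 1) is the posterior under an assumed uniform Beta(1, 1) prior.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def select_ucb(successes, failures):
    """UCB1: pick the arm with the highest empirical mean plus confidence bonus."""
    pulls = [s + f for s, f in zip(successes, failures)]
    # Pull every arm once first, since the bonus is undefined for zero pulls.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    total = sum(pulls)
    scores = [
        s / n + math.sqrt(2 * math.log(total) / n)
        for s, n in zip(successes, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def select_random(n_arms, rng):
    """Baseline: pick an arm uniformly at random, ignoring past rewards."""
    return rng.randrange(n_arms)
```

Each function takes only per-arm success and failure counts, so the same bookkeeping can drive all three strategies in a side-by-side comparison.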

The multi-armed bandit problem is a classic reinforcement learning problem that exemplifies the exploration-exploitation tradeoff. A fixed, limited set of resources must be allocated among competing choices so as to maximize expected gain, when each choice’s properties are only partially known at the time of allocation.
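To make the tradeoff concrete, the following minimal simulation compares Thompson Sampling with uniformly random selection on a three-armed Bernoulli bandit. It is a sketch under stated assumptions and does not use the thompson package itself: the reward probabilities, the Beta(1, 1) prior, and the `run_bandit` helper are all made up for illustration.

```python
import random


def run_bandit(true_probs, n_rounds, policy, seed=0):
    """Simulate a Bernoulli bandit; return total reward and pulls per arm."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        if policy == "thompson":
            # Sample a plausible reward rate for each arm from its posterior.
            draws = [
                rng.betavariate(successes[a] + 1, failures[a] + 1)
                for a in range(n_arms)
            ]
            arm = draws.index(max(draws))
        else:
            # Baseline: ignore everything learned so far.
            arm = rng.randrange(n_arms)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    pulls = [s + f for s, f in zip(successes, failures)]
    return total_reward, pulls


# Hypothetical reward probabilities; the last arm is the best one.
probs = [0.2, 0.5, 0.8]
ts_reward, ts_pulls = run_bandit(probs, 2000, "thompson")
rnd_reward, rnd_pulls = run_bandit(probs, 2000, "random")
print("Thompson:", ts_reward, ts_pulls)
print("Random:  ", rnd_reward, rnd_pulls)
```

Random selection spreads its pulls roughly evenly across the arms, while Thompson Sampling explores briefly and then concentrates on the arm with the highest reward probability, which is where its higher total reward comes from.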


Note

Your ❤️ helps keep this package maintained. You can support it in various ways; have a look at the sponsor page. Report bugs and issues, or request features, on the GitHub page.

pip install thompson

Features

  • Thompson Sampling implementation with Bayesian updates

  • Upper Confidence Bound (UCB) algorithm with confidence intervals

  • Randomized sampling baseline for comparison

  • Visualization tools for results analysis

  • Example datasets included

  • Comprehensive logging system

  • Detailed documentation and examples

Content

Background

Tutorials

Examples

Indices and tables