API References
This module implements multi-armed bandit algorithms including Thompson Sampling, UCB (Upper Confidence Bound), and randomized sampling.
- thompson.thompson.UCB(df, verbose='info')
Perform Upper Confidence Bound (UCB) algorithm on the multi-armed bandit problem.
UCB is a deterministic algorithm that selects arms based on their estimated rewards and the uncertainty in those estimates. It balances exploration and exploitation by selecting arms with high upper confidence bounds.
- Parameters:
df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.
verbose (str or int, optional) – Set the verbose messages using string or integer values. Default is 'info'.
  * [0, 60, None, 'silent', 'off', 'no']: No message
  * [10, 'debug']: Messages from debug level and higher
  * [20, 'info']: Messages from info level and higher
  * [30, 'warning']: Messages from warning level and higher
  * [50, 'critical', 'error']: Messages from critical level and higher
- Returns:
Dictionary containing:
  - columns: Names of the columns (arms)
  - total_reward: Total rewards obtained
  - cols_selected: Vector describing which arm was selected for each trial
  - sum_rewards: Sum of rewards obtained per arm
  - num_selections: Number of times each arm was selected
  - methodtype: 'UCB'
- Return type:
dict
Examples
>>> import numpy as np
>>> import thompson as th
>>> df = th.import_example()
>>> out = th.UCB(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Most selected arm: {out['columns'][np.argmax(out['num_selections'])]}")
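The selection rule described for UCB can be illustrated with a minimal sketch of the classic UCB1 rule. This is not the library's implementation: the function name `ucb1_sketch`, the plain list-of-rows input (roughly what `df.values.tolist()` would give for a 0/1 DataFrame), and the exploration constant 2 are assumptions made for illustration. Only the output keys mirror the ones documented above.

```python
import math


def ucb1_sketch(rows):
    """Replay logged 0/1 rewards with the classic UCB1 rule.

    rows: list of trials, each a list with one 0/1 reward per arm
    (hypothetical input format, e.g. df.values.tolist()).
    """
    n_arms = len(rows[0])
    num_selections = [0] * n_arms  # pulls per arm
    sum_rewards = [0] * n_arms     # rewards collected per arm
    cols_selected = []             # arm chosen at each trial

    for t, row in enumerate(rows, start=1):
        best_arm, best_bound = 0, -math.inf
        for arm in range(n_arms):
            if num_selections[arm] == 0:
                # An untried arm gets an infinite bound so it is pulled once.
                bound = math.inf
            else:
                mean = sum_rewards[arm] / num_selections[arm]
                # Exploration bonus shrinks as the arm is pulled more often.
                bonus = math.sqrt(2 * math.log(t) / num_selections[arm])
                bound = mean + bonus
            if bound > best_bound:
                best_arm, best_bound = arm, bound

        # Only the reward of the selected arm is observed.
        num_selections[best_arm] += 1
        sum_rewards[best_arm] += row[best_arm]
        cols_selected.append(best_arm)

    return {
        "total_reward": sum(sum_rewards),
        "cols_selected": cols_selected,
        "sum_rewards": sum_rewards,
        "num_selections": num_selections,
        "methodtype": "UCB",
    }


# Toy data (assumed): the middle arm always pays out, the others never do.
demo = ucb1_sketch([[0, 1, 0]] * 50)
print(demo["num_selections"])
```

Because ties are broken by the lowest arm index and no random numbers are drawn, the sketch is deterministic: the same rows always give the same selections.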
- thompson.thompson.UCB_random(df, verbose='info')
Perform randomized sampling on the multi-armed bandit problem.
This method randomly selects arms without considering their past performance. It serves as a baseline for comparing the performance of more sophisticated algorithms like Thompson sampling and UCB.
- Parameters:
df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.
verbose (str or int, optional) – Set the verbose messages using string or integer values. Default is 'info'.
  * [0, 60, None, 'silent', 'off', 'no']: No message
  * [10, 'debug']: Messages from debug level and higher
  * [20, 'info']: Messages from info level and higher
  * [30, 'warning']: Messages from warning level and higher
  * [50, 'critical', 'error']: Messages from critical level and higher
- Returns:
Dictionary containing:
  - columns: Names of the columns (arms)
  - total_reward: Total rewards obtained
  - cols_selected: Vector describing which arm was selected for each trial
  - methodtype: 'UCB_random'
- Return type:
dict
Examples
>>> import thompson as th
>>> df = th.import_example()
>>> out = th.UCB_random(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Number of trials: {len(out['cols_selected'])}")
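As a rough illustration of what a random baseline does, the sketch below picks one arm uniformly at random per trial and ignores all past rewards. The function name `random_baseline_sketch`, the `seed` parameter, and the list-of-rows input are hypothetical and are not part of the library's API.

```python
import random


def random_baseline_sketch(rows, seed=0):
    """Pick one arm uniformly at random per trial, ignoring past rewards.

    rows: list of trials, each a list with one 0/1 reward per arm
    (hypothetical input format, e.g. df.values.tolist()).
    """
    rng = random.Random(seed)  # local generator keeps the run reproducible
    n_arms = len(rows[0])
    cols_selected = []
    total_reward = 0

    for row in rows:
        arm = rng.randrange(n_arms)  # no learning: every arm is equally likely
        cols_selected.append(arm)
        total_reward += row[arm]

    return {
        "total_reward": total_reward,
        "cols_selected": cols_selected,
        "methodtype": "UCB_random",
    }


# Toy data (assumed): only the middle arm ever pays out.
demo = random_baseline_sketch([[0, 1, 0]] * 30)
print(demo["total_reward"])
```

With equally likely arms, the expected total reward is the average per-arm success rate times the number of trials, which is the reference point a learning algorithm should beat.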
- thompson.thompson.check_logger(verbose: [str, int] = 'info')
Check the logger.
- thompson.thompson.convert_verbose_to_new(verbose)
Convert an old-style verbosity value to the new convention.
- thompson.thompson.disable_tqdm()
Set the logger for verbosity messages.
- thompson.thompson.get_logger()
- thompson.thompson.import_example(data='ads', url=None, sep=',', verbose='info')
Import an example dataset from the GitHub source.
Import one of the available datasets from GitHub, or specify your own download URL.
- Parameters:
data (str) – Name of the dataset: 'ads'
url (str) – URL link to the dataset.
verbose ([str, int], default is 'info' or 20) – Set the verbose messages using string or integer values.
  * [0, 60, None, 'silent', 'off', 'no']: No message.
  * [10, 'debug']: Messages from debug level and higher.
  * [20, 'info']: Messages from info level and higher.
  * [30, 'warning']: Messages from warning level and higher.
  * [50, 'critical', 'error']: Messages from critical level and higher.
- Returns:
Dataset containing mixed features.
- Return type:
pd.DataFrame
- thompson.thompson.makefig_UCB(out, width=15, height=10)
- thompson.thompson.makefig_UCB_random(out, width=15, height=10)
- thompson.thompson.makefig_thompson(out, width=15, height=10)
- thompson.thompson.plot(out, width=15, height=10, verbose='info')
Plot the results of the multi-armed bandit algorithm.
Creates visualizations showing the performance of the selected algorithm. The type of plot depends on the method used (Thompson, UCB, or randomized).
- Parameters:
out (dict) – Output from thompson, UCB, or UCB_random containing the results to plot.
width (int, optional) – Width of the figure in inches. Default is 15.
height (int, optional) – Height of the figure in inches. Default is 10.
verbose (str or int, optional) – Set the verbose messages using string or integer values. Default is 'info'.
  * [0, 60, None, 'silent', 'off', 'no']: No message
  * [10, 'debug']: Messages from debug level and higher
  * [20, 'info']: Messages from info level and higher
  * [30, 'warning']: Messages from warning level and higher
  * [50, 'critical', 'error']: Messages from critical level and higher
- Returns:
The function displays the plot directly and returns None.
- Return type:
None
Examples
>>> import thompson as th
>>> df = th.import_example()
>>> # Plot Thompson sampling results
>>> out_tps = th.thompson(df)
>>> th.plot(out_tps)
>>> # Plot UCB results
>>> out_ucb = th.UCB(df)
>>> th.plot(out_ucb)
>>> # Plot randomized results
>>> out_ran = th.UCB_random(df)
>>> th.plot(out_ran)
- thompson.thompson.set_logger(verbose: [str, int] = 'info')
Set the logger for verbosity messages.
- Parameters:
verbose ([str, int], default is 'info' or 20) – Set the verbose messages using string or integer values.
  * [0, 60, None, 'silent', 'off', 'no']: No message.
  * [10, 'debug']: Messages from debug level and higher.
  * [20, 'info']: Messages from info level and higher.
  * [30, 'warning']: Messages from warning level and higher.
  * [50, 'critical']: Messages from critical level and higher.
- Returns:
None.
Examples
>>> # Set the logger to warning
>>> set_logger(verbose='warning')
>>> # Test with different messages
>>> logger.debug("Hello debug")
>>> logger.info("Hello info")
>>> logger.warning("Hello warning")
>>> logger.critical("Hello critical")
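The verbose values listed above line up with the numeric levels of Python's standard `logging` module. The following is a minimal sketch of such a mapping, assuming a hypothetical helper `verbose_to_level` and a hypothetical logger name `verbose_demo`; it does not reproduce the library's own `set_logger` or `convert_verbose_to_new`.

```python
import logging

# Hypothetical lookup table built from the documented verbose values.
# Level 60 sits above logging.CRITICAL (50), so nothing is emitted.
_NAME_TO_LEVEL = {
    "silent": 60,
    "off": 60,
    "no": 60,
    "debug": logging.DEBUG,        # 10
    "info": logging.INFO,          # 20
    "warning": logging.WARNING,    # 30
    "critical": logging.CRITICAL,  # 50
    "error": logging.CRITICAL,     # listed next to 'critical' in the table
}


def verbose_to_level(verbose):
    """Translate a verbose value (str, int, or None) to a numeric level."""
    if verbose is None or verbose == 0:
        return 60
    if isinstance(verbose, str):
        return _NAME_TO_LEVEL[verbose.lower()]
    return int(verbose)


logger = logging.getLogger("verbose_demo")
logger.setLevel(verbose_to_level("warning"))

# With the level at 'warning', info messages are filtered out
# while warning and critical messages pass.
print(logger.isEnabledFor(logging.INFO))
print(logger.isEnabledFor(logging.WARNING))
```

Mapping the silent values to 60 works because the standard library defines no level above 50, so no message can reach that threshold.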
- thompson.thompson.thompson(df, verbose='info')
Perform Thompson sampling on the multi-armed bandit problem.
Thompson sampling is a Bayesian approach to the multi-armed bandit problem. It maintains a probability distribution over the expected rewards of each arm and samples from these distributions to select the next arm to pull.
- Parameters:
df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.
verbose (str or int, optional) – Set the verbose messages using string or integer values. Default is 'info'.
  * [0, 60, None, 'silent', 'off', 'no']: No message
  * [10, 'debug']: Messages from debug level and higher
  * [20, 'info']: Messages from info level and higher
  * [30, 'warning']: Messages from warning level and higher
  * [50, 'critical', 'error']: Messages from critical level and higher
- Returns:
Dictionary containing:
  - columns: Names of the columns (arms)
  - total_reward: Total rewards obtained
  - cols_selected: Vector describing which arm was selected for each trial
  - cols_rewards_1: Number of successful trials per arm
  - cols_rewards_0: Number of unsuccessful trials per arm
  - methodtype: 'thompson'
- Return type:
dict
Examples
>>> import numpy as np
>>> import thompson as th
>>> df = th.import_example()
>>> out = th.thompson(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Best performing arm: {out['columns'][np.argmax(out['cols_rewards_1'])]}")
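The Bayesian update described for Thompson sampling can be sketched with a Beta posterior per arm. This is an illustrative sketch rather than the library's code: the function name `thompson_sketch`, the `seed` parameter, the list-of-rows input, and the uniform Beta(1, 1) prior are all assumptions. Only the output keys follow the ones documented above.

```python
import random


def thompson_sketch(rows, seed=0):
    """Replay logged 0/1 rewards with Beta-Bernoulli Thompson sampling.

    rows: list of trials, each a list with one 0/1 reward per arm
    (hypothetical input format, e.g. df.values.tolist()).
    """
    rng = random.Random(seed)
    n_arms = len(rows[0])
    cols_rewards_1 = [0] * n_arms  # successes observed per arm
    cols_rewards_0 = [0] * n_arms  # failures observed per arm
    cols_selected = []
    total_reward = 0

    for row in rows:
        # Draw one plausible success rate per arm from its posterior,
        # Beta(successes + 1, failures + 1), i.e. a uniform prior.
        draws = [
            rng.betavariate(cols_rewards_1[a] + 1, cols_rewards_0[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: draws[a])

        # Only the reward of the selected arm is observed.
        reward = row[arm]
        if reward == 1:
            cols_rewards_1[arm] += 1
        else:
            cols_rewards_0[arm] += 1
        total_reward += reward
        cols_selected.append(arm)

    return {
        "total_reward": total_reward,
        "cols_selected": cols_selected,
        "cols_rewards_1": cols_rewards_1,
        "cols_rewards_0": cols_rewards_0,
        "methodtype": "thompson",
    }


# Toy data (assumed): only the middle arm ever pays out.
demo = thompson_sketch([[0, 1, 0]] * 200)
print(demo["cols_rewards_1"], demo["cols_rewards_0"])
```

Sampling from the posterior, instead of always taking the arm with the highest mean, is what provides exploration: an arm with few observations has a wide posterior and still wins the draw occasionally.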
- thompson.thompson.wget(url, writepath)