API References

This module implements multi-armed bandit algorithms including Thompson Sampling, UCB (Upper Confidence Bound), and randomized sampling.

thompson.thompson.UCB(df, verbose='info')

Perform Upper Confidence Bound (UCB) algorithm on the multi-armed bandit problem.

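UCB is a deterministic algorithm. At each trial it selects the arm with the highest upper confidence bound, which is the arm's estimated reward plus a bonus reflecting the uncertainty in that estimate. Arms that have rarely been selected receive a larger bonus, so the algorithm balances exploring uncertain arms against exploiting arms that have performed well.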
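The selection rule described above can be illustrated with a minimal sketch of a UCB1-style policy replayed over a table of 0/1 rewards. It assumes the exploration bonus sqrt(1.5 * ln(n + 1) / N_i), where n is the trial index and N_i is the number of times arm i has been selected. The constant and exact bonus used by thompson.UCB may differ. ucb_select is a hypothetical helper, not part of the library, and takes plain lists of rows instead of a DataFrame.

```python
import math


def ucb_select(rewards):
    """Replay a UCB1-style policy over a table of 0/1 rewards.

    Hypothetical helper: `rewards` is a list of rows (trials), each a list
    of 0/1 values with one entry per arm, mirroring the `df` layout.
    """
    n_arms = len(rewards[0])
    num_selections = [0] * n_arms  # times each arm was chosen
    sum_rewards = [0] * n_arms     # reward collected per arm
    cols_selected = []
    total_reward = 0

    for n, row in enumerate(rewards):
        best_arm, best_bound = 0, -math.inf
        for i in range(n_arms):
            if num_selections[i] == 0:
                # Untried arms get an infinite bound, so each is tried once first.
                bound = math.inf
            else:
                mean = sum_rewards[i] / num_selections[i]
                # Assumed exploration bonus: grows slowly with the trial count,
                # shrinks as this arm is selected more often.
                bonus = math.sqrt(1.5 * math.log(n + 1) / num_selections[i])
                bound = mean + bonus
            if bound > best_bound:  # ties keep the lower-indexed arm
                best_arm, best_bound = i, bound
        reward = row[best_arm]  # only the chosen arm's reward is observed
        cols_selected.append(best_arm)
        num_selections[best_arm] += 1
        sum_rewards[best_arm] += reward
        total_reward += reward

    return {
        "total_reward": total_reward,
        "cols_selected": cols_selected,
        "sum_rewards": sum_rewards,
        "num_selections": num_selections,
    }
```

For example, ucb_select([[0, 1]] * 200) tries arm 0 and arm 1 once each and then selects arm 1, which always pays out, on the large majority of the remaining trials.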

Parameters:
  • df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.

  • verbose (str or int, optional) – Set the verbose messages using string or integer values:
      * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
      * [10, ‘debug’]: Messages from debug level and higher.
      * [20, ‘info’]: Messages from info level and higher.
      * [30, ‘warning’]: Messages from warning level and higher.
      * [50, ‘critical’, ‘error’]: Messages from critical level and higher.
    Default is ‘info’.

Returns:

Dictionary containing:
  - columns: Names of the columns (arms).
  - total_reward: Total reward obtained.
  - cols_selected: Vector describing which arm was selected for each trial.
  - sum_rewards: Sum of rewards obtained per arm.
  - num_selections: Number of times each arm was selected.
  - methodtype: ‘UCB’.

Return type:

dict

Examples

>>> import numpy as np
>>> import thompson as th
>>> df = th.import_example()
>>> out = th.UCB(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Most selected arm: {out['columns'][np.argmax(out['num_selections'])]}")

thompson.thompson.UCB_random(df, verbose='info')

Perform randomized sampling on the multi-armed bandit problem.

This method randomly selects arms without considering their past performance. It serves as a baseline for comparing the performance of more sophisticated algorithms like Thompson sampling and UCB.
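A minimal sketch of such a baseline, assuming the same 0/1 reward table layout as df but passed as plain lists of rows, is shown here. random_select is a hypothetical helper, not the library's UCB_random; it draws each arm uniformly at random using Python's random.Random so that results are reproducible with a seed.

```python
import random


def random_select(rewards, seed=None):
    """Random baseline: choose an arm uniformly at random on every trial.

    Hypothetical helper: `rewards` is a list of rows, each a list of 0/1
    values with one entry per arm.
    """
    rng = random.Random(seed)  # dedicated generator, reproducible with a seed
    n_arms = len(rewards[0])
    cols_selected = []
    total_reward = 0
    for row in rewards:
        arm = rng.randrange(n_arms)  # past rewards are ignored entirely
        cols_selected.append(arm)
        total_reward += row[arm]
    return {"total_reward": total_reward, "cols_selected": cols_selected}
```

Because every arm is equally likely on every trial, the expected reward per trial of this baseline is the mean success rate across arms; an adaptive method that does not clearly beat it is not learning which arm is best.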

Parameters:
  • df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.

  • verbose (str or int, optional) – Set the verbose messages using string or integer values:
      * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
      * [10, ‘debug’]: Messages from debug level and higher.
      * [20, ‘info’]: Messages from info level and higher.
      * [30, ‘warning’]: Messages from warning level and higher.
      * [50, ‘critical’, ‘error’]: Messages from critical level and higher.
    Default is ‘info’.

Returns:

Dictionary containing:
  - columns: Names of the columns (arms).
  - total_reward: Total reward obtained.
  - cols_selected: Vector describing which arm was selected for each trial.
  - methodtype: ‘UCB_random’.

Return type:

dict

Examples

>>> import thompson as th
>>> df = th.import_example()
>>> out = th.UCB_random(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Number of trials: {len(out['cols_selected'])}")

thompson.thompson.check_logger(verbose: [str, int] = 'info')

Check the logger.

thompson.thompson.convert_verbose_to_new(verbose)

Convert old-style verbosity values to the new format.

thompson.thompson.disable_tqdm()

Return whether tqdm progress bars should be disabled, based on the current logging level.

thompson.thompson.get_logger()

thompson.thompson.import_example(data='ads', url=None, sep=',', verbose='info')

Import an example dataset from GitHub.

Import one of the built-in example datasets from the GitHub source, or specify your own download URL.

Parameters:
  • data (str, optional) – Name of the example dataset: ‘ads’. Default is ‘ads’.

  • url (str, optional) – URL of your own dataset to download instead of a built-in example. Default is None.

  • sep (str, optional) – Delimiter used when reading the downloaded file. Default is ‘,’.

  • verbose ([str, int], default is 'info' or 20) – Set the verbose messages using string or integer values:
      * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
      * [10, ‘debug’]: Messages from debug level and higher.
      * [20, ‘info’]: Messages from info level and higher.
      * [30, ‘warning’]: Messages from warning level and higher.
      * [50, ‘critical’, ‘error’]: Messages from critical level and higher.

Returns:

Dataset containing mixed features.

Return type:

pd.DataFrame

thompson.thompson.makefig_UCB(out, width=15, height=10)
thompson.thompson.makefig_UCB_random(out, width=15, height=10)
thompson.thompson.makefig_thompson(out, width=15, height=10)
thompson.thompson.plot(out, width=15, height=10, verbose='info')

Plot the results of the multi-armed bandit algorithm.

Creates visualizations showing the performance of the selected algorithm. The type of plot depends on the method used (Thompson, UCB, or randomized).

Parameters:
  • out (dict) – Output from thompson, UCB, or UCB_random containing the results to plot.

  • width (int, optional) – Width of the figure in inches. Default is 15.

  • height (int, optional) – Height of the figure in inches. Default is 10.

  • verbose (str or int, optional) – Set the verbose messages using string or integer values:
      * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
      * [10, ‘debug’]: Messages from debug level and higher.
      * [20, ‘info’]: Messages from info level and higher.
      * [30, ‘warning’]: Messages from warning level and higher.
      * [50, ‘critical’, ‘error’]: Messages from critical level and higher.
    Default is ‘info’.

Returns:

The function displays the plot directly and returns None.

Return type:

None

Examples

>>> import thompson as th
>>> df = th.import_example()
>>> # Plot Thompson sampling results
>>> out_tps = th.thompson(df)
>>> th.plot(out_tps)
>>> # Plot UCB results
>>> out_ucb = th.UCB(df)
>>> th.plot(out_ucb)
>>> # Plot randomized results
>>> out_ran = th.UCB_random(df)
>>> th.plot(out_ran)

thompson.thompson.set_logger(verbose: [str, int] = 'info')

Set the logger for verbosity messages.

Parameters:

verbose ([str, int], default is 'info' or 20) – Set the verbose messages using string or integer values:
  * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
  * [10, ‘debug’]: Messages from debug level and higher.
  * [20, ‘info’]: Messages from info level and higher.
  * [30, ‘warning’]: Messages from warning level and higher.
  * [50, ‘critical’, ‘error’]: Messages from critical level and higher.

Returns:

None.

Examples

>>> # Set the logger to warning
>>> set_logger(verbose='warning')
>>> # Test with different messages
>>> logger.debug("Hello debug")
>>> logger.info("Hello info")
>>> logger.warning("Hello warning")
>>> logger.critical("Hello critical")
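The example above relies on the module's own logger. As a self-contained illustration using only Python's standard logging module, the sketch below shows how the documented verbose values could translate into numeric logging levels. verbose_to_level and the logger name verbose_demo are hypothetical; thompson's own conversion (set_logger together with convert_verbose_to_new) may handle legacy integer values differently.

```python
import logging


def verbose_to_level(verbose="info"):
    """Translate a documented `verbose` value into a numeric logging level.

    Hypothetical helper following the documented table; 60 sits above
    logging.CRITICAL (50), so nothing is emitted at that level.
    """
    if verbose is None or verbose == 0:
        return 60
    if isinstance(verbose, str):
        levels = {
            "silent": 60, "off": 60, "no": 60,
            "debug": 10, "info": 20, "warning": 30,
            "error": 50, "critical": 50,  # 'error' is grouped with 'critical'
        }
        return levels[verbose.lower()]
    return int(verbose)


logger = logging.getLogger("verbose_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler())
logger.setLevel(verbose_to_level("warning"))
logger.info("Hello info")        # below WARNING: suppressed
logger.warning("Hello warning")  # at WARNING: emitted
```

Note that the standard logging module defines ERROR as 40, whereas the verbose tables in this reference group ‘error’ with level 50, so selecting ‘error’ shows only critical messages.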

thompson.thompson.thompson(df, verbose='info')

Perform Thompson sampling on the multi-armed bandit problem.

Thompson sampling is a Bayesian approach to the multi-armed bandit problem. It maintains a probability distribution over the expected rewards of each arm and samples from these distributions to select the next arm to pull.
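For 0/1 rewards, a common way to realize the probability distribution described above is a Beta posterior per arm. The following is a minimal sketch of Beta-Bernoulli Thompson sampling, assuming a uniform Beta(1, 1) prior updated with each arm's success and failure counts (the quantities this function reports as cols_rewards_1 and cols_rewards_0). thompson_select is a hypothetical helper that takes plain lists of rows; it is not the library's implementation, and its prior and tie-breaking may differ.

```python
import random


def thompson_select(rewards, seed=None):
    """Replay Beta-Bernoulli Thompson sampling over a table of 0/1 rewards.

    Hypothetical helper: `rewards` is a list of rows, each a list of 0/1
    values with one entry per arm.
    """
    rng = random.Random(seed)
    n_arms = len(rewards[0])
    cols_rewards_1 = [0] * n_arms  # successes observed per arm
    cols_rewards_0 = [0] * n_arms  # failures observed per arm
    cols_selected = []
    total_reward = 0

    for row in rewards:
        # Sample a plausible success rate for each arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then play the arm with the highest sample.
        draws = [
            rng.betavariate(cols_rewards_1[i] + 1, cols_rewards_0[i] + 1)
            for i in range(n_arms)
        ]
        arm = max(range(n_arms), key=draws.__getitem__)
        reward = row[arm]
        cols_selected.append(arm)
        if reward == 1:
            cols_rewards_1[arm] += 1
        else:
            cols_rewards_0[arm] += 1
        total_reward += reward

    return {
        "total_reward": total_reward,
        "cols_selected": cols_selected,
        "cols_rewards_1": cols_rewards_1,
        "cols_rewards_0": cols_rewards_0,
    }
```

Unlike UCB, the choice is randomized: an arm with few observations has a wide posterior and therefore still gets selected occasionally, while an arm with many successes is selected most of the time.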

Parameters:
  • df (pd.DataFrame) – Contains samples[rows] x features[columns]. Each row represents a trial, and each column represents an arm of the bandit. Values should be 0 or 1, where 1 indicates a successful trial.

  • verbose (str or int, optional) – Set the verbose messages using string or integer values:
      * [0, 60, None, ‘silent’, ‘off’, ‘no’]: No messages.
      * [10, ‘debug’]: Messages from debug level and higher.
      * [20, ‘info’]: Messages from info level and higher.
      * [30, ‘warning’]: Messages from warning level and higher.
      * [50, ‘critical’, ‘error’]: Messages from critical level and higher.
    Default is ‘info’.

Returns:

Dictionary containing:
  - columns: Names of the columns (arms).
  - total_reward: Total reward obtained.
  - cols_selected: Vector describing which arm was selected for each trial.
  - cols_rewards_1: Number of successful trials per arm.
  - cols_rewards_0: Number of unsuccessful trials per arm.
  - methodtype: ‘thompson’.

Return type:

dict

Examples

>>> import numpy as np
>>> import thompson as th
>>> df = th.import_example()
>>> out = th.thompson(df)
>>> print(f"Total reward: {out['total_reward']}")
>>> print(f"Best performing arm: {out['columns'][np.argmax(out['cols_rewards_1'])]}")

thompson.thompson.wget(url, writepath)