API References

benfordslaw is a Python library for testing the frequency distribution of leading digits.

class benfordslaw.benfordslaw.benfordslaw(alpha=0.05, method='chi2', pos=1, verbose=3)

Class benfordslaw.

fit(X)

Test if an empirical (observed) distribution significantly differs from a theoretical (expected, Benford’s) distribution.

Benford’s law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. Use this method to test whether a set of numbers may be artificial or manipulated. The null hypothesis H0 is that the observed and theoretical distributions are the same. If a set of values follows Benford’s law, then a model’s predicted values for that set should follow it as well. Unmanipulated data tends to follow Benford’s law, whereas manipulated or fraudulent data typically deviates from it.
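As an illustration of the test described above, the following is a minimal sketch of a Pearson chi-square goodness-of-fit test against the Benford first-digit distribution, P(d) = log10(1 + 1/d). It uses only the Python standard library and is not taken from the benfordslaw source; the function names benford_expected, leading_digit and chi2_benford are hypothetical, and the library's own 'chi2' method may differ in its details.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit probabilities P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a non-zero number; the sign is ignored."""
    value = abs(value)
    if value == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation always starts with the first significant digit.
    # 17 significant digits avoid rounding e.g. 9.9999999 up to 10.
    return int(format(value, ".16e")[0])


def chi2_benford(values):
    """Hypothetical helper: chi-square statistic and p-value against Benford."""
    digits = [leading_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    stat = sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )
    # Nine digit classes give 8 degrees of freedom. For an even number of
    # degrees of freedom 2m, the chi-square survival function is
    # exp(-x/2) * sum_{i<m} (x/2)^i / i!, so no SciPy is needed here.
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(4))
    return stat, p_value


# Powers of two are a classic sequence that follows Benford's law closely.
stat, p_value = chi2_benford([2 ** i for i in range(1000)])
alpha = 0.05  # same default significance level as in the class signature
print("reject H0" if p_value < alpha else "no evidence against H0")
```

A p-value below alpha leads to rejecting H0, which means the observed leading digits differ significantly from the Benford distribution.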

Assumptions of the data:

The numbers need to be random and not assigned, with no imposed minimums or maximums.
The numbers should cover several orders of magnitude.
The dataset should preferably contain at least 1000 samples, although Benford’s law has been shown to hold for datasets containing as few as 50 numbers.
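The second and third assumptions above can be checked roughly before fitting. The sketch below is not part of the benfordslaw API: the function name check_benford_assumptions and the min_orders threshold are assumptions made for illustration ("several orders of magnitude" is not quantified in this reference), while min_samples=1000 mirrors the recommendation above.

```python
import math


def check_benford_assumptions(values, min_samples=1000, min_orders=2):
    """Hypothetical helper: rough sanity check of sample size and value range."""
    nonzero = [abs(v) for v in values if v != 0]
    if not nonzero:
        return {"n_samples": 0, "orders_of_magnitude": 0.0,
                "enough_samples": False, "wide_enough": False}
    # Number of orders of magnitude between the smallest and largest value.
    span = math.log10(max(nonzero)) - math.log10(min(nonzero))
    return {
        "n_samples": len(nonzero),
        "orders_of_magnitude": span,
        "enough_samples": len(nonzero) >= min_samples,
        "wide_enough": span >= min_orders,  # threshold is an assumption
    }


report = check_benford_assumptions([3, 42, 517, 6800, 91000])
print(report)
```

Whether the numbers are random rather than assigned cannot be verified from the values alone and needs knowledge of how the data was produced.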

Parameters: X (list or numpy array) – Input data.

Examples

>>> # Import library
>>> from benfordslaw import benfordslaw
>>> #
>>> # Initialize
>>> bl = benfordslaw(pos=1)
>>> #
>>> # Get data for one candidate
>>> df = bl.import_example()
>>> X = df['votes'].loc[df['candidate']=='Donald Trump'].values
>>> #
>>> # Fit
>>> results = bl.fit(X)
>>> #
>>> # Figure
>>> fig, ax = bl.plot()

Return type: dict.

import_example(data='elections', url=None, sep=',', verbose=3)

Import an example dataset from the GitHub source.

Import one of the example datasets from the GitHub source, or specify your own download URL.

Parameters:

data (str) – Example dataset to import, e.g. ‘elections_rus’ or ‘elections_usa’.
url (str) – URL of the dataset.
sep (str) – Separator used to split the input file when loading from a URL.

Returns:

Dataset containing mixed features.

Return type:

pd.DataFrame

References

https://github.com/erdogant/datazets

plot(title='', fontsize=16, barcolor='black', barwidth=0.3, label='Empirical distribution', figsize=(15, 8), grid=True)

Make a bar chart of the observed versus expected first-digit frequencies, in percent.

Parameters:

fontsize (int, (default : 16)) – Font size.
barwidth (float, (default : 0.3)) – Width of the bars.
barcolor (tuple or string, (default : 'black')) – Color of the bars. Can be a string such as 'red' or 'black', or an RGB list such as [0.5, 0.5, 0.5].
label (String, (default : 'Empirical distribution')) – Label of the figure.
figsize (tuple, optional) – Figure size. The default is (15,8).

Returns:

fig, ax

Return type:

tuple