Abstract
''''''''

Background
	Probability distribution fitting is the fitting of a probability distribution to a series of repeated measurements of a variable phenomenon. A distribution that fits the data closely can be used for various purposes, as described in the use-cases below.

	One way to assess how well a probability distribution fits the data is a goodness-of-fit test. It compares the observed frequency (*f*) with the frequency expected from the model (*f-hat*) over a set of classes (bins). In ``distfit``, the goodness of fit is computed with the Sum of Squared Errors (SSE), also known as the Sum of Squared Estimate of Errors or the Residual Sum of Squares (RSS).

	The *RSS* measures how far the values predicted by a model deviate from the actual empirical values of the data; in other words, it quantifies the discrepancy between the data and an estimation model. A small RSS indicates a close fit of the model to the data. The RSS is computed as:

	.. figure:: ../figs/RSS.svg

	Where **yi** is the i-th value of the variable to be predicted, **xi** is the i-th value of the explanatory variable, and **f(xi)** is the predicted value of **yi** (also termed **y-hat**).
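
	As a minimal sketch of the formula above, assume (for illustration; this is not necessarily ``distfit``'s exact procedure) that **yi** is the empirical density of histogram bin *i*, **xi** is the center of that bin, and **f(xi)** is the probability density function of a fitted distribution. The RSS can then be computed with NumPy and SciPy. The data are simulated, and the bin count of 50 is an arbitrary choice:

	.. code:: python

	    import numpy as np
	    from scipy import stats


	    def rss(y, y_hat):
	        """Residual sum of squares: sum over i of (y_i - f(x_i)) ** 2."""
	        y = np.asarray(y, dtype=float)
	        y_hat = np.asarray(y_hat, dtype=float)
	        return float(np.sum((y - y_hat) ** 2))


	    # Simulated example data (an assumption for illustration).
	    rng = np.random.default_rng(42)
	    data = rng.normal(loc=5.0, scale=2.0, size=10_000)

	    # y_i: empirical density of each histogram bin; x_i: the bin centers.
	    y, edges = np.histogram(data, bins=50, density=True)
	    x = (edges[:-1] + edges[1:]) / 2

	    # f(x_i): density of a fitted normal distribution at each bin center.
	    loc, scale = stats.norm.fit(data)
	    y_hat = stats.norm.pdf(x, loc=loc, scale=scale)

	    print(f"RSS of the normal fit: {rss(y, y_hat):.6f}")

	A poorly matching model, such as an exponential distribution fitted to the same normal data, yields a noticeably larger RSS.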


	``distfit`` is a Python package for probability density fitting: it fits 89 univariate distributions to non-censored data and scores each fit with the RSS. The best-fitting distribution is returned with its ``loc``, ``scale`` and ``arg`` parameters, which can then be used to compute probabilities for new data points.
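
	The selection step described above can be sketched as follows. The helper ``fit_best`` is hypothetical and not part of the ``distfit`` API, and the three-distribution shortlist is an illustrative assumption. Each ``scipy.stats`` candidate is fitted by maximum likelihood with ``dist.fit``, the candidates are ranked by RSS against the histogram density, and the fitted ``loc``, ``scale`` and ``arg`` values of the winner are used to compute probabilities for new data points:

	.. code:: python

	    import numpy as np
	    from scipy import stats


	    def fit_best(data, candidates, bins=50):
	        """Fit each candidate distribution and return them ranked by RSS (lowest first).

	        Hypothetical helper for illustration; not part of the distfit API.
	        """
	        # Empirical density per bin (y_i) evaluated at the bin centers (x_i).
	        y, edges = np.histogram(data, bins=bins, density=True)
	        x = (edges[:-1] + edges[1:]) / 2

	        ranking = []
	        for name in candidates:
	            dist = getattr(stats, name)
	            params = dist.fit(data)  # maximum-likelihood estimate: (*arg, loc, scale)
	            arg, loc, scale = params[:-2], params[-2], params[-1]
	            y_hat = dist.pdf(x, *arg, loc=loc, scale=scale)
	            score = float(np.sum((y - y_hat) ** 2))
	            ranking.append((score, name, arg, loc, scale))
	        ranking.sort(key=lambda item: item[0])
	        return ranking


	    # Simulated, right-skewed example data.
	    rng = np.random.default_rng(0)
	    data = rng.gamma(shape=2.0, scale=3.0, size=5_000)

	    # Illustrative shortlist; distfit itself scans 89 distributions.
	    ranking = fit_best(data, ["norm", "expon", "gamma"])
	    score, name, arg, loc, scale = ranking[0]
	    print(f"best fit: {name}, arg={arg}, loc={loc:.3f}, scale={scale:.3f}, RSS={score:.6f}")

	    # Use the fitted parameters to compute probabilities for new data points.
	    best = getattr(stats, name)
	    new_points = np.array([1.0, 6.0, 25.0])
	    print(best.cdf(new_points, *arg, loc=loc, scale=scale))  # P(X <= value) for each point

	Keeping ``arg`` as a tuple lets the same code handle distributions without shape parameters (``norm``, ``expon``) and with them (``gamma``).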

Use-cases
	``distfit`` has several use-cases. The first is to determine the best theoretical distribution for your data, which can reduce tens of thousands of data points to a handful of floating-point parameters (``loc``, ``scale`` and any shape ``arg``). Another application is *outlier detection*: a null distribution can be estimated from data that represent the **normal** state, and new data points that deviate significantly from it can then be marked as *outliers* that are potentially of interest. The null distribution can also be generated by randomization or permutation approaches; in that case, new data points are marked when they deviate significantly from randomness.
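
	A minimal sketch of the outlier-detection use-case follows. It assumes a normal null distribution and a two-sided significance level of ``alpha = 0.05``. Both choices are for illustration only, and the helper ``flag_outliers`` is hypothetical rather than part of ``distfit``. The null distribution is fitted on simulated measurements from the normal state, and each new point whose two-sided tail probability under that null falls below ``alpha`` is flagged:

	.. code:: python

	    import numpy as np
	    from scipy import stats


	    def flag_outliers(new_points, loc, scale, alpha=0.05):
	        """Two-sided test against a fitted normal null distribution.

	        Hypothetical helper for illustration; returns p-values and a boolean outlier mask.
	        """
	        new_points = np.asarray(new_points, dtype=float)
	        cdf = stats.norm.cdf(new_points, loc=loc, scale=scale)
	        p_values = 2 * np.minimum(cdf, 1 - cdf)  # probability of a value at least this extreme
	        return p_values, p_values < alpha


	    # Simulated measurements of the "normal" state define the null distribution.
	    rng = np.random.default_rng(1)
	    normal_state = rng.normal(loc=10.0, scale=1.5, size=2_000)
	    loc, scale = stats.norm.fit(normal_state)

	    new_points = [10.2, 9.1, 17.0, 2.5]
	    p_values, is_outlier = flag_outliers(new_points, loc, scale)
	    for value, p, flag in zip(new_points, p_values, is_outlier):
	        print(f"{value:5.1f}  p={p:.3g}  outlier={flag}")

	The same flagging logic applies when the null distribution is generated by randomization or permutation instead of from the normal state.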
    

.. include:: add_bottom.add