Distribution estimator

`daspi.statistics.estimation.DistributionEstimator(samples, dist=None, possible_dists=DIST.COMMON, nan_policy='omit')` ¶

Bases: BaseEstimator

A class to estimate the distribution of a given 1D numeric sample.

This class provides methods to estimate the distribution by fitting continuous distributions from scipy.stats to the provided samples. It uses the Kolmogorov-Smirnov test to evaluate how well each distribution fits the data. Among the candidate distributions, the one with the highest p-value is considered the best fit.

PARAMETER	DESCRIPTION
`samples`	The 1D numeric sample for which the distribution is to be estimated. This should be a Series or array-like object containing numeric values. TYPE: `NumericSample1D`
`dist`	The distribution to fit to the data. Only continuous distributions of scipy.stats are allowed. If None, the distribution is estimated from the samples. TYPE: `str \| rv_continuous \| None` DEFAULT: `None`
`possible_dists`	Candidate distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed. TYPE: `tuple of strings or rv_continuous` DEFAULT: `DIST.COMMON`
`nan_policy`	How to handle NaN values in the samples. - 'propagate': NaN values are preserved in the analysis. - 'raise': Raises an error if NaN values are found. - 'omit': Omits NaN values from the analysis (default). TYPE: `('propagate', 'raise', 'omit')` DEFAULT: `'omit'`

RAISES	DESCRIPTION
`ValueError`	If NaN values are found in the samples and `nan_policy` is set to 'raise'.
`UserWarning`	If NaN values are found in the samples and `nan_policy` is set to 'omit' or 'propagate'. The warning indicates that NaN values will be omitted from the analysis or may lead to unexpected results.

Sources

The theoretical quantiles and percentiles are calculated the same way as in the statsmodels ProbPlot class, see: https://www.statsmodels.org/dev/_modules/statsmodels/graphics/gofplots.html

`possible_dists = possible_dists` `instance-attribute` ¶

Distributions given during initialization to which the data may be subject.

`dist_name` `property` ¶

Get the name of the estimated distribution (read-only).

`dist` `property` `writable` ¶

This is the generic continuous distribution class of the provided or evaluated distribution.

Set the distribution to be used for estimation. If a string is provided, it will be converted to a continuous distribution class using ensure_generic. If None, the distribution will be estimated from the samples.

`frozen` `property` ¶

This is the frozen continuous RV object of the dist property (read-only).

`D` `property` ¶

Get the Kolmogorov-Smirnov test statistic, either D, D+ or D-.
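To make the three variants of the statistic concrete, the following is a minimal sketch using only the standard library. The helper name `ks_statistics` is hypothetical and not part of daspi; it only illustrates the textbook definitions of D+, D- and D.

```python
from statistics import NormalDist


def ks_statistics(samples, cdf):
    """Return (D+, D-, D) of the samples against a theoretical CDF.

    Hypothetical helper for illustration, not part of daspi.
    """
    xs = sorted(samples)
    n = len(xs)
    # D+: largest amount by which the empirical CDF lies above the theoretical CDF
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    # D-: largest amount by which the theoretical CDF lies above the empirical CDF
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    # D: the two-sided statistic is the larger of the two
    return d_plus, d_minus, max(d_plus, d_minus)


data = [-1.2, -0.4, 0.1, 0.5, 1.3]
d_plus, d_minus, d = ks_statistics(data, NormalDist(0.0, 1.0).cdf)
```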

`shape_params` `property` ¶

Estimates for any distribution shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm). Can be used to generate values with the help of the dist attribute (read-only).

`loc` `property` ¶

Get the loc parameter from shape_params (read-only).

`scale` `property` ¶

Get the scale parameter from shape_params (read-only).

`p_ks` `property` ¶

Get the two-tailed p-value of the Kolmogorov-Smirnov test for the provided or fitted distribution. A higher p-value indicates a better fit to the data (read-only).

`excess` `property` ¶

Get the Fisher kurtosis (excess) of the filtered samples (read-only). Calculations are corrected for statistical bias. When the excess is close to zero, the curvature of the distribution corresponds to that of a normal distribution. Distributions with negative excess kurtosis are said to be platykurtic; such a distribution produces fewer and/or less extreme outliers than the normal distribution (e.g. the uniform distribution has no outliers). Distributions with positive excess kurtosis are said to be leptokurtic (e.g. the Laplace distribution, whose tails approach zero more slowly than those of a Gaussian and which therefore produces more outliers than the normal distribution):

- excess < 0: less extreme outliers than normal distribution
- excess > 0: more extreme outliers than normal distribution
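As an illustration of a bias-corrected excess, here is a small standard-library sketch. It assumes the correction is the usual sample-size adjustment (the one `scipy.stats.kurtosis` applies with `fisher=True, bias=False`); whether daspi computes it exactly this way is an assumption, and `excess_kurtosis` is a hypothetical name.

```python
def excess_kurtosis(samples):
    """Bias-corrected Fisher (excess) kurtosis; hypothetical helper."""
    n = len(samples)
    if n < 4:
        raise ValueError("at least four samples are required")
    mean = sum(samples) / n
    # Biased central moments of order 2 and 4
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    g2 = m4 / m2 ** 2 - 3.0  # biased excess kurtosis
    # Sample-size correction
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


# Evenly spaced values behave like a uniform distribution: negative excess
flat = excess_kurtosis([1, 2, 3, 4, 5])
```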

`p_excess` `property` ¶

Get the probability that the excess of the population that the sample was drawn from is the same as that of a corresponding normal distribution (read-only).

`skew` `property` ¶

Get the skewness of the filtered samples (read-only). Calculations are corrected for statistical bias. For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness value greater than zero means that there is more weight in the right tail of the distribution:

- skew < 0: left-skewed -> long tail left
- skew > 0: right-skewed -> long tail right
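The sign convention can be checked with a short standard-library sketch. It assumes the bias correction is the common adjusted Fisher-Pearson coefficient (as in `scipy.stats.skew` with `bias=False`); the function name `sample_skewness` is hypothetical and not taken from daspi.

```python
import math


def sample_skewness(samples):
    """Bias-corrected sample skewness; hypothetical helper."""
    n = len(samples)
    if n < 3:
        raise ValueError("at least three samples are required")
    mean = sum(samples) / n
    # Biased central moments of order 2 and 3
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    g1 = m3 / m2 ** 1.5  # biased skewness
    # Sample-size correction
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


symmetric = sample_skewness([1, 2, 3, 4, 5])     # no skew
right_tail = sample_skewness([1, 1, 1, 1, 6])    # long tail to the right
```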

`p_skew` `property` ¶

Get the probability that the skewness of the population that the sample was drawn from is the same as that of a corresponding normal distribution (read-only).

`p_ad` `property` ¶

Get the probability that the filtered samples follow the normal distribution, determined by performing an Anderson-Darling test (read-only).

`theoretical_percentiles` `property` ¶

Get the theoretical percentiles (CDF values) of the sample data (read-only).

`theoretical_quantiles` `property` ¶

Get the theoretical quantiles (osm, or order statistic medians) of the filtered samples. These quantiles are calculated the same way as in the scipy.stats.probplot() function (read-only).
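For orientation, `scipy.stats.probplot` derives its order statistic medians from Filliben's estimate and then maps them through the inverse CDF of the distribution. The sketch below reproduces that idea with the standard library for a standard normal distribution; both function names are hypothetical, and the use of `statistics.NormalDist` in place of a scipy distribution is an assumption made to keep the example dependency-free.

```python
from statistics import NormalDist


def order_statistic_medians(n):
    """Filliben's estimate of the uniform order statistic medians."""
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    # The first and last positions use a closed-form expression
    medians[-1] = 0.5 ** (1.0 / n)
    medians[0] = 1.0 - medians[-1]
    return medians


def theoretical_quantiles_sketch(n, dist=NormalDist()):
    """Map the medians through the inverse CDF (percent point function)."""
    return [dist.inv_cdf(p) for p in order_statistic_medians(n)]


quantiles = theoretical_quantiles_sketch(5)
```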

`sample_quantiles` `property` ¶

Get the sample quantiles (sorted filtered samples) (read-only).

`sample_percentiles` `property` ¶

Get the empirical percentiles (CDF values) of the sample data (read-only).

`predicted` `property` ¶

Get the predicted values of the provided or evaluated distribution by using the order statistic medians (read-only).

`log_likelihood` `property` ¶

Get the log-likelihood of the provided or evaluated distribution (read-only).

`ss` `property` ¶

Get the sum of squared residuals (SS) using the sorted values and the predicted values (read-only).

`aic` `property` ¶

Get the Akaike information criterion (AIC) (read-only).

`bic` `property` ¶

Get the Bayesian information criterion (BIC) (read-only).
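The relationship between `log_likelihood`, `aic` and `bic` can be sketched with the textbook definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of fitted parameters and n the sample size. The example below fits a normal distribution with the standard library; the function name and the choice of k are assumptions for illustration, not daspi's implementation.

```python
import math
from statistics import NormalDist, fmean, pstdev


def information_criteria(samples, k):
    """Return (log-likelihood, AIC, BIC) for a normal fit; hypothetical helper."""
    # Maximum likelihood estimates of a normal distribution
    dist = NormalDist(fmean(samples), pstdev(samples))
    log_likelihood = sum(math.log(dist.pdf(x)) for x in samples)
    n = len(samples)
    aic = 2 * k - 2 * log_likelihood
    bic = k * math.log(n) - 2 * log_likelihood
    return log_likelihood, aic, bic


# k=2 because loc and scale are estimated from the data
ll, aic, bic = information_criteria([1.0, 2.0, 3.0, 4.0, 5.0], k=2)
```

Lower AIC and BIC values indicate a better trade-off between goodness of fit and the number of parameters.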

`plotting_positions(nobs, alpha=0.0, beta=None)` `staticmethod` ¶

Generate a sequence of plotting positions.

PARAMETER	DESCRIPTION
`nobs`	Number of probability points to plot TYPE: `int`
`alpha`	alpha parameter for the plotting position of an expected order statistic TYPE: `float` DEFAULT: `0.0`
`beta`	beta parameter for the plotting position of an expected order statistic. If None, then beta is set to alpha. TYPE: `float \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`Series`	The plotting positions

Notes

The plotting positions are given by

\[ i \in [1, nobs] \]

\[ \frac{(i - \alpha)}{nobs + 1 - \alpha - \beta} \]

For additional information on alpha and beta, see: scipy.stats.mstats.plotting_positions
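The formula above translates directly into code. The following sketch returns a plain list instead of a Series and uses the hypothetical name `plotting_positions_sketch` to avoid confusion with the actual static method.

```python
def plotting_positions_sketch(nobs, alpha=0.0, beta=None):
    """Plotting positions (i - alpha) / (nobs + 1 - alpha - beta)."""
    if beta is None:
        beta = alpha  # same fallback as described for the beta parameter
    return [(i - alpha) / (nobs + 1 - alpha - beta)
            for i in range(1, nobs + 1)]


default_positions = plotting_positions_sketch(4)             # i / (nobs + 1)
midpoint_positions = plotting_positions_sketch(4, alpha=0.5)  # (i - 0.5) / nobs
```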

`distribution()` ¶

Estimate the distribution by selecting the one from the provided distributions that best reflects the filtered data.

RETURNS	DESCRIPTION
`dist`	A generic continuous distribution class of best fit TYPE: `scipy.stats rv_continuous`
`p`	The two-tailed p-value for the best fit TYPE: `float`
`shape_params`	Estimates for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm). Can be used to generate values with the help of returned dist TYPE: `Tuple[float, ...]`

Notes

First, a Kolmogorov-Smirnov test is performed for each candidate distribution to determine how well it fits the samples. The distribution with the highest p-value is then selected as the best fit, because a higher p-value means the data provide less evidence against that distribution.
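The selection step can be illustrated without scipy. For a fixed sample size, the Kolmogorov-Smirnov p-value decreases as the statistic D grows, so ranking candidates by smallest D gives the same order as ranking by largest p-value. The sketch below uses that shortcut with two hand-written candidates; the function names, the candidate set and the simple parameter estimates are all assumptions for illustration and differ from the scipy-based fitting described here.

```python
import math
from statistics import NormalDist, fmean, pstdev


def ks_d(samples, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


def best_fit(samples):
    """Pick the candidate with the smallest D (equivalently, largest p-value)."""
    mu, sigma = fmean(samples), pstdev(samples)
    candidates = {
        # Normal distribution with maximum likelihood estimates
        "norm": NormalDist(mu, sigma).cdf,
        # Exponential distribution with loc fixed at 0 and scale = mean
        "expon": lambda x: 1.0 - math.exp(-x / mu) if x > 0 else 0.0,
    }
    scores = {name: ks_d(samples, cdf) for name, cdf in candidates.items()}
    return min(scores, key=scores.get), scores


name, scores = best_fit([8.0, 9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 12.0])
```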

`stable_variance(alpha=0.05, n_sections=3)` ¶

Test whether the variance remains stable across the samples.

The sample data is divided into sections and the variances of these sections are compared using the Levene test.

PARAMETER	DESCRIPTION
`alpha`	Alpha risk of hypothesis tests. If a p-value is below this limit, the null hypothesis is rejected TYPE: `float` DEFAULT: `0.05`
`n_sections`	Number of sections to divide the filtered samples into, by default 3 TYPE: `int` DEFAULT: `3`

RETURNS	DESCRIPTION
`stable`	True if the p-value > alpha TYPE: `bool`

`stable_mean(alpha=0.05, n_sections=3)` ¶

Test whether the mean remains stable across the samples.

The sample data is divided into sections and the means of these sections are compared using the F test.

PARAMETER	DESCRIPTION
`alpha`	Alpha risk of hypothesis tests. If a p-value is below this limit, the null hypothesis is rejected TYPE: `float` DEFAULT: `0.05`
`n_sections`	Number of sections to divide the filtered samples into, by default 3 TYPE: `int` DEFAULT: `3`

RETURNS	DESCRIPTION
`stable`	True if the p-value > alpha TYPE: `bool`
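To show what comparing section means with an F test involves, here is a standard-library sketch of the one-way ANOVA F statistic. It assumes the samples are split into contiguous, roughly equal sections, which may differ from how daspi forms its subgroups; both function names are hypothetical. The p-value that is compared against alpha would come from the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f_oneway`.

```python
from statistics import fmean


def split_sections(samples, n_sections=3):
    """Split samples into contiguous sections of nearly equal size."""
    size, rest = divmod(len(samples), n_sections)
    sections, start = [], 0
    for k in range(n_sections):
        stop = start + size + (1 if k < rest else 0)
        sections.append(samples[start:stop])
        start = stop
    return sections


def f_statistic(sections):
    """One-way ANOVA F statistic: between-section over within-section variance."""
    n = sum(len(s) for s in sections)
    k = len(sections)
    grand_mean = fmean(x for s in sections for x in s)
    ss_between = sum(len(s) * (fmean(s) - grand_mean) ** 2 for s in sections)
    ss_within = sum((x - fmean(s)) ** 2 for s in sections for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


sections = split_sections([1, 2, 3, 2, 3, 4, 3, 4, 5])
f_value = f_statistic(sections)
```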

`follows_norm_curve(alpha=0.05, excess_test=True, skew_test=True, ad_test=False)` ¶

Checks whether the sample data is subject to a normal distribution by performing one or more of the following tests (depending on the input):

- Skewness test
- Excess (kurtosis) test
- Anderson-Darling test

PARAMETER	DESCRIPTION
`alpha`	Alpha risk of hypothesis tests. If a p-value is below this limit, the null hypothesis is rejected TYPE: `float` DEFAULT: `0.05`
`excess_test`	If True, an excess (kurtosis) test is carried out, by default True TYPE: `bool` DEFAULT: `True`
`skew_test`	If True, a skewness test is carried out, by default True TYPE: `bool` DEFAULT: `True`
`ad_test`	If True, an Anderson-Darling test is carried out, by default False TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`remain_h0`	True if all p-values of the tests performed are greater than alpha, otherwise False TYPE: `bool`

RAISES	DESCRIPTION
`AssertionError`	If all flags are False.
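The decision rule described in the return value and the raised error can be summarized in a few lines. This is a sketch of the logic only; the function name `remain_h0_sketch` and the dictionary of p-values are hypothetical and stand in for the p-values of whichever tests are enabled.

```python
def remain_h0_sketch(p_values, alpha=0.05):
    """Return True if every enabled test keeps the null hypothesis.

    p_values maps the name of each enabled test to its p-value.
    """
    # Mirrors the documented AssertionError when all flags are False
    assert p_values, "at least one test must be enabled"
    return all(p > alpha for p in p_values.values())


looks_normal = remain_h0_sketch({"excess": 0.41, "skew": 0.27})
rejected = remain_h0_sketch({"excess": 0.41, "skew": 0.01})
```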
