Distribution estimator
daspi.statistics.estimation.DistributionEstimator(samples, dist=None, possible_dists=DIST.COMMON, nan_policy='omit')
Bases: BaseEstimator
A class to estimate the distribution of a given 1D numeric sample.
This class estimates the distribution by fitting continuous
distributions from scipy.stats to the provided samples.
It uses the Kolmogorov-Smirnov test to evaluate how well each
distribution fits the data. The distribution with the higher
p-value is considered the better fit.
| PARAMETER | DESCRIPTION |
|---|---|
| samples | The 1D numeric sample for which the distribution is to be estimated. This should be a Series or array-like object containing numeric values. |
| dist | The distribution to which the data are assumed to be subject. Only continuous distributions of scipy.stats are allowed. If None (default), the distribution is estimated from the samples. |
| possible_dists | Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed. By default DIST.COMMON. |
| nan_policy | How to handle NaN values in the samples: 'propagate' keeps NaN values in the analysis; 'raise' raises an error if NaN values are found; 'omit' removes NaN values from the analysis. By default 'omit'. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If NaN values are found in the samples and nan_policy is 'raise'. |
| UserWarning | If NaN values are found in the samples and … |
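The three nan_policy options can be illustrated with a small stand-alone helper. This is a hypothetical sketch (the name apply_nan_policy is not part of daspi) that only mirrors the documented behaviour of 'raise', 'omit' and 'propagate'; it does not reproduce the library's warning logic.

```python
import math


def apply_nan_policy(samples, nan_policy="omit"):
    """Hypothetical helper illustrating the three documented nan_policy options."""
    values = [float(x) for x in samples]
    has_nan = any(math.isnan(x) for x in values)
    if nan_policy == "raise":
        if has_nan:
            raise ValueError("NaN values found in samples")
        return values
    if nan_policy == "omit":
        # Drop every NaN before any statistic is computed.
        return [x for x in values if not math.isnan(x)]
    if nan_policy == "propagate":
        # Keep NaN values; downstream statistics will typically become NaN.
        return values
    raise ValueError(f"Unknown nan_policy: {nan_policy!r}")


print(apply_nan_policy([1, float("nan"), 3]))
```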
Sources
The theoretical quantiles and percentiles are calculated in the same way as in the statsmodels ProbPlot class, see: https://www.statsmodels.org/dev/_modules/statsmodels/graphics/gofplots.html
possible_dists = possible_dists
instance-attribute
Distributions given during initialization to which the data may be subject.
dist_name
property
Get the name of the estimated distribution (read-only).
dist
property
writable
This is the generic continuous distribution class of the provided or evaluated distribution.
Set the distribution to be used for estimation. If a string
is provided, it will be converted to a continuous distribution
class using ensure_generic. If None, the distribution will be
estimated from the samples.
frozen
property
The frozen continuous RV object of the dist property (read-only).
D
property
Get the Kolmogorov-Smirnov test statistic, either D, D+ or D-.
shape_params
property
Estimates for any distribution shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm). Can be used to generate values with the help of the dist attribute (read-only).
loc
property
Get the loc parameter from shape_params (read-only).
scale
property
Get the scale parameter from shape_params (read-only).
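The ordering described for shape_params (shape parameters first, then loc and scale) is the convention of scipy's rv_continuous.fit. The sketch below uses scipy directly rather than daspi, so it only illustrates that convention and how loc, scale and a frozen distribution can be derived from such a parameter tuple.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=500)

# scipy's fit() returns shape parameters first, then loc and scale.
gamma_params = stats.gamma.fit(data)  # (a, loc, scale)
norm_params = stats.norm.fit(data)    # (loc, scale): norm has no shape parameter

# loc and scale are always the last two entries.
loc, scale = gamma_params[-2], gamma_params[-1]

# Passing the parameters to the generic distribution yields a frozen RV,
# which can generate new values.
frozen = stats.gamma(*gamma_params)
simulated = frozen.rvs(size=5, random_state=1)
print(gamma_params, norm_params, simulated)
```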
p_ks
property
Get the two-tailed p-value of the Kolmogorov-Smirnov test for the provided or fitted distribution. A higher p-value indicates a better fit to the data (read-only).
excess
property
Get the Fisher kurtosis (excess) of the filtered samples (read-only). Calculations are corrected for statistical bias. When the excess is close to zero, the shape of the distribution corresponds to that of a normal distribution. Distributions with negative excess kurtosis are called platykurtic; they produce fewer and/or less extreme outliers than the normal distribution (e.g. the uniform distribution has no outliers). Distributions with positive excess kurtosis are called leptokurtic (e.g. the Laplace distribution, whose tails approach zero more slowly than those of a Gaussian and which therefore produces more outliers than the normal distribution):
- excess < 0: less extreme outliers than the normal distribution
- excess > 0: more extreme outliers than the normal distribution
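For reference, the bias-corrected Fisher excess kurtosis can be computed with the standard library alone. This sketch uses the sample estimator G2, which is the formula scipy.stats.kurtosis applies with fisher=True and bias=False; that daspi uses exactly this estimator is an assumption.

```python
from statistics import fmean


def excess_kurtosis(samples):
    """Bias-corrected Fisher (excess) kurtosis G2 of a numeric sample."""
    n = len(samples)
    if n < 4:
        raise ValueError("At least 4 samples are required")
    mean = fmean(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    g2 = m4 / m2**2 - 3.0  # biased excess kurtosis
    # Bias correction turning g2 into the sample estimator G2.
    return ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))


print(excess_kurtosis([1, 2, 3, 4, 5]))
```

The evenly spaced sample gives a negative (platykurtic) value, whereas a sample dominated by two extreme points, such as [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10], gives a positive (leptokurtic) one.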
p_excess
property
Get the p-value of the hypothesis test whose null hypothesis is that the excess of the population the sample was drawn from equals that of a normal distribution (read-only).
skew
property
Get the skewness of the filtered samples (read-only). Calculations are corrected for statistical bias. For normally distributed data, the skewness should be about zero. For unimodal continuous distributions, a skewness greater than zero means that there is more weight in the right tail of the distribution:
- skew < 0: left-skewed, long tail on the left
- skew > 0: right-skewed, long tail on the right
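The bias-corrected skewness can likewise be sketched with the standard library. This uses the adjusted Fisher-Pearson estimator G1, which matches scipy.stats.skew with bias=False; that daspi uses the same estimator is an assumption.

```python
import math
from statistics import fmean


def sample_skew(samples):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient G1)."""
    n = len(samples)
    if n < 3:
        raise ValueError("At least 3 samples are required")
    mean = fmean(samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in samples) / n  # third central moment
    g1 = m3 / m2**1.5  # biased skewness
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


print(sample_skew([1, 2, 3, 4, 5]))   # symmetric sample
print(sample_skew([1, 1, 1, 10]))     # long tail to the right
```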
p_skew
property
Get the p-value of the hypothesis test whose null hypothesis is that the skewness of the population the sample was drawn from equals that of a normal distribution (read-only).
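scipy provides standard tests for both null hypotheses: scipy.stats.skewtest for skewness and scipy.stats.kurtosistest for kurtosis. Whether p_skew and p_excess are computed with exactly these functions is an assumption; the sketch only shows how such p-values behave for a normal and a clearly skewed sample.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

# H0: skewness equals that of a normal distribution.
p_skew_normal = stats.skewtest(normal_sample).pvalue
p_skew_skewed = stats.skewtest(skewed_sample).pvalue

# H0: kurtosis equals that of a normal distribution (excess == 0).
p_excess_normal = stats.kurtosistest(normal_sample).pvalue

print(p_skew_normal, p_excess_normal, p_skew_skewed)
```

The exponential sample is strongly right-skewed, so its skewness p-value is expected to be far below common alpha levels.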
p_ad
property
Get the p-value of an Anderson-Darling test whose null hypothesis is that the filtered samples come from a normal distribution (read-only).
theoretical_percentiles
property
Get the theoretical percentiles (CDF values) of the sample data (read-only).
theoretical_quantiles
property
Get the theoretical quantiles (osm, or order statistic
medians) of the filtered samples. These quantiles are calculated
the same way as in the scipy.stats.probplot() function
(read-only).
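scipy.stats.probplot obtains the order statistic medians of a uniform distribution with Filliben's approximation and maps them through the percent point function (inverse CDF) of the target distribution. The stand-alone sketch below reproduces that idea for a normal distribution using statistics.NormalDist; the helper names are hypothetical.

```python
from statistics import NormalDist


def order_statistic_medians(n):
    """Filliben's approximation of the uniform order statistic medians."""
    if n < 1:
        raise ValueError("n must be positive")
    medians = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    medians[-1] = 0.5 ** (1.0 / n)   # largest order statistic
    medians[0] = 1.0 - medians[-1]   # smallest order statistic
    return medians


def theoretical_quantiles(n, ppf=NormalDist().inv_cdf):
    """Map the uniform medians through the inverse CDF of the target distribution."""
    return [ppf(p) for p in order_statistic_medians(n)]


print(theoretical_quantiles(3))
```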
sample_quantiles
property
Get the sample quantiles (sorted filtered samples) (read-only).
sample_percentiles
property
Get the empirical percentiles (CDF values) of the sample data (read-only).
predicted
property
Get the predicted values of the provided or evaluated distribution by using the order statistic medians (read-only).
log_likelihood
property
Get the log-likelihood of the provided or evaluated distribution (read-only).
ss
property
Get the sum of squared residuals (SS) using the sorted values and the predicted values (read-only).
aic
property
Get the Akaike information criterion (AIC) (read-only).
bic
property
Get the Bayesian information criterion (BIC) (read-only).
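The log-likelihood, AIC and BIC are related by the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of fitted parameters and n the number of samples. The sketch below applies them to a normal fit; how daspi counts k for each distribution is an assumption here.

```python
import math
from statistics import NormalDist, fmean, pstdev


def normal_log_likelihood(samples):
    """Log-likelihood of a maximum-likelihood normal fit (mean, population std)."""
    dist = NormalDist(fmean(samples), pstdev(samples))
    return sum(math.log(dist.pdf(x)) for x in samples)


def aic(log_likelihood, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k fitted parameters and n samples."""
    return k * math.log(n) - 2 * log_likelihood


data = [1, 2, 3, 4, 5]
ll = normal_log_likelihood(data)
print(ll, aic(ll, k=2), bic(ll, k=2, n=len(data)))
```

Lower AIC or BIC values indicate a better trade-off between fit quality and number of parameters; BIC penalizes additional parameters more strongly once n exceeds about 7.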
plotting_positions(nobs, alpha=0.0, beta=None)
staticmethod
Generate a sequence of plotting positions.
| PARAMETER | DESCRIPTION |
|---|---|
| nobs | Number of probability points to plot. |
| alpha | Alpha parameter for the plotting position of an expected order statistic, by default 0.0. |
| beta | Beta parameter for the plotting position of an expected order statistic. If None (default), beta is set to alpha. |

| RETURNS | DESCRIPTION |
|---|---|
| Series | The plotting positions. |
Notes
The plotting positions are given by

$$p_i = \frac{i - \alpha}{n_{obs} + 1 - \alpha - \beta}, \quad i = 1, \ldots, n_{obs}$$

For additional information on alpha and beta, see:
scipy.stats.mstats.plotting_positions
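The formula can be sketched in a few lines of plain Python. Unlike the documented method, this hypothetical version returns a list instead of a pandas Series.

```python
def plotting_positions(nobs, alpha=0.0, beta=None):
    """Plotting positions (i - alpha) / (nobs + 1 - alpha - beta) for i = 1..nobs."""
    if beta is None:
        beta = alpha  # same default as described for the method
    return [(i - alpha) / (nobs + 1 - alpha - beta) for i in range(1, nobs + 1)]


print(plotting_positions(3))             # alpha = beta = 0: i / (n + 1)
print(plotting_positions(3, alpha=0.5))  # Hazen: (i - 0.5) / n
```

With alpha = beta = 0 the positions are i / (n + 1), the default used by the statsmodels ProbPlot class; alpha = beta = 0.5 gives Hazen's (i - 0.5) / n.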
distribution()
Estimate the distribution by selecting the one from the provided distributions that best reflects the filtered data.
| RETURNS | DESCRIPTION |
|---|---|
| dist | A generic continuous distribution class of the best fit. |
| p | The two-tailed p-value for the best fit. |
| shape_params | Estimates for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape statistics will be returned, but there are exceptions (e.g. norm). Can be used to generate values with the help of the returned dist. |
Notes
For each candidate distribution, a Kolmogorov-Smirnov test is performed to determine how well it fits the samples. The distribution with the highest p-value is considered the best fit, because a higher p-value means that the test provides less evidence against the hypothesis that the samples were drawn from that distribution.
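The selection rule can be sketched without scipy. This is a minimal, stand-alone illustration and not daspi's implementation: the candidate set (norm, uniform, expon), the helper names and the asymptotic Kolmogorov p-value are assumptions made for this example. scipy.stats.kstest computes exact p-values for small samples, so its values can differ; in both cases, estimating the parameters from the same data makes the p-values optimistic.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(samples, cdf):
    """Two-sided Kolmogorov-Smirnov statistic D = max(D+, D-)."""
    xs = sorted(samples)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def ks_pvalue(d, n, terms=100):
    """Asymptotic Kolmogorov p-value Q(sqrt(n) * D); an approximation only."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # Q(lam) equals 1.0 to floating-point precision here
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def fitted_cdfs(samples):
    """Fit hypothetical candidates by maximum likelihood; assumes non-constant data."""
    lo, hi = min(samples), max(samples)
    mean = fmean(samples)
    normal = NormalDist(mean, pstdev(samples))
    exp_scale = mean - lo  # exponential with loc fixed at the sample minimum
    return {
        "norm": normal.cdf,
        "uniform": lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo))),
        "expon": lambda x: 0.0 if x < lo else 1.0 - math.exp(-(x - lo) / exp_scale),
    }


def best_fit(samples):
    """Return (name, p-value) of the candidate with the highest KS p-value."""
    n = len(samples)
    scores = {name: ks_pvalue(ks_statistic(samples, cdf), n)
              for name, cdf in fitted_cdfs(samples).items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


random.seed(1)
data = [random.gauss(10.0, 2.0) for _ in range(500)]
name, p = best_fit(data)
print(name, p)
```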
stable_variance(alpha=0.05, n_sections=3)
Test whether the variance remains stable across the samples.
The filtered sample data is divided into n_sections sections, and the variances of these sections are compared using the Levene test.
| PARAMETER | DESCRIPTION |
|---|---|
| alpha | Alpha risk of the hypothesis test. If the p-value is below this limit, the null hypothesis is rejected. By default 0.05. |
| n_sections | Number of sections to divide the filtered samples into, by default 3. |

| RETURNS | DESCRIPTION |
|---|---|
| stable | True if the p-value is greater than alpha, otherwise False. |
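A stand-alone sketch of this check, assuming the sections are consecutive, nearly equal-sized blocks of the filtered samples (the splitting rule is an assumption). It uses scipy.stats.levene, whose default center='median' corresponds to the Brown-Forsythe variant of the test.

```python
import numpy as np
from scipy import stats


def stable_variance(samples, alpha=0.05, n_sections=3):
    """Hypothetical stand-alone version: Levene test across consecutive sections."""
    values = np.asarray(samples, dtype=float)
    values = values[~np.isnan(values)]             # mimic nan_policy='omit'
    sections = np.array_split(values, n_sections)   # consecutive, near-equal sizes
    p_value = stats.levene(*sections).pvalue        # scipy default: center='median'
    return bool(p_value > alpha)


rng = np.random.default_rng(7)
steady = rng.normal(0.0, 1.0, size=300)
drifting = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(0.0, 10.0, 100)])
print(stable_variance(steady), stable_variance(drifting))
```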
stable_mean(alpha=0.05, n_sections=3)
Test whether the mean remains stable across the samples.
The filtered sample data is divided into n_sections sections, and the means of these sections are compared using the F test.
| PARAMETER | DESCRIPTION |
|---|---|
| alpha | Alpha risk of the hypothesis test. If the p-value is below this limit, the null hypothesis is rejected. By default 0.05. |
| n_sections | Number of sections to divide the filtered samples into, by default 3. |

| RETURNS | DESCRIPTION |
|---|---|
| stable | True if the p-value is greater than alpha, otherwise False. |
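The analogous check for the mean can be sketched with scipy.stats.f_oneway, the one-way ANOVA F test. As above, splitting into consecutive, nearly equal-sized sections is an assumption of this sketch, not a documented detail of daspi.

```python
import numpy as np
from scipy import stats


def stable_mean(samples, alpha=0.05, n_sections=3):
    """Hypothetical stand-alone version: F test on the means of consecutive sections."""
    values = np.asarray(samples, dtype=float)
    values = values[~np.isnan(values)]             # mimic nan_policy='omit'
    sections = np.array_split(values, n_sections)   # consecutive, near-equal sizes
    p_value = stats.f_oneway(*sections).pvalue
    return bool(p_value > alpha)


rng = np.random.default_rng(11)
steady = rng.normal(5.0, 1.0, size=300)
shifted = np.concatenate([rng.normal(5.0, 1.0, 200), rng.normal(8.0, 1.0, 100)])
print(stable_mean(steady), stable_mean(shifted))
```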
follows_norm_curve(alpha=0.05, excess_test=True, skew_test=True, ad_test=False)
Check whether the sample data follows a normal distribution by performing one or more of the following tests (depending on the input):
- Skewness test
- Excess (kurtosis) test
- Anderson-Darling test
| PARAMETER | DESCRIPTION |
|---|---|
| alpha | Alpha risk of the hypothesis tests. If a p-value is below this limit, the null hypothesis is rejected. By default 0.05. |
| excess_test | If True, an excess test is also carried out, by default True. |
| skew_test | If True, a skew test is also carried out, by default True. |
| ad_test | If True, an Anderson-Darling test is also carried out, by default False. |

| RETURNS | DESCRIPTION |
|---|---|
| remain_h0 | True if all p-values of the tests performed are greater than alpha, otherwise False. |

| RAISES | DESCRIPTION |
|---|---|
| AssertionError | If all flags are False. |
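The decision rule can be sketched as a small function that combines precomputed p-values. This hypothetical helper takes a dictionary of p-values instead of computing them from samples; it only illustrates how the flags, alpha and the AssertionError described above fit together.

```python
def follows_norm_curve(p_values, alpha=0.05, excess_test=True, skew_test=True, ad_test=False):
    """Hypothetical helper: keep H0 only if every enabled test has p > alpha.

    p_values maps 'excess', 'skew' and 'ad' to the p-values of the respective tests.
    """
    flags = {"excess": excess_test, "skew": skew_test, "ad": ad_test}
    if not any(flags.values()):
        raise AssertionError("At least one test must be enabled")
    return all(p_values[name] > alpha for name, enabled in flags.items() if enabled)


p = {"excess": 0.40, "skew": 0.30, "ad": 0.01}
print(follows_norm_curve(p))               # Anderson-Darling result ignored
print(follows_norm_curve(p, ad_test=True))  # Anderson-Darling p-value below alpha
```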