Location dispersion estimator

daspi.statistics.estimation.LocationDispersionEstimator(samples, strategy='norm', agreement=6, possible_dists=DIST.COMMON, evaluate=None, nan_policy='omit')

Bases: DistributionEstimator

An object for various statistical estimators

The attributes are calculated lazily. After the class is instantiated, all attributes are set to None. As soon as an attribute (actually a property) is accessed, its value is calculated and stored, so the calculation is performed only once.
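The lazy pattern described above can be sketched in a few lines. This is only an illustration of the idea, not daspi's actual implementation: the class `LazyStats` and the counter `n_computations` are hypothetical names introduced here.

```python
from statistics import fmean


class LazyStats:
    """Hypothetical minimal class illustrating lazily computed attributes."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._mean = None          # nothing is computed at instantiation
        self.n_computations = 0    # only here to show the caching effect

    @property
    def mean(self):
        # Compute on first access, then reuse the stored value.
        if self._mean is None:
            self.n_computations += 1
            self._mean = fmean(self.samples)
        return self._mean


stats = LazyStats([1.0, 2.0, 3.0])
first = stats.mean   # triggers the calculation
second = stats.mean  # served from the stored value
```

Accessing `mean` twice runs the calculation only once; the second access returns the cached result.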

PARAMETER DESCRIPTION
samples

sample data

TYPE: NumericSample1D

strategy

Which strategy should be used to determine the control limits (process spread):
- eval: The strategy is determined according to the given evaluate function. If none is given, the internal evaluate method is used.
- fit: First, the distribution that best represents the process data is searched for, and then the agreed process spread is calculated.
- norm: The data is assumed to follow a normal distribution. The variation tolerance is then calculated as agreement * standard deviation.
- data: The quantiles for the process variation tolerance are read directly from the data.

Default is 'norm'.

TYPE: {'eval', 'fit', 'norm', 'data'} DEFAULT: 'norm'

agreement

Specify the tolerated process variation for which the control limits are to be calculated:
- If int, the spread is determined using the normal distribution as agreement * σ, e.g. agreement = 6 -> 6σ covers ~99.73 % of the data. The upper and lower permissible quantiles are then calculated from this.
- If float, the value must be between 0 and 1. This value is then interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6σ).

Default is 6 because SixSigma ;-)

TYPE: int or float DEFAULT: 6

possible_dists

Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed, by default DIST.COMMON

TYPE: tuple of strings or rv_continuous DEFAULT: DIST.COMMON

evaluate

Provide a function that evaluates the strategy. If strategy is set to 'eval', this function is used to determine the strategy. The function should take the samples as input and return a string that corresponds to a valid strategy 'fit', 'norm', or 'data'. If not provided, the internal evaluate method is used. For more information on the evaluate method, see the class documentation. Default is None.

TYPE: Callable | None DEFAULT: None

nan_policy

How to handle NaN values in the samples:
- 'propagate': NaN values are preserved in the analysis.
- 'raise': Raises an error if NaN values are found.
- 'omit': Omits NaN values from the analysis.

Default is 'omit'.

TYPE: {'propagate', 'raise', 'omit'} DEFAULT: 'omit'
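To make the three nan_policy options concrete, here is a rough standalone sketch of how such a filter could behave. The helper `apply_nan_policy` is hypothetical and is not part of daspi; it only mirrors the behavior documented on this page (error for 'raise', warning for 'omit' and 'propagate').

```python
import math
import warnings


def apply_nan_policy(samples, nan_policy="omit"):
    """Hypothetical helper: filter samples according to a NaN policy."""
    values = [float(x) for x in samples]
    if not any(math.isnan(v) for v in values):
        return values  # nothing to do without NaN values
    if nan_policy == "raise":
        raise ValueError("NaN values found in samples")
    if nan_policy == "omit":
        warnings.warn("NaN values are omitted from the analysis", UserWarning)
        return [v for v in values if not math.isnan(v)]
    if nan_policy == "propagate":
        warnings.warn("NaN values may lead to unexpected results", UserWarning)
        return values
    raise ValueError(f"Unknown nan_policy: {nan_policy!r}")
```

With 'omit', the returned list is what the page calls the filtered samples.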

Examples:

import numpy as np
import daspi as dsp

np.random.seed(1)
samples = data = np.random.weibull(a=1.5, size=100)
estimation = dsp.LocationDispersionEstimator(
    samples=samples,
    strategy='fit',
    agreement=6,
    possible_dists=dsp.DIST.COMMON_NOT_NORM)
print(estimation.describe())

So you will receive the following output:

                  None
n_samples          100
n_missing            0
min           0.002356
max           2.724595
R             2.722239
mean           0.86943
median         0.74006
std           0.593666
sem           0.059367
dist       weibull_min
p_ks          0.968613
p_ad          0.000455
excess        0.163836
p_excess      0.599041
skew          0.802202
p_skew        0.001918
strategy           fit
lcl          -0.004917
ucl            3.43952

Notes

A special case occurs when the agreement is 1. For a spread of one standard deviation, enter 1 as an integer. If you want percentiles or the entire range, enter it as a floating-point number (1.0) or as float('inf'). In that case, lcl and ucl correspond to min and max if strategy is 'data'; otherwise they are -inf and inf.
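The int versus float distinction can be illustrated with a small standalone sketch that maps an agreement to the lower and upper quantiles. The helper `coverage_quantiles` is hypothetical. It assumes a standard normal distribution for sigma multiples and uses the relation agreement = 2 * k stated for the agreement property; the library's internal handling may differ.

```python
import math
from statistics import NormalDist


def coverage_quantiles(agreement):
    """Hypothetical helper: return (q_low, q_upp) for a given agreement."""
    is_float = isinstance(agreement, float)
    if is_float and (agreement == 1.0 or math.isinf(agreement)):
        coverage = 1.0                    # entire range
    elif is_float and 0.0 < agreement < 1.0:
        coverage = agreement              # acceptable proportion
    elif agreement >= 1:
        k = agreement / 2                 # agreement is twice the coverage factor
        coverage = 2 * NormalDist().cdf(k) - 1
    else:
        raise ValueError("agreement must be in (0, 1] or >= 1")
    q_low = (1 - coverage) / 2
    return q_low, 1 - q_low


q_low_6, q_upp_6 = coverage_quantiles(6)        # six sigma
q_low_1, q_upp_1 = coverage_quantiles(1)        # one sigma (int)
q_low_all, q_upp_all = coverage_quantiles(1.0)  # entire range (float)
```

Under these assumptions, the integer 1 yields quantiles half a standard deviation either side of the center, while the float 1.0 yields the quantiles 0 and 1.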

RAISES DESCRIPTION
ValueError

If NaN values are found in the samples and nan_policy is set to 'raise'.

UserWarning

If NaN values are found in the samples and nan_policy is set to 'omit' or 'propagate'. The warning indicates that NaN values will be omitted from the analysis or may lead to unexpected results.

dof property

Get the degrees of freedom for the filtered samples (read-only).

min property

Get the minimum value of filtered samples (read-only).

max property

Get the maximum value of filtered samples (read-only).

R property

Get range of filtered samples (read-only).

mean property

Get mean of filtered samples (read-only).

median property

Get median of filtered samples (read-only).

std property

Get standard deviation of filtered samples (read-only).

sem property

Get the standard error of the mean of the filtered samples (read-only).

lcl property

Get lower control limit according to given strategy and agreement (read-only).

ucl property

Get upper control limit according to given strategy and agreement (read-only).
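For the 'norm' strategy, the variation tolerance is described as agreement * standard deviation. A minimal sketch of what that implies for the limits follows. It assumes the limits are placed symmetrically around the mean and that the sample standard deviation is used; both are assumptions, and `norm_control_limits` is a hypothetical helper, not a daspi function.

```python
from statistics import fmean, stdev


def norm_control_limits(samples, agreement=6):
    """Hypothetical helper: symmetric limits of total width agreement * std."""
    k = agreement / 2            # half of the spread on each side of the mean
    center = fmean(samples)
    spread = stdev(samples)      # sample standard deviation (assumption)
    return center - k * spread, center + k * spread


data = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2]
lcl, ucl = norm_control_limits(data, agreement=6)
```

The distance between the two limits equals agreement times the standard deviation, which is the process spread the parameter description refers to.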

q_low property

Get the quantile for the lower control limit according to the given agreement. If the samples follow a normal distribution and the agreement is given as 6, this value corresponds to the 0.135 % quantile: 6σ ~ 99.73 % of the samples (read-only).

q_upp property

Get the quantile for the upper control limit according to the given agreement. If the samples follow a normal distribution and the agreement is given as 6, this value corresponds to Q_0.99865, i.e. the 0.99865 quantile or 99.865th percentile (read-only).

strategy property writable

Strategy used to determine the control limits. The control limits can also be interpreted as the process range.

Set strategy as one of {'eval', 'fit', 'norm', 'data'}:
- eval: If no evaluate function is given, the strategy is determined according to the internal evaluate method.
- fit: First, the distribution that best represents the process data is searched for, and then the process variation tolerance is calculated.
- norm: The data is assumed to follow a normal distribution. The variation tolerance is then calculated as agreement * standard deviation.
- data: The quantiles for the process variation tolerance are read directly from the samples.

agreement property writable

Get the agreement multiplier for the σ (standard deviation) used in calculating Cp and Cpk values.

The agreement is defined as twice the coverage factor k. Setting this value will reset the Cp and Cpk values to None, reflecting that the underlying uncertainty parameters have changed.

When setting the agreement as a proportion, provide the acceptable proportion for the spread, such as 0.9973, which corresponds to approximately 6σ (six standard deviations). The agreement value must be specified as either:
- A proportion (0.0 < agreement <= 1.0) indicating the acceptable proportion for the spread.
- A multiple of the standard deviation (agreement >= 1).

Special case:
- If the agreement is meant as a standard deviation multiplier of 1, enter it as an integer (1).
- For percentiles or the entire range, use a floating-point value (1.0) or float('inf') for an infinite range.

RAISES DESCRIPTION
AssertionError

If the provided agreement value is not in the valid range (0.0 < agreement <= 1.0 for percentiles or agreement >= 1 for standard deviation multipliers).

k property

Get the coverage factor k used in uncertainty calculations (read-only).

This property returns the coverage factor, which is a multiplier used to determine the expanded uncertainty based on the standard uncertainty. The value of k is typically set to reflect the desired confidence level in the measurement results.

z_transform(x)

Transform value to z-score.

This method produces a value from a distribution with a mean of 0 and a standard deviation of 1. The value indicates how many standard deviations the value is from the mean.

PARAMETER DESCRIPTION
x

value to be transformed

TYPE: float

RETURNS DESCRIPTION
z

z-score

TYPE: float
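The z-score itself is the usual standardization z = (x - mean) / std. A standalone sketch follows, assuming the mean and the sample standard deviation of the filtered samples are used. The free function `z_transform` below is written only for illustration and takes the samples explicitly, unlike the method documented here.

```python
from statistics import fmean, stdev


def z_transform(x, samples):
    """Illustrative z-score: distance of x from the mean in standard deviations."""
    return (x - fmean(samples)) / stdev(samples)


samples = [1.0, 2.0, 3.0, 4.0, 5.0]
z_center = z_transform(3.0, samples)  # value at the mean
z_upper = z_transform(5.0, samples)   # value above the mean
```

A value equal to the mean maps to 0, and values above the mean map to positive z-scores.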

mean_ci(level=0.95)

Two-sided confidence interval for the mean of the filtered data.

PARAMETER DESCRIPTION
level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
ci_low, ci_upp : float

lower and upper confidence limits
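As a rough illustration of what a two-sided interval for the mean looks like, the sketch below uses a large-sample normal approximation (mean ± z * sem). This is a swapped-in, simplified technique chosen because it needs only the standard library; the method above may well use a different distribution (for example Student's t), so the numbers need not match. The name `mean_ci_normal_approx` is hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def mean_ci_normal_approx(samples, level=0.95):
    """Normal-approximation confidence interval for the mean (illustrative)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must be in (0, 1)")
    center = fmean(samples)
    sem = stdev(samples) / math.sqrt(len(samples))  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)       # two-sided critical value
    return center - z * sem, center + z * sem


values = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
ci_low, ci_upp = mean_ci_normal_approx(values, level=0.95)
```

A higher level gives a wider interval, and the interval is always centered on the sample mean.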

median_ci(level=0.95)

Two-sided confidence interval for the median of the filtered data.

PARAMETER DESCRIPTION
level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
ci_low, ci_upp : float

lower and upper confidence limits

stdev_ci(level=0.95)

Two-sided confidence interval for the standard deviation of the filtered data.

PARAMETER DESCRIPTION
level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
ci_low, ci_upp : float

lower and upper confidence limits

evaluate()

Evaluate the strategy used to calculate the control limits. If no evaluate function is given, the strategy is evaluated as follows:

  1. If the variance is not stable within the samples -> strategy = 'data'
  2. If variance and mean are stable and the samples follow a normal curve -> strategy = 'norm'
  3. If variance and mean are stable but the samples don't follow a normal curve -> strategy = 'fit'
  4. If the variance is stable but the mean is not, and the samples follow a normal curve -> strategy = 'norm'
  5. If the variance is stable but the mean is not, and the samples don't follow a normal curve -> strategy = 'data'

RETURNS DESCRIPTION
strategy

Evaluated strategy to calculate control limits

TYPE: {fit, norm, data}
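The five rules above reduce to a small decision function. The sketch below takes the three test outcomes as booleans. How stability and normality are actually tested is not described on this page, so those inputs are assumed to be given, and `choose_strategy` is a hypothetical name.

```python
def choose_strategy(variance_stable, mean_stable, follows_normal):
    """Hypothetical decision function mirroring rules 1 to 5."""
    if not variance_stable:
        return "data"                         # rule 1
    if follows_normal:
        return "norm"                         # rules 2 and 4
    return "fit" if mean_stable else "data"   # rules 3 and 5


# Enumerate all combinations to see the resulting strategies.
table = {
    (v, m, n): choose_strategy(v, m, n)
    for v in (True, False)
    for m in (True, False)
    for n in (True, False)
}
```

A custom function passed via the evaluate parameter would differ in signature: according to the parameter description, it takes the samples as input and returns one of 'fit', 'norm', or 'data'.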