daspi.statistics.estimation
Statistical estimation classes and functions.
This module provides higher-level estimators that combine confidence intervals, hypothesis tests, and distribution fitting into coherent analysis objects. It also contains utility functions for non-parametric smoothing and kernel density estimation.
Estimator classes
All estimator classes share a common interface: they accept a sample (and optionally a specification or reference distribution), run a battery of statistical checks internally, and expose their results as plain attributes.
- BaseEstimator – abstract base class defining the common interface.
- LocationDispersionEstimator – estimates mean and standard deviation, computes confidence intervals for both, and performs normality, stability, and shape tests on the sample.
- DistributionEstimator – fits a parametric SciPy distribution to the data via maximum likelihood and performs a Kolmogorov-Smirnov goodness-of-fit test.
- ProcessEstimator – extends LocationDispersionEstimator with process-capability indices (Cp, Cpk, Cpm) and their confidence intervals, given a Specification.
- GageEstimator – measurement system analysis; combines multiple ProcessEstimator instances to quantify measurement uncertainty relative to process variation and tolerance (GUM / Gage R&R style).
Standalone functions
- root_sum_squares – root sum of squares of scalar values; used in combined measurement uncertainty calculations.
- estimate_distribution – fits candidate parametric distributions to a sample and returns the best-fitting distribution together with its p-value and fitted parameters.
- estimate_kernel_density – univariate kernel density estimate over a grid.
- estimate_kernel_density_2d – bivariate kernel density estimate on a 2-D grid.
- estimate_capability_confidence – confidence interval for the process-capability index Cp or Cpk of a ProcessEstimator, built on the cp_ci and cpk_ci functions.
- estimate_resolution – estimates the effective measurement resolution from a sample.
Smoothing
- Loess – locally weighted polynomial regression (LOESS) for univariate data.
- Lowess – locally weighted scatterplot smoothing (LOWESS) using the statsmodels implementation.
Measurement uncertainty
MeasurementUncertainty– GUM-compliant representation of a single uncertainty contribution; supports rectangular, triangular, and normal distributions and can be combined with other instances via root-sum-of-squares.
Notes
The ProcessEstimator and GageEstimator classes depend on
Specification / SpecLimits from the montecarlo module, and on
the hypothesis-testing functions from the hypothesis module and the
confidence interval functions from the confidence module. They are
therefore imported from those modules rather than being reimplemented
here.
MeasurementUncertainty(*, standard=None, expanded=None, error_limit=None, distribution_factor=None, k=2, confidence_level=None, distribution='rectangular')
A class to represent and calculate measurement uncertainty.
This class provides three ways to define measurement uncertainty:
1. From error limit and distribution factor
2. From expanded uncertainty and coverage factor k
3. From standard uncertainty directly
| PARAMETER | DESCRIPTION |
|---|---|
standard
|
The standard uncertainty (u). If provided, the parameters expanded and error_limit are ignored. To initialize a non-significant measurement uncertainty, set standard to 0.
TYPE:
|
error_limit
|
The maximum allowable deviation from the true value, also known as the tolerance range. This parameter represents the worst-case scenario for measurement error, indicating how much the measured value can differ from the actual value. It is used to calculate the standard uncertainty based on the specified distribution factor. The value must be positive, as a negative error limit does not have a physical meaning in the context of measurement uncertainty.
TYPE:
|
distribution_factor
|
The distribution factor based on the assumed distribution.
Common values:
- √3 ≈ 1.732 for rectangular (uniform) distribution
- 2 for triangular distribution
TYPE:
|
expanded
|
The expanded uncertainty (U).
TYPE:
|
k
|
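The coverage factor for expanded uncertainty. It is used as a
multiplier to determine the expanded uncertainty based on the
standard uncertainty. The value of k is typically set to reflect
the desired confidence level in the measurement results. Default is 2.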
TYPE:
|
confidence_level
|
The confidence level (0 to 1) to calculate coverage factor for normal distribution. Default is 0.95 (95% confidence).
TYPE:
|
distribution
|
The assumed probability distribution for calculating distribution factor. Only used if distribution_factor is not explicitly provided. Default is 'rectangular'.
TYPE:
|
Notes
To initialize a non-significant measurement uncertainty, set standard to 0. This uncertainty can then be used for further calculations and combined with others, but it does not affect the "addition" of uncertainties.
Examples:
Create uncertainty from error limit (rectangular distribution):
import daspi as dsp

# Error limit ±0.1, rectangular distribution
u_1 = dsp.MeasurementUncertainty(error_limit=0.1)
print(f"Standard uncertainty: {u_1.standard:.4f}")
Create uncertainty from expanded uncertainty:
# Expanded uncertainty U = 0.2 with k = 2
u_2 = dsp.MeasurementUncertainty(expanded=0.2, k=2)
print(f"Standard uncertainty: {u_2.standard:.4f}")
Create uncertainty directly:
# Direct standard uncertainty
u_3 = dsp.MeasurementUncertainty(standard=0.05)
# expanded is a read-only property (U = k × u with the default k = 2), so it is not called
print(f"Expanded uncertainty (k=2): {u_3.expanded:.4f}")
| RAISES | DESCRIPTION |
|---|---|
ValueError
|
If insufficient or conflicting parameters are provided. |
AssertionError
|
If parameter values are invalid (negative, out of range, etc.). |
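The three ways of defining an uncertainty described for this class reduce to simple arithmetic. The following is a minimal sketch of that arithmetic in plain Python, not the daspi implementation. The helper names `standard_from_error_limit` and `standard_from_expanded` are hypothetical; only the rectangular factor √3 and the relations U = k × u and error_limit = u × distribution_factor are taken from this page.

```python
import math

# Hypothetical helpers illustrating the relations stated on this page;
# they are not part of daspi.


def standard_from_error_limit(error_limit: float,
                              distribution_factor: float = math.sqrt(3)) -> float:
    """Standard uncertainty u = error_limit / distribution_factor."""
    if error_limit <= 0:
        raise ValueError("error_limit must be positive")
    return error_limit / distribution_factor


def standard_from_expanded(expanded: float, k: float = 2) -> float:
    """Standard uncertainty u = U / k."""
    return expanded / k


u_1 = standard_from_error_limit(0.1)    # rectangular: 0.1 / sqrt(3)
u_2 = standard_from_expanded(0.2, k=2)  # 0.2 / 2
print(f"{u_1:.4f}")  # 0.0577
print(f"{u_2:.4f}")  # 0.1000
```

If the class follows the same relations, these are the values the first two examples above should print.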
standard
property
Get the standard uncertainty (u) (read-only).
confidence_level
property
Get the confidence level used for calculations (read-only).
k
property
Get the coverage factor k used in uncertainty
calculations (read-only).
This property returns the coverage factor, which is a multiplier
used to determine the expanded uncertainty based on the standard
uncertainty. The value of k is typically set to reflect the
desired confidence level in the measurement results.
expanded
property
Get expanded uncertainty. If it was not provided during initialization, it will be calculated from the standard uncertainty and coverage factor k (U = k × u) (read-only).
error_limit
property
Get the error limit associated with the measurement uncertainty.
This property returns the maximum allowable deviation from the true value, which is also known as the tolerance range. If the error limit was not provided during initialization, it will be calculated from the standard uncertainty and the distribution factor. The calculation is based on the assumption that the error follows the specified probability distribution. (error_limit = u × distribution_factor) (read-only).
distribution
property
Get the assumed probability distribution (read-only).
distribution_factor
property
Get the distribution factor (read-only).
quality_indicator(tolerance)
Calculate the quality indicator Q.
Q serves as a quality indicator for the measurement process, reflecting how well the measurement system performs in relation to the specified requirements and tolerances.
relative(measured_value)
Calculate the relative standard uncertainty as a percentage.
| PARAMETER | DESCRIPTION |
|---|---|
measured_value
|
The measured value to calculate relative uncertainty for.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
float
|
The relative uncertainty as a percentage. |
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If measured_value is zero. |
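As a rough illustration of what a relative standard uncertainty in percent means, the sketch below divides the standard uncertainty by the measured value. The function name `relative_uncertainty` is hypothetical, and taking the absolute value of the measured value and raising `ValueError` (the page documents an `AssertionError`) are assumptions of this sketch, not confirmed daspi behavior.

```python
def relative_uncertainty(standard: float, measured_value: float) -> float:
    """Relative standard uncertainty in percent: 100 * u / |measured_value|."""
    if measured_value == 0:
        # Division by zero is undefined, so a zero measured value is rejected.
        raise ValueError("measured_value must not be zero")
    return 100 * standard / abs(measured_value)


rel = relative_uncertainty(0.05, 10.0)
print(f"{rel:.2f} %")  # 0.50 %
```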
combine_with(*others, method='rss')
Combine this uncertainty with other uncertainties.
| PARAMETER | DESCRIPTION |
|---|---|
*others
|
Other uncertainty instances to combine with.
TYPE:
|
method
|
Combination method:
- 'rss': Root sum of squares (for independent uncertainties)
- 'linear': Linear addition (for fully correlated uncertainties)
Default is 'rss'.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
MeasurementUncertainty
|
A new instance with the combined uncertainty. |
Examples:
u_1 = dsp.MeasurementUncertainty(standard=0.1)
u_2 = dsp.MeasurementUncertainty(error_limit=0.05)
u_3 = dsp.MeasurementUncertainty(expanded=0.2, k=2)
# Combine using root sum of squares (default)
combined_rss = u_1.combine_with(u_2, u_3)
# Combine using linear addition
combined_linear = u_1.combine_with(u_2, u_3, method='linear')
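The difference between the two methods can be seen on the standard uncertainties alone. The sketch below is plain Python with a hypothetical `combine` helper, not the `combine_with` implementation; it assumes the three inputs from the example above translate to u = 0.1, u = 0.05/√3 (rectangular), and u = 0.2/2.

```python
import math


def combine(standards, method="rss"):
    """Combine standard uncertainties (hypothetical helper, not daspi code)."""
    if method == "rss":
        # Independent contributions add in quadrature.
        return math.sqrt(sum(u * u for u in standards))
    if method == "linear":
        # Fully correlated contributions add directly.
        return sum(standards)
    raise ValueError(f"unknown method: {method!r}")


standards = [0.1, 0.05 / math.sqrt(3), 0.2 / 2]
rss = combine(standards)
linear = combine(standards, method="linear")
print(f"rss:    {rss:.4f}")     # rss:    0.1443
print(f"linear: {linear:.4f}")  # linear: 0.2289
```

Linear addition is never smaller than the root sum of squares for non-negative inputs, which makes it the more conservative choice when correlation cannot be ruled out.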
summary()
Get a summary of uncertainty values.
| RETURNS | DESCRIPTION |
|---|---|
Dict[str, float | str]
|
Dictionary containing various uncertainty representations. |
root_sum_squares(*args)
Calculate the root sum of squares of the given arguments.
| PARAMETER | DESCRIPTION |
|---|---|
*args
|
Values to be summed up
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
float
|
The root sum of squares of the given arguments. |
Notes
The root sum of squares is calculated as follows:
$$
\sqrt{x_1^2 + x_2^2 + \dots + x_n^2}
$$
If only one argument is provided, it returns the argument itself.
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If no arguments are provided or if any argument is not of type int or float. |
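A minimal stand-in for the documented behavior, including the argument checks, might look as follows. The name `rss_sketch` is hypothetical and the body is an assumption based only on the formula and the raised errors described above.

```python
import math


def rss_sketch(*args):
    """Root sum of squares with the argument checks described on this page."""
    assert args, "at least one value is required"
    assert all(isinstance(a, (int, float)) for a in args), "int or float only"
    return math.sqrt(sum(a * a for a in args))


print(rss_sketch(3, 4))  # 5.0
```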
estimate_distribution(data, dists=DIST.COMMON)
Find the distribution that best fits the data. A Kolmogorov-Smirnov test is performed for each candidate distribution, and the candidate with the highest p-value is returned as the best fit. A higher p-value means the data provide less evidence against the hypothesis that they follow that distribution.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
1-D array of data for which the best-fitting distribution is sought.
TYPE:
|
dists
|
Candidate distributions to test against the data. Only continuous distributions from scipy.stats are allowed. Default is DIST.COMMON.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
dist
|
A generic continuous distribution class of best fit
TYPE:
|
p
|
The two-tailed p-value for the best fit
TYPE:
|
shape_params
|
Estimates for any shape parameters (if applicable), followed by those for location and scale. Most distributions have shape parameters, but there are exceptions (e.g. norm). The parameters can be passed to the returned dist to generate values.
TYPE:
|
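The selection rule described for estimate_distribution can be sketched directly with SciPy. This is an illustration under assumptions, not the daspi source: the helper name `best_fit` and the three candidate names are chosen for the example, and DIST.COMMON is not reproduced here.

```python
import numpy as np
from scipy import stats


def best_fit(data, dist_names=("norm", "expon", "uniform")):
    """Return (dist, p, params) for the candidate with the highest KS p-value."""
    best = None
    for name in dist_names:
        dist = getattr(stats, name)
        params = dist.fit(data)  # maximum-likelihood estimates
        p = stats.kstest(data, name, args=params).pvalue
        if best is None or p > best[1]:
            best = (dist, p, params)
    return best


rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)
dist, p, params = best_fit(sample)
print(dist.name, params)

# The fitted parameters can be passed back to the distribution to draw values.
generated = dist.rvs(*params, size=5, random_state=rng)
```

Because the parameters are estimated from the same data that is tested, the resulting p-values are optimistic; they are best read as a ranking score rather than as a formal test result.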
estimate_kernel_density(data, *, stretch=1, height=None, base=0, n_points=DEFAULT.KD_SEQUENCE_LEN, margin=0.5)
Estimates the kernel density of data and returns values that are useful for a plot. If those values are plotted in combination with a histogram, set height to the maximum value of the histogram.
Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. The used gaussian_kde function of scipy.stats works for both uni-variate and multi-variate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
1-D array of datapoints to estimate from.
TYPE:
|
stretch
|
Stretch the distribution estimate by the given factor. Only considered if height is None. Default is 1.
TYPE:
|
height
|
If the KDE curve is plotted in combination with other data (e.g. a histogram), you can use height to specify the height at the maximum point of the KDE curve. If this value is specified, the area under the curve will not be normalized. Default is None.
TYPE:
|
base
|
The curve is shifted in the estimated direction by the given amount. This is useful for violin plots. Default is 0.
TYPE:
|
n_points
|
Number of points the estimation and sequence should have, by default KD_SEQUENCE_LEN (defined in constants.py)
TYPE:
|
margin
|
Margin for the sequence as a factor of the data range (max - min). If margin is 0, the two ends of the estimated density curve show the minimum and maximum value. Default is 0.5.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
sequence
|
Data points at regular intervals from input data minimum to maximum
TYPE:
|
estimation
|
Data points of kernel density estimation
TYPE:
|
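The roles of margin and height can be illustrated with scipy.stats.gaussian_kde directly. The sketch below is an assumption about how such a curve could be built, not the daspi implementation: the helper name `kde_curve` is hypothetical, the margin is applied symmetrically on both sides, and stretch and base are left out.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_curve(data, *, height=None, n_points=100, margin=0.5):
    """Return (sequence, estimation) for plotting a 1-D KDE curve."""
    data = np.asarray(data, dtype=float)
    span = data.max() - data.min()
    # Evaluate on an evenly spaced grid, extended by margin * range on each side.
    sequence = np.linspace(data.min() - margin * span,
                           data.max() + margin * span,
                           n_points)
    estimation = gaussian_kde(data)(sequence)
    if height is not None:
        # Rescale so the peak equals height; the area is no longer 1.
        estimation = estimation * height / estimation.max()
    return sequence, estimation


rng = np.random.default_rng(1)
seq, est = kde_curve(rng.normal(size=300), height=40)
print(seq.shape, est.shape)
```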
estimate_kernel_density_2d(feature, target, *, n_points=DEFAULT.KD_SEQUENCE_LEN, margin=0.5)
Estimates the kernel density of 2 dimensional data and returns values that are useful for a contour plot.
Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. The used gaussian_kde function of scipy.stats works for both uni-variate and multi-variate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multi-modal distributions tend to be oversmoothed.
| PARAMETER | DESCRIPTION |
|---|---|
feature
|
A one-dimensional array-like object containing the exogenous samples.
TYPE:
|
target
|
A one-dimensional array-like object containing the endogenous samples.
TYPE:
|
n_points
|
Number of points the estimation and sequence should have, by default KD_SEQUENCE_LEN (defined in constants.py)
TYPE:
|
margin
|
Margin for the sequence as factor of data range, by default 0.5.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
feature_seq
|
Data points at regular intervals from input data minimum to maximum used for feature data
TYPE:
|
target_seq
|
Data points at regular intervals from input data minimum to maximum used for target data
TYPE:
|
estimation
|
Data points of kernel density estimation
TYPE:
|
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If the provided data is empty, contains only zeros or all values are identical. |
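For the bivariate case, a contour-ready density can be obtained by evaluating gaussian_kde on a mesh grid. This is a sketch under assumptions: the helper name `kde_grid` is hypothetical, and whether daspi returns 1-D sequences or mesh grids for the two axes is not stated on this page.

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_grid(feature, target, *, n_points=50, margin=0.5):
    """Return (feature_seq, target_seq, estimation) for a contour plot."""
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target, dtype=float)

    def axis(values):
        span = values.max() - values.min()
        return np.linspace(values.min() - margin * span,
                           values.max() + margin * span,
                           n_points)

    feature_seq, target_seq = axis(feature), axis(target)
    f_grid, t_grid = np.meshgrid(feature_seq, target_seq)
    # gaussian_kde expects data with shape (n_dimensions, n_samples).
    kde = gaussian_kde(np.vstack([feature, target]))
    positions = np.vstack([f_grid.ravel(), t_grid.ravel()])
    estimation = kde(positions).reshape(f_grid.shape)
    return feature_seq, target_seq, estimation


rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=0.5, size=200)
f_seq, t_seq, density = kde_grid(x, y)
print(density.shape)  # (50, 50)
```

The three return values can be passed as they are to a contour function such as matplotlib's `contourf(f_seq, t_seq, density)`.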
estimate_capability_confidence(process, *, kind='cpk', level=0.95, n_groups=1)
Calculates the confidence interval for the process capability index (Cp or Cpk) of a process.
This function is an extension of the cp_ci and cpk_ci functions.
It takes a ProcessEstimator instance and determines the
confidence interval using the Cp or Cpk value from that estimator.
| PARAMETER | DESCRIPTION |
|---|---|
process
|
ProcessEstimator instance that supplies the necessary process information, such as the capability indices and the number of samples.
TYPE:
|
kind
|
Specifies whether to calculate the confidence interval for Cp or Cpk ('cp' or 'cpk'). Default is 'cpk'.
TYPE:
|
level
|
The desired confidence level for the interval, expressed as a decimal. Default is 0.95 (95% confidence).
TYPE:
|
n_groups
|
The number of groups for the Bonferroni correction, which adjusts for multiple comparisons. Default is 1, indicating no correction.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
Tuple[float, float, float]
|
A tuple containing the estimate, lower bound, and upper bound of the confidence interval for the specified process capability index. |
| RAISES | DESCRIPTION |
|---|---|
AssertionError
|
If provided kind is not 'cp' or 'cpk'. |
ValueError
|
If no limit is provided or if only one limit is provided and kind is set to 'cp'. |
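To show how level and n_groups interact, the sketch below computes an approximate Cpk interval with a Bonferroni-adjusted significance level. It uses Bissell's normal approximation, which is an assumption made for this example; the page does not state which formula cpk_ci uses, and the helper name `cpk_interval` is hypothetical.

```python
from statistics import NormalDist


def cpk_interval(cpk, n_samples, level=0.95, n_groups=1):
    """Approximate (estimate, lower, upper) for Cpk, Bissell's approximation."""
    # Bonferroni: split the allowed error probability across the groups.
    alpha = (1 - level) / n_groups
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (1 / (9 * n_samples)
                      + cpk ** 2 / (2 * (n_samples - 1))) ** 0.5
    return cpk, cpk - half_width, cpk + half_width


est, lower, upper = cpk_interval(1.33, 50)
_, lower_3, upper_3 = cpk_interval(1.33, 50, n_groups=3)
print(f"single group: {lower:.3f} to {upper:.3f}")
print(f"three groups: {lower_3:.3f} to {upper_3:.3f}")
```

With more groups, each individual interval becomes wider so that the intervals hold jointly at the requested level.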
estimate_resolution(data)
Estimate the resolution from the number of digits in the sample values.
| PARAMETER | DESCRIPTION |
|---|---|
data
|
1-D array of datapoints to estimate from.
TYPE:
|
| RETURNS | DESCRIPTION |
|---|---|
float
|
The estimated resolution. |
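One way to read "resolution from the number of digits" is to take the smallest decimal step implied by the most finely recorded value. The sketch below follows that reading only; the helper name `resolution_from_digits` is hypothetical, and the actual rule used by estimate_resolution is not given on this page.

```python
from decimal import Decimal


def resolution_from_digits(values):
    """Smallest decimal step implied by the most precise value in values."""
    # normalize() strips trailing zeros, so 10.0 does not count as one decimal.
    exponents = [Decimal(str(v)).normalize().as_tuple().exponent
                 for v in values]
    return float(Decimal(1).scaleb(min(exponents)))


print(resolution_from_digits([10.0, 10.25, 10.5]))  # 0.01
```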