daspi.statistics.estimation

Statistical estimation classes and functions.

This module provides higher-level estimators that combine confidence intervals, hypothesis tests, and distribution fitting into coherent analysis objects. It also contains utility functions for non-parametric smoothing and kernel density estimation.

Estimator classes

All estimator classes share a common interface: they accept a sample (and optionally a specification or reference distribution), run a battery of statistical checks internally, and expose their results as plain attributes.

  • BaseEstimator – abstract base class defining the common interface.
  • LocationDispersionEstimator – estimates mean and standard deviation, computes confidence intervals for both, and performs normality, stability, and shape tests on the sample.
  • DistributionEstimator – fits a parametric SciPy distribution to the data via maximum-likelihood and performs a Kolmogorov-Smirnov goodness-of-fit test.
  • ProcessEstimator – extends LocationDispersionEstimator with process-capability indices (Cp, Cpk, Cpm) and their confidence intervals, given a Specification.
  • GageEstimator – measurement system analysis; combines multiple ProcessEstimator instances to quantify measurement uncertainty relative to process variation and tolerance (GUM / Gage R&R style).

Standalone functions

  • root_sum_squares – root sum of squares of scalar values; used in combined measurement uncertainty calculations.
  • estimate_distribution – selects the best-fitting distribution for a sample using a Kolmogorov-Smirnov test and returns it together with the p-value and the fitted parameters.
  • estimate_kernel_density – univariate kernel density estimate over a grid.
  • estimate_kernel_density_2d – bivariate kernel density estimate on a 2-D grid.
  • estimate_capability_confidence – confidence interval for the process-capability index Cp or Cpk of a ProcessEstimator, with optional Bonferroni correction.
  • estimate_resolution – estimates the effective measurement resolution from a sample.

Smoothing

  • Loess – locally weighted polynomial regression (LOESS) for univariate data.
  • Lowess – locally weighted scatterplot smoothing (LOWESS) using the statsmodels implementation.

Measurement uncertainty

  • MeasurementUncertainty – GUM-compliant representation of a single uncertainty contribution; supports rectangular, triangular, and normal distributions and can be combined with other instances via root-sum-of-squares.

Notes

The ProcessEstimator and GageEstimator classes depend on Specification / SpecLimits from the montecarlo module, on the hypothesis-testing functions from the hypothesis module, and on the confidence interval functions from the confidence module. These dependencies are imported from their respective modules rather than being reimplemented here.

MeasurementUncertainty(*, standard=None, expanded=None, error_limit=None, distribution_factor=None, k=2, confidence_level=None, distribution='rectangular')

A class to represent and calculate measurement uncertainty.

This class provides multiple ways to define measurement uncertainty:

1. From error limit and distribution factor
2. From expanded uncertainty and coverage factor k
3. From standard uncertainty directly

PARAMETER DESCRIPTION
standard

The standard uncertainty (u). If provided, the parameters expanded and error_limit are ignored. To initialize a non-significant measurement uncertainty, set standard to 0.

TYPE: float DEFAULT: None

error_limit

The maximum allowable deviation from the true value, also known as the tolerance range. This parameter represents the worst-case scenario for measurement error, indicating how much the measured value can differ from the actual value. It is used to calculate the standard uncertainty based on the specified distribution factor. The value must be positive, as a negative error limit does not have a physical meaning in the context of measurement uncertainty.

TYPE: float DEFAULT: None

distribution_factor

The distribution factor based on the assumed distribution. Common values:

- √3 ≈ 1.732 for rectangular (uniform) distribution
- 2 for triangular distribution
- 1 for normal distribution (if error_limit is already 1σ)

TYPE: float DEFAULT: None

expanded

The expanded uncertainty (U).

TYPE: float DEFAULT: None

k

The coverage factor for expanded uncertainty. It is the multiplier applied to the standard uncertainty to obtain the expanded uncertainty, and it is typically chosen to reflect the desired confidence level in the measurement results. Default is 2. Typical values (for a normal distribution):

- k=2 corresponds to a coverage probability of 95.45%
- k=3 corresponds to a coverage probability of 99.73%

TYPE: int | float DEFAULT: 2

confidence_level

The confidence level (between 0 and 1) used to calculate the coverage factor for a normal distribution. Default is 0.95 (95% confidence).

TYPE: float DEFAULT: None
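
The relationship between k and confidence_level for a normal distribution can be checked with the standard library alone. The following is a minimal sketch, not the class's own code; the helper names coverage_probability and coverage_factor are hypothetical and are introduced only for illustration.

```python
from statistics import NormalDist


def coverage_probability(k: float) -> float:
    """Two-sided probability covered by +/- k standard uncertainties (normal)."""
    return 2.0 * NormalDist().cdf(k) - 1.0


def coverage_factor(confidence_level: float) -> float:
    """Coverage factor k of a two-sided normal interval at the given level."""
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(0.5 + confidence_level / 2.0)


print(f"{coverage_probability(2):.4f}")  # 0.9545
print(f"{coverage_probability(3):.4f}")  # 0.9973
print(f"{coverage_factor(0.95):.3f}")    # 1.960
```

Note that a confidence level of exactly 0.95 corresponds to k ≈ 1.96, while k = 2 corresponds to roughly 95.45%.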

distribution

The assumed probability distribution for calculating distribution factor. Only used if distribution_factor is not explicitly provided. Default is 'rectangular'.

TYPE: (rectangular, triangular, normal) DEFAULT: 'rectangular'

Notes

To initialize a non-significant measurement uncertainty, set standard to 0. This uncertainty can then be used for further calculations and combined with others, but it does not affect the "addition" of uncertainties.

Examples:

Create uncertainty from error limit (rectangular distribution):

import daspi as dsp

# Error limit ±0.1, rectangular distribution
u_1 = dsp.MeasurementUncertainty(error_limit=0.1)
print(f"Standard uncertainty: {u_1.standard:.4f}")

Create uncertainty from expanded uncertainty:

# Expanded uncertainty U = 0.2 with k = 2
u_2 = dsp.MeasurementUncertainty(
    expanded=0.2, k=2)
print(f"Standard uncertainty: {u_2.standard:.4f}")

Create uncertainty directly:

# Direct standard uncertainty
u_3 = dsp.MeasurementUncertainty(standard=0.05)
# expanded is a read-only property (U = k × u, with k=2 by default), so it is not called
print(f"Expanded uncertainty (k=2): {u_3.expanded:.4f}")

RAISES DESCRIPTION
ValueError

If insufficient or conflicting parameters are provided.

AssertionError

If parameter values are invalid (negative, out of range, etc.).

standard property

Get the standard uncertainty (u) (read-only).

confidence_level property

Get the confidence level used for calculations (read-only).

k property

Get the coverage factor k used in uncertainty calculations (read-only).

This property returns the coverage factor, which is a multiplier used to determine the expanded uncertainty based on the standard uncertainty. The value of k is typically set to reflect the desired confidence level in the measurement results.

expanded property

Get expanded uncertainty. If it was not provided during initialization, it will be calculated from the standard uncertainty and coverage factor k (U = k × u) (read-only).

error_limit property

Get the error limit associated with the measurement uncertainty.

This property returns the maximum allowable deviation from the true value, which is also known as the tolerance range. If the error limit was not provided during initialization, it will be calculated from the standard uncertainty and the distribution factor. The calculation is based on the assumption that the error follows the specified probability distribution. (error_limit = u × distribution_factor) (read-only).
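
The conversions described for the standard, expanded, and error_limit properties reduce to two simple relations: error_limit = u × distribution_factor and U = k × u. The sketch below restates them as plain functions so the numbers in the class examples can be reproduced by hand. The function names are hypothetical and the input check is an assumption, not the library's validation logic.

```python
import math


def standard_from_error_limit(error_limit: float, distribution_factor: float) -> float:
    """u = error_limit / distribution_factor (inverse of the error_limit relation)."""
    if error_limit <= 0 or distribution_factor <= 0:
        raise ValueError("error_limit and distribution_factor must be positive")
    return error_limit / distribution_factor


def standard_from_expanded(expanded: float, k: float = 2.0) -> float:
    """u = U / k (inverse of U = k * u)."""
    return expanded / k


def expanded_from_standard(standard: float, k: float = 2.0) -> float:
    """U = k * u."""
    return k * standard


# Error limit 0.1 with the rectangular distribution factor sqrt(3)
print(f"{standard_from_error_limit(0.1, math.sqrt(3)):.4f}")  # 0.0577
# Expanded uncertainty 0.2 with k = 2
print(f"{standard_from_expanded(0.2, k=2):.4f}")               # 0.1000
# Standard uncertainty 0.05 expanded with k = 2
print(f"{expanded_from_standard(0.05, k=2):.4f}")              # 0.1000
```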

distribution property

Get the assumed probability distribution (read-only).

distribution_factor property

Get the distribution factor (read-only).

quality_indicator(tolerance)

Calculate the quality indicator Q.

Q serves as a quality indicator for the measurement process, reflecting how well the measurement system performs in relation to the specified requirements and tolerances.

\[ U = k \cdot u \]
\[ Q_{MP} = \frac{2 \cdot U}{T} \]
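
As a worked instance of the two formulas above, the following sketch computes Q from a standard uncertainty, a coverage factor, and a tolerance T. It returns a plain ratio; whether the method itself reports a ratio or a percentage is not stated here, so treat the scaling as an assumption. The function name is hypothetical.

```python
def quality_indicator_sketch(standard: float, tolerance: float, k: float = 2.0) -> float:
    """Q = 2 * U / T with U = k * u, returned as a ratio (assumed scaling)."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    expanded = k * standard           # U = k * u
    return 2.0 * expanded / tolerance  # Q = 2 * U / T


# u = 0.05, k = 2 gives U = 0.1; with T = 2.0 this yields Q = 0.1
print(f"{quality_indicator_sketch(0.05, 2.0):.3f}")  # 0.100
```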

relative(measured_value)

Calculate the relative standard uncertainty as a percentage.

PARAMETER DESCRIPTION
measured_value

The measured value to calculate relative uncertainty for.

TYPE: float

RETURNS DESCRIPTION
float

The relative uncertainty as a percentage.

RAISES DESCRIPTION
AssertionError

If measured_value is zero.
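
A relative standard uncertainty in percent is commonly computed as 100 × u / |measured value|. The sketch below follows that convention; the use of the absolute value and the exception type are assumptions, and the function name is hypothetical.

```python
def relative_uncertainty_sketch(standard: float, measured_value: float) -> float:
    """Relative standard uncertainty in percent: 100 * u / |measured_value|."""
    if measured_value == 0:
        raise ValueError("measured_value must not be zero")
    return 100.0 * standard / abs(measured_value)


# u = 0.05 on a measured value of 10.0
print(f"{relative_uncertainty_sketch(0.05, 10.0):.2f} %")  # 0.50 %
```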

combine_with(*others, method='rss')

Combine this uncertainty with other uncertainties.

PARAMETER DESCRIPTION
*others

Other uncertainty instances to combine with.

TYPE: MeasurementUncertainty | float DEFAULT: ()

method

Combination method:

- 'rss': Root sum of squares (for independent uncertainties)
- 'linear': Linear addition (for fully correlated uncertainties)

Default is 'rss'.

TYPE: (rss, linear) DEFAULT: 'rss'

RETURNS DESCRIPTION
MeasurementUncertainty

A new instance with the combined uncertainty.

Examples:

u_1 = dsp.MeasurementUncertainty(standard=0.1)
u_2 = dsp.MeasurementUncertainty(error_limit=0.05)
u_3 = dsp.MeasurementUncertainty(expanded=0.2, k=2)

# Combine using root sum of squares (default)
combined_rss = u_1.combine_with(u_2, u_3)

# Combine using linear addition
combined_linear = u_1.combine_with(u_2, u_3, method='linear')
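
To see what the two methods do numerically, the sketch below applies them to the standard uncertainties implied by the three instances above (0.1, 0.05/√3, and 0.2/2). It assumes the combination acts on standard uncertainties, which this page does not state explicitly, and the helper name is hypothetical.

```python
import math


def combine_standard_sketch(*standards: float, method: str = "rss") -> float:
    """Combine standard uncertainties by root sum of squares or linear addition."""
    if method == "rss":
        return math.sqrt(sum(u * u for u in standards))
    if method == "linear":
        return sum(standards)
    raise ValueError("method must be 'rss' or 'linear'")


u_1 = 0.1                  # given directly as standard uncertainty
u_2 = 0.05 / math.sqrt(3)  # error limit 0.05, rectangular distribution
u_3 = 0.2 / 2              # expanded uncertainty 0.2 with k = 2

print(f"{combine_standard_sketch(u_1, u_2, u_3):.4f}")                   # 0.1443
print(f"{combine_standard_sketch(u_1, u_2, u_3, method='linear'):.4f}")  # 0.2289
```

Linear addition is never smaller than the root sum of squares for non-negative inputs, which is why it is the conservative choice for fully correlated contributions.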

summary()

Get a summary of uncertainty values.

RETURNS DESCRIPTION
Dict[str, float | str]

Dictionary containing various uncertainty representations.

root_sum_squares(*args)

Calculate the root sum of squares of the given arguments.

PARAMETER DESCRIPTION
*args

Values to be summed up

TYPE: float or int DEFAULT: ()

RETURNS DESCRIPTION
float

The root sum of squares of the given arguments.

Notes

The root sum of squares is calculated as follows:

$$ \sqrt{x_1^2 + x_2^2 + \dots + x_n^2} $$

If only one argument is provided, it returns the argument itself.

RAISES DESCRIPTION
AssertionError

If no arguments are provided or if any argument is not of type int or float.
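
For comparison, the same quantity is available in the Python standard library: math.hypot accepts any number of coordinates (Python 3.8 or later) and returns their Euclidean norm. The wrapper below is only an illustrative equivalent with a hypothetical name, not the function's actual implementation.

```python
import math


def rss_sketch(*args: float) -> float:
    """Root sum of squares via math.hypot (Euclidean norm of the arguments)."""
    if not args:
        raise ValueError("at least one value is required")
    return math.hypot(*args)


print(rss_sketch(3, 4))  # 5.0
print(rss_sketch(3))     # 3.0
```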

estimate_distribution(data, dists=DIST.COMMON)

Finds the best-fitting distribution for the data. For each candidate distribution, a Kolmogorov-Smirnov test is performed to determine how well it fits the data. The distribution with the highest p-value is returned as the best fit, because a higher p-value means the data provide less evidence against the hypothesis that they follow that distribution.

PARAMETER DESCRIPTION
data

1d array of data for which a distribution is to be searched

TYPE: NumericSample1D

dists

Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed. Default is DIST.COMMON.

TYPE: tuple of strings or rv_continuous DEFAULT: COMMON

RETURNS DESCRIPTION
dist

A generic continuous distribution class of best fit

TYPE: scipy.stats rv_continuous

p

The two-tailed p-value for the best fit

TYPE: float

shape_params

Estimates for any shape parameters (if applicable), followed by those for location and scale. For most random variables, shape parameters will be returned, but there are exceptions (e.g. norm). The parameters can be used together with the returned dist to generate values.

TYPE: Tuple[float, ...]
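
The selection idea can be illustrated without SciPy. The sketch below fits two hypothetical candidates (a normal and a uniform distribution) and ranks them by their Kolmogorov-Smirnov statistic. It deliberately swaps the p-value for the statistic itself: for a fixed sample size, a smaller statistic corresponds to a larger p-value, so the ranking is the same. The candidate set, the fitting rules, and all names are assumptions made for illustration only.

```python
from statistics import NormalDist, fmean, pstdev


def ks_statistic(sorted_data, cdf):
    """Largest gap between the empirical CDF and a candidate CDF."""
    n = len(sorted_data)
    d = 0.0
    for i, x in enumerate(sorted_data):
        f = cdf(x)
        # Compare against the empirical CDF just before and just after x
        d = max(d, f - i / n, (i + 1) / n - f)
    return d


def rank_candidates(data):
    """Return (name, KS statistic) pairs, best fit first."""
    xs = sorted(data)
    lo, hi = xs[0], xs[-1]
    if lo == hi:
        raise ValueError("data must contain at least two distinct values")
    normal = NormalDist(fmean(xs), pstdev(xs))  # maximum-likelihood fit
    candidates = {
        "norm": normal.cdf,
        "uniform": lambda x: (x - lo) / (hi - lo),  # fitted to the data range
    }
    scores = {name: ks_statistic(xs, cdf) for name, cdf in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Deterministic, normal-looking sample: standard normal quantiles
n = 50
sample = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
best_name, best_stat = rank_candidates(sample)[0]
print(best_name)  # norm
```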

estimate_kernel_density(data, *, stretch=1, height=None, base=0, n_points=DEFAULT.KD_SEQUENCE_LEN, margin=0.5)

Estimates the kernel density of data and returns values that are useful for a plot. If those values are plotted in combination with a histogram, set height to the maximum value of the histogram.

Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. The gaussian_kde function of scipy.stats used here works for both univariate and multivariate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multimodal distributions tend to be oversmoothed.

PARAMETER DESCRIPTION
data

1-D array of datapoints to estimate from.

TYPE: NumericSample1D

stretch

Stretch the distribution estimate by the given factor. Only considered if height is None. Default is 1.

TYPE: float DEFAULT: 1

height

If the KDE curve is plotted in combination with other data (e.g. a histogram), you can use height to specify the height at the maximum point of the KDE curve. If this value is specified, the area under the curve will not be normalized. Default is None.

TYPE: float or None DEFAULT: None

base

The curve is shifted in the estimated direction by the given amount. This is useful for violin plots. Default is 0.

TYPE: float DEFAULT: 0

n_points

Number of points the estimation and sequence should have, by default KD_SEQUENCE_LEN (defined in constants.py)

TYPE: int DEFAULT: KD_SEQUENCE_LEN

margin

Margin for the sequence as a factor of the data range (max - min). If margin is 0, the two ends of the estimated density curve correspond to the minimum and maximum values of the data. Default is 0.5.

TYPE: float DEFAULT: 0.5

RETURNS DESCRIPTION
sequence

Data points at regular intervals from input data minimum to maximum

TYPE: 1D array

estimation

Data points of kernel density estimation

TYPE: 1D array
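
How the parameters interact can be sketched with a hand-written Gaussian KDE. The code below is an approximation under stated assumptions: the bandwidth follows Scott's rule (the documented default of gaussian_kde), the margin is applied on both sides of the data range, height rescales the curve so its maximum equals the given value, and base is added last. The function name and the n_points default of 50 are hypothetical placeholders.

```python
import math
from statistics import stdev


def kernel_density_sketch(data, *, stretch=1.0, height=None, base=0.0,
                          n_points=50, margin=0.5):
    """Return (sequence, estimation) for a 1-D Gaussian kernel density."""
    xs = [float(x) for x in data]
    n = len(xs)
    lo, hi = min(xs), max(xs)
    if n < 2 or lo == hi or n_points < 2:
        raise ValueError("need two distinct values and at least two grid points")
    span = hi - lo
    start, stop = lo - margin * span, hi + margin * span
    step = (stop - start) / (n_points - 1)
    sequence = [start + i * step for i in range(n_points)]

    bandwidth = stdev(xs) * n ** (-1 / 5)  # Scott's rule for one dimension
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    density = [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in xs)
        for g in sequence
    ]

    # height overrides stretch: rescale so the peak equals the requested height
    scale = stretch if height is None else height / max(density)
    estimation = [base + scale * d for d in density]
    return sequence, estimation


sequence, estimation = kernel_density_sketch([1, 2, 2.5, 3, 4], height=10, n_points=101)
print(sequence[0], round(max(estimation), 6))  # -0.5 10.0
```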

estimate_kernel_density_2d(feature, target, *, n_points=DEFAULT.KD_SEQUENCE_LEN, margin=0.5)

Estimates the kernel density of 2 dimensional data and returns values that are useful for a contour plot.

Kernel density estimation is a way to estimate the probability density function (PDF) of a random variable in a non-parametric way. The gaussian_kde function of scipy.stats used here works for both univariate and multivariate data. It includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multimodal distributions tend to be oversmoothed.

PARAMETER DESCRIPTION
feature

A one-dimensional array-like object containing the exogenous samples.

TYPE: NumericSample1D

target

A one-dimensional array-like object containing the endogenous samples.

TYPE: NumericSample1D

n_points

Number of points the estimation and sequence should have, by default KD_SEQUENCE_LEN (defined in constants.py)

TYPE: int DEFAULT: KD_SEQUENCE_LEN

margin

Margin for the sequence as factor of data range, by default 0.5.

TYPE: float DEFAULT: 0.5

RETURNS DESCRIPTION
feature_seq

Data points at regular intervals from input data minimum to maximum used for feature data

TYPE: 2D array

target_seq

Data points at regular intervals from input data minimum to maximum used for target data

TYPE: 2D array

estimation

Data points of kernel density estimation

TYPE: 2D array

RAISES DESCRIPTION
AssertionError

If the provided data is empty, contains only zeros or all values are identical.

estimate_capability_confidence(process, *, kind='cpk', level=0.95, n_groups=1)

Calculates the confidence interval for the process capability index (Cp or Cpk) of a process.

This function is an extension of the cp_ci and cpk_ci functions. It takes a ProcessEstimator instance and determines the confidence interval from the estimator's Cp or Cpk value.

PARAMETER DESCRIPTION
process

ProcessEstimator instance, required to obtain the necessary process information such as the capability indices and the number of samples.

TYPE: ProcessEstimator

kind

Specifies whether to calculate the confidence interval for Cp or Cpk ('cp' or 'cpk'). Default is 'cpk'.

TYPE: Literal['cp', 'cpk'] DEFAULT: 'cpk'

level

The desired confidence level for the interval, expressed as a decimal. Default is 0.95 (95% confidence).

TYPE: float DEFAULT: 0.95

n_groups

The number of groups for Bonferroni correction to adjust for multiple comparisons. Default is 1, indicating no correction.

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
Tuple[float, float, float]

A tuple containing the estimate, lower bound, and upper bound of the confidence interval for the specified process capability index.

RAISES DESCRIPTION
AssertionError

If provided kind is not 'cp' or 'cpk'.

ValueError

If no limit is provided or if only one limit is provided and kind is set to 'cp'.
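
The shape of the returned tuple and the effect of n_groups can be illustrated with a widely used normal approximation for the Cpk interval (often attributed to Bissell). Whether cpk_ci uses this exact formula is not stated here, so the sketch below is an assumption, as is the Bonferroni adjustment of dividing the error probability by n_groups. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def cpk_confidence_sketch(cpk, n_samples, *, level=0.95, n_groups=1):
    """Return (estimate, lower, upper) using a normal approximation for Cpk."""
    if n_samples < 2 or n_groups < 1:
        raise ValueError("n_samples must be >= 2 and n_groups >= 1")
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    alpha = (1.0 - level) / n_groups  # Bonferroni-adjusted error probability
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Approximate standard error of the Cpk estimate
    se = math.sqrt(1.0 / (9.0 * n_samples) + cpk ** 2 / (2.0 * (n_samples - 1)))
    return cpk, cpk - z * se, cpk + z * se


estimate, lower, upper = cpk_confidence_sketch(1.33, 50)
print(f"{estimate:.2f} [{lower:.3f}, {upper:.3f}]")  # 1.33 [1.051, 1.609]

# With three groups the interval widens to keep the overall level
_, lower_3, upper_3 = cpk_confidence_sketch(1.33, 50, n_groups=3)
print(lower_3 < lower and upper_3 > upper)  # True
```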

estimate_resolution(data)

Estimate the resolution based on the number of digits in the sample values.

PARAMETER DESCRIPTION
data

1-D array of datapoints to estimate from.

TYPE: NumericSample1D

RETURNS DESCRIPTION
float

The estimated resolution.
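
One plausible reading of a digit-based resolution estimate is to take the largest number of decimal places found in the sample and return the matching power of ten. The sketch below implements that reading with the decimal module; it is an assumption about the approach, not the function's documented algorithm, and it expects finite values. The function name is hypothetical.

```python
from decimal import Decimal


def resolution_sketch(data):
    """Return 10**-d, where d is the largest number of decimal places in data."""
    values = list(data)
    if not values:
        raise ValueError("data must not be empty")
    places = 0
    for value in values:
        # normalize() drops trailing zeros, so 2.0 counts as zero decimal places
        exponent = Decimal(str(value)).normalize().as_tuple().exponent
        places = max(places, -exponent)
    return float(Decimal(1).scaleb(-places))


print(resolution_sketch([1.25, 3.1, 2.0]))  # 0.01
print(resolution_sketch([10, 20, 30]))      # 1.0
```

Values that carry floating-point noise, such as the result of 0.1 + 0.2, would report many decimal places, so rounding the data to the instrument's display precision first is advisable with this approach.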