daspi.statistics.confidence

Confidence interval functions for common statistical measures.

This module provides two-sided confidence interval calculations for a range of statistics, covering both single-sample and two-sample scenarios. All interval functions return a three-element tuple in the form (point_estimate, lower_bound, upper_bound), making them straightforward to use in tables and plots.

Available functions

Single-sample intervals:

  • mean_ci – confidence interval for the sample mean (t-distribution)
  • median_ci – confidence interval for the sample median
  • variance_ci – confidence interval for the variance (χ²-distribution)
  • stdev_ci – confidence interval for the standard deviation
  • proportion_ci – confidence interval for a binomial proportion

Process-capability intervals:

  • cp_ci – confidence interval for the Cp process-capability index
  • cpk_ci – confidence interval for the Cpk process-capability index

Two-sample / difference intervals:

  • delta_mean_ci – confidence interval for the difference of two means
  • delta_variance_ci – confidence interval for the ratio of two variances
  • delta_stdev_ci – confidence interval for the ratio of two standard deviations
  • delta_proportions_ci – confidence interval for the difference of two proportions

Regression / model intervals:

  • fit_ci – confidence band around a fitted OLS regression line
  • prediction_ci – prediction band for individual future observations

Helpers / utilities:

  • sem – standard error of the mean
  • bonferroni_ci – group-wise confidence intervals with Bonferroni correction
  • confidence_to_alpha – convert a confidence level to the corresponding α

Notes

The Bonferroni correction adjusts each individual confidence level so that the family-wise error rate across n simultaneous intervals does not exceed the nominal α.

References

Comprehensive confidence intervals for Python developers: https://aegis4048.github.io/comprehensive_confidence_intervals_for_python_developers

mean_ci(sample, level=0.95, n_groups=1)

Two-sided confidence interval for the mean of the data.

PARAMETER DESCRIPTION
sample

A one-dimensional array-like object containing the samples.

TYPE: NumericSample1D

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
x_bar

Sample mean of the data (estimate of the expected value)

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

Notes

The underlying t.interval function assumes that the standardized sample mean follows a t-distribution, which holds when the data are approximately normally distributed. Additionally, this method assumes that the sample is representative of the population and that the observations are independent and identically distributed.
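
The t-based interval described in the note can be sketched as follows. This is only an illustration built on scipy.stats.t.interval; the helper name t_mean_interval and the way n_groups divides the alpha risk are assumptions, not the module's actual code.

```python
import numpy as np
from scipy import stats


def t_mean_interval(sample, level=0.95, n_groups=1):
    """Hypothetical helper: t-based confidence interval for the mean."""
    x = np.asarray(sample, dtype=float)
    # Assumed Bonferroni adjustment: split the total alpha across the groups.
    adjusted_level = 1 - (1 - level) / n_groups
    x_bar = x.mean()
    # Standard error of the mean using the unbiased (ddof=1) standard deviation.
    se = x.std(ddof=1) / np.sqrt(x.size)
    lower, upper = stats.t.interval(adjusted_level, x.size - 1, loc=x_bar, scale=se)
    return float(x_bar), float(lower), float(upper)


x_bar, lower, upper = t_mean_interval([9.8, 10.1, 10.0, 10.3, 9.9, 10.2])
print(x_bar, lower, upper)
```

The interval is symmetric around the sample mean, and a larger n_groups widens it because each group then receives a smaller share of the alpha risk.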

median_ci(sample, level=0.95, n_groups=1)

Two-sided confidence interval for the median of the data.

PARAMETER DESCRIPTION
sample

A one-dimensional array-like object containing the samples.

TYPE: NumericSample1D

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
median

median of data

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

Notes

The underlying t.interval function assumes that the data follows a t-distribution. Additionally, this method assumes that the sample is representative of the population and that the data is independent and identically distributed.

variance_ci(sample, level=0.95, n_groups=1)

Two-sided confidence interval for the variance of the data.

PARAMETER DESCRIPTION
sample

A one-dimensional array-like object containing the samples.

TYPE: NumericSample1D

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
s2

variance of data

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float
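
The module overview lists variance_ci as based on the χ²-distribution. A minimal sketch of that textbook interval is shown here; the helper name chi2_variance_interval is hypothetical and the code is not taken from the module itself.

```python
import numpy as np
from scipy import stats


def chi2_variance_interval(sample, level=0.95):
    """Hypothetical helper: chi-square confidence interval for the variance."""
    x = np.asarray(sample, dtype=float)
    dof = x.size - 1
    s2 = x.var(ddof=1)
    alpha = 1 - level
    # The upper chi-square quantile gives the lower limit and vice versa.
    lower = dof * s2 / stats.chi2.ppf(1 - alpha / 2, dof)
    upper = dof * s2 / stats.chi2.ppf(alpha / 2, dof)
    return float(s2), float(lower), float(upper)


s2, lower, upper = chi2_variance_interval([9.8, 10.1, 10.0, 10.3, 9.9, 10.2])
print(s2, lower, upper)
# Square roots of all three values give the matching standard deviation interval.
print(np.sqrt([s2, lower, upper]))
```

Unlike the interval for the mean, this interval is not symmetric around the point estimate, because the χ²-distribution is skewed.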

stdev_ci(sample, level=0.95, n_groups=1)

Two-sided confidence interval for the standard deviation of the data.

PARAMETER DESCRIPTION
sample

A one-dimensional array-like object containing the samples.

TYPE: NumericSample1D

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
s

standard deviation of data

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

proportion_ci(events, observations, level=0.95, n_groups=1)

Confidence interval for a binomial proportion using an asymptotic normal approximation.

PARAMETER DESCRIPTION
events

Counted number of events.

TYPE: int

observations

Total number of observations.

TYPE: int

level

Confidence level, default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
portion

Proportion as the ratio events / observations.

TYPE: float

lower

Lower confidence limit, with coverage approximately equal to level.

TYPE: float

upper

Upper confidence limit, with coverage approximately equal to level.

TYPE: float
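
The asymptotic normal approximation mentioned in the description is commonly known as the Wald interval. The following standard-library sketch shows the formula; the helper name wald_proportion_interval is hypothetical, and whether the module computes its interval in exactly this way is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def wald_proportion_interval(events, observations, level=0.95):
    """Hypothetical helper: normal-approximation (Wald) interval for a proportion."""
    p = events / observations
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(p * (1 - p) / observations)
    return p, p - half_width, p + half_width


portion, lower, upper = wald_proportion_interval(events=12, observations=80)
print(portion, lower, upper)
```

This sketch does not clip the limits, so they can fall outside [0, 1] for proportions near 0 or 1, and the approximation is poor for small numbers of observations.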

bonferroni_ci(data, target, feature, level=0.95, ci_func=stdev_ci, n_groups=None, name='midpoint')

Calculate confidence intervals after Bonferroni correction. The Bonferroni correction is a method to adjust the significance level alpha when several groups are compared at once.

PARAMETER DESCRIPTION
data

data frame containing sample and feature data

TYPE: DataFrame

target

name of target sample data column

TYPE: str

feature

name of categorical feature. The confidence intervals are calculated separately for these groups

TYPE: str | List[str]

level

confidence level, default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

ci_func

function used to calculate the confidence interval; it must return its values in the order: midpoint, lower limit, upper limit

TYPE: (mean_ci, stdev_ci, variance_ci) DEFAULT: stdev_ci

n_groups

Used for the Bonferroni correction. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded. If None is given, the number is derived from the given groups (ngroups attribute of the groupby object), by default None

TYPE: int DEFAULT: None

name

name of midpoints, by default 'midpoint'

TYPE: str DEFAULT: 'midpoint'

RETURNS DESCRIPTION
data

data containing groups, midpoints and confidence limits

TYPE: DataFrame

Notes

The Bonferroni correction is needed whenever several tests (multiple comparisons) are carried out together. In that case, the probability of at least one type I error across all tests is no longer 5% (or 1%), but considerably higher. In other words, the risk of obtaining at least one significant result although there is no effect at all grows with the number of tests. This is also referred to as alpha error accumulation or alpha inflation.
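
The alpha inflation described in this note can be made concrete with a small calculation. The sketch assumes that the tests are independent, and the function name family_wise_error_rate is made up for the example.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


alpha, n_tests = 0.05, 10
# Without correction, every test is run at the nominal alpha.
uncorrected = family_wise_error_rate(alpha, n_tests)
# With the Bonferroni correction, every test is run at alpha / n_tests.
corrected = family_wise_error_rate(alpha / n_tests, n_tests)
print(uncorrected, corrected)
```

With ten tests at a nominal 5%, the uncorrected risk is roughly 40%, whereas the corrected risk stays just below 5%.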

delta_mean_ci(sample1, sample2, level=0.95)

Two-sided confidence interval for the difference between the means of two independent samples.

PARAMETER DESCRIPTION
sample1

A one-dimensional array-like object containing the first sample.

TYPE: NumericSample1D

sample2

A one-dimensional array-like object containing the second sample.

TYPE: NumericSample1D

level

confidence level between 0 and 1, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
delta

Difference of means of data

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

delta_variance_ci(sample1, sample2, level=0.95)

Two-sided confidence interval for the variance difference of two independent samples.

PARAMETER DESCRIPTION
sample1

A one-dimensional array-like object containing the first sample.

TYPE: NumericSample1D

sample2

A one-dimensional array-like object containing the second sample.

TYPE: NumericSample1D

level

confidence level between 0 and 1, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
delta

difference of the variances of the data

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

Notes

This function is based on a solution generated by ChatGPT, so its correctness is not guaranteed.

delta_proportions_ci(events1, observations1, events2, observations2, level=0.95)

Confidence interval for comparing two independent proportions. This assumes that we have two independent binomial samples.

PARAMETER DESCRIPTION
events1

Counted number of events of sample 1.

TYPE: int

observations1

Total number of observations of sample 1.

TYPE: int

events2

Counted number of events of sample 2.

TYPE: int

observations2

Total number of observations of sample 2.

TYPE: int

level

Confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
delta

Difference of the two proportions

TYPE: float

lower

Lower confidence limit

TYPE: float

upper

Upper confidence limit

TYPE: float

fit_ci(model, level=0.95)

Calculate the confidence interval of the fitted line. Applies to fitted WLS and OLS models, not to general GLS.

PARAMETER DESCRIPTION
model

fitted OLS or WLS model

TYPE: statsmodels RegressionResults

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
fitted

For consistency with the other functions, the fitted target values are returned as a one-dimensional numpy array.

TYPE: NDArray

lower

Lower confidence limits of fitting line as one-dimensional numpy array.

TYPE: NDArray

upper

Upper confidence limits of fitting line as one-dimensional numpy array.

TYPE: NDArray

Notes

Using the hat matrix to calculate fit_se works only for the fitted values, that is, for the observations the model was fitted on.

This function is based on the summary_table function from the statsmodels.stats.outliers_influence module, see: https://www.statsmodels.org/dev/_modules/statsmodels/stats/outliers_influence.html
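
To illustrate how a hat matrix leads to a confidence band and a prediction band, here is a plain numpy/scipy sketch for an unweighted OLS fit. The data are made up, all variable names are chosen for the example, and the code is not the module's or statsmodels' implementation.

```python
import numpy as np
from scipy import stats

# Synthetic example data (made up for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])

X = np.column_stack([np.ones_like(x), x])      # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
fitted = X @ beta

# Leverages: diagonal of the hat matrix H = X (X'X)^-1 X'.
leverage = np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

dof = len(y) - X.shape[1]                      # residual degrees of freedom
mse = np.sum((y - fitted) ** 2) / dof          # residual mean square
t_crit = stats.t.ppf(1 - 0.05 / 2, dof)        # two-sided, level = 0.95

fit_se = np.sqrt(mse * leverage)               # standard error of the fitted mean
pred_se = np.sqrt(mse * (1 + leverage))        # standard error of a new observation

fit_lower, fit_upper = fitted - t_crit * fit_se, fitted + t_crit * fit_se
pred_lower, pred_upper = fitted - t_crit * pred_se, fitted + t_crit * pred_se
print(np.column_stack([fitted, fit_lower, fit_upper, pred_lower, pred_upper]))
```

The prediction band is always wider than the confidence band because it adds the residual variance of a single new observation to the uncertainty of the fitted mean.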

prediction_ci(model, level=0.95)

Calculate the prediction interval for individual observations, which can also be used to spot outliers. Applies to fitted WLS and OLS models, not to general GLS.

PARAMETER DESCRIPTION
model

fitted OLS or WLS model

TYPE: statsmodels RegressionResults

level

confidence level, by default 0.95

TYPE: float in (0, 1) DEFAULT: 0.95

RETURNS DESCRIPTION
fitted

For consistency with the other functions, the fitted target values are returned as a one-dimensional numpy array.

TYPE: NDArray

lower

Lower confidence limits of prediction as one-dimensional numpy array.

TYPE: NDArray

upper

Upper confidence limits of prediction as one-dimensional numpy array.

TYPE: NDArray

confidence_to_alpha(confidence_level, two_sided=True, n_groups=1)

Calculate the significance level (alpha risk) for a given confidence level.

PARAMETER DESCRIPTION
confidence_level

level of confidence interval

TYPE: float in (0, 1)

two_sided

True if alpha is to be calculated for a two-sided confidence interval, by default True

TYPE: bool DEFAULT: True

n_groups

Used for the Bonferroni method. Number of groups across which the alpha risk is split, so that the total risk over all groups is not exceeded, by default 1

TYPE: int DEFAULT: 1

RETURNS DESCRIPTION
alpha

significance level as alpha risk

TYPE: float
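
One plausible reading of this conversion is sketched below. The function name to_alpha is hypothetical, and both the halving for two-sided intervals and the division by n_groups are assumptions about the behaviour, not a copy of the module's code.

```python
def to_alpha(confidence_level, two_sided=True, n_groups=1):
    """Hypothetical sketch of converting a confidence level to an alpha risk."""
    alpha = 1 - confidence_level
    if two_sided:
        # Assumed: a two-sided interval puts half of the risk in each tail.
        alpha /= 2
    # Assumed: the Bonferroni method splits the risk evenly across the groups.
    return alpha / n_groups


print(to_alpha(0.95))
print(to_alpha(0.95, two_sided=False))
print(to_alpha(0.95, n_groups=5))
```

Under these assumptions, a 95% level corresponds to 2.5% per tail, and five groups reduce that further to 0.5% per tail.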