Process estimator

`daspi.statistics.estimation.ProcessEstimator(samples, spec_limits, error_values=(), strategy='norm', agreement=6, possible_dists=DIST.COMMON, evaluate=None, nan_policy='omit')`

Bases: LocationDispersionEstimator

An object for various statistical estimators. This class extends the estimator with process-specific statistics such as specification limits, Cp and Cpk values.

The attributes are computed lazily. After the class is instantiated, all attributes are set to None. The first time an attribute (strictly speaking, a property) is accessed, its value is computed and stored, so each calculation is performed only once.
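The lazy evaluation described above can be illustrated with a minimal sketch. The class `LazyStats` and its counter `n_computations` are hypothetical names chosen for this illustration; they are not part of daspi, and the actual implementation may differ.

```python
from statistics import fmean


class LazyStats:
    """Hypothetical class illustrating a lazily computed, cached property."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._mean = None        # nothing is computed at instantiation
        self.n_computations = 0  # counts how often the mean is really computed

    @property
    def mean(self):
        # Compute on first access only; later accesses return the stored value.
        if self._mean is None:
            self.n_computations += 1
            self._mean = fmean(self.samples)
        return self._mean


stats = LazyStats([1.0, 2.0, 3.0])
first = stats.mean   # triggers the calculation
second = stats.mean  # served from the stored value
```

Accessing `stats.mean` twice runs the calculation once, which is the behavior the paragraph above describes for the estimator's properties.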

PARAMETER	DESCRIPTION
`samples`	1D array of process data. TYPE: `NumericSample1D`
`spec_limits`	Specification limits for process data. TYPE: `SpecLimits`
`error_values`	If the process data may contain coded values for measurement errors or similar, they can be specified here. By default (). TYPE: `tuple of float` DEFAULT: `()`
`strategy`	Strategy used to determine the control limits (process spread): - `eval`: The strategy is chosen by the given `evaluate` function. If none is given, the internal `evaluate` method is used. - `fit`: The distribution that best represents the process data is determined first, and the agreed process spread is then calculated from it. - `norm`: The data is assumed to be normally distributed. The variation tolerance is then calculated as agreement * standard deviation. - `data`: The quantiles for the process variation tolerance are read directly from the data. By default 'norm'. TYPE: `('eval', 'fit', 'norm', 'data')` DEFAULT: `'norm'`
`agreement`	Tolerated process variation for which the control limits are calculated. - If int, the spread is determined from the normal distribution as agreement * σ, e.g. agreement = 6 -> 6σ covers ~99.73 % of the data. The upper and lower permissible quantiles are then calculated from this. - If float, the value must be between 0 and 1. It is then interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6σ). By default 6. TYPE: `float or int` DEFAULT: `6`
`possible_dists`	Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed. By default DIST.COMMON. TYPE: `tuple of strings or rv_continuous` DEFAULT: `DIST.COMMON`
`evaluate`	Provide a function that evaluates the `strategy`. If `strategy` is set to 'eval', this function is used to determine the strategy. The function should take the `samples` as input and return a string that corresponds to a valid strategy 'fit', 'norm', or 'data'. If not provided, the internal `evaluate` method is used. For more information on the `evaluate` method, see the class documentation. Default is None. TYPE: `Callable \| None` DEFAULT: `None`
`nan_policy`	How to handle NaN values in the samples. - 'propagate': NaN values are preserved in the analysis. - 'raise': Raises an error if NaN values are found. - 'omit': Omits NaN values from the analysis. By default 'omit'. TYPE: `('propagate', 'raise', 'omit')` DEFAULT: `'omit'`
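To make the two readings of `agreement` concrete, the following sketch converts an agreement into a covered proportion and into the lower and upper quantiles under a normal distribution. The helper names `coverage_from_agreement` and `quantiles_from_agreement` are assumptions made for this illustration and are not daspi functions.

```python
from statistics import NormalDist


def coverage_from_agreement(agreement):
    """Return the proportion of a normal distribution covered by the agreement.

    An int is read as a multiple of sigma (6 means +/- 3 sigma around the mean);
    a float between 0 and 1 is taken as the proportion itself.
    """
    if isinstance(agreement, int):
        half_width = agreement / 2
        std_normal = NormalDist()  # mean 0, standard deviation 1
        return std_normal.cdf(half_width) - std_normal.cdf(-half_width)
    if 0 < agreement < 1:
        return float(agreement)
    raise ValueError("a float agreement must lie between 0 and 1")


def quantiles_from_agreement(agreement):
    """Return the lower and upper quantile that enclose the covered proportion."""
    coverage = coverage_from_agreement(agreement)
    lower = (1 - coverage) / 2
    return lower, 1 - lower


coverage = coverage_from_agreement(6)
q_low, q_upp = quantiles_from_agreement(6)
```

For `agreement=6` the coverage is roughly 0.9973, so the quantiles sit at about 0.00135 and 0.99865.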

Examples:

You can get a comprehensive analysis of your process using the describe() method, which returns a pandas.DataFrame. This contains all important metrics, such as the Cp (if possible) and Cpk values:

import daspi as dsp

df = dsp.load_dataset('drop_card')
spec_limits = dsp.SpecLimits(0, float(df.loc[0, 'usl']))
target = 'distance'

drop_process = dsp.ProcessEstimator(
    samples=df[target],
    spec_limits=spec_limits)
print(drop_process.describe())

However, in this dataset, the cards were dropped in two different ways. To compare both methods, a DataFrame containing both process analyses can be created as follows:

import pandas as pd

# Column index -> drop method, also used to label the result columns
method_mapping = {0: 'parallel', 1: 'perpendicular'}
samples_parallel = df[df['method']==method_mapping[0]][target]
samples_perpendicular = df[df['method']==method_mapping[1]][target]

# ignore_index=True numbers the columns 0 and 1, which rename() then maps
# to the method names
drop_analysis = pd.concat([
    dsp.ProcessEstimator(samples_parallel, spec_limits).describe(),
    dsp.ProcessEstimator(samples_perpendicular, spec_limits).describe()],
    axis=1,
    ignore_index=True,
).rename(
    columns=method_mapping
)
print(drop_analysis)

You can get a detailed visual analysis with the precast chart daspi.plotlib.precast.ProcessCapabilityAnalysisCharts.

RAISES	DESCRIPTION
`ValueError`	If NaN values are found in the samples and `nan_policy` is set to 'raise'.
`UserWarning`	If NaN values are found in the samples and `nan_policy` is set to 'omit' or 'propagate'. The warning indicates that NaN values will be omitted from the analysis or may lead to unexpected results.

`attrs_describe` `property`

Get the attribute names used by the describe method (read-only).

`filtered` `property`

Get the data without error values and without missing values (read-only).

`n_ok` `property`

Get the number of OK values (read-only).

`n_nok` `property`

Get the number of NOK values (read-only).

`ok` `property`

Get the share of OK values in percent (read-only).

`nok` `property`

Get the share of NOK values in percent (read-only).

`nok_norm` `property`

Predict the share of NOK values in percent, based on the normal distribution (read-only).

`nok_fit` `property`

Predict the share of NOK values in percent, based on the fitted distribution (read-only).
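As an illustration of a normal-distribution prediction of this kind, the sketch below estimates the expected share of out-of-spec values from the sample mean and standard deviation. The function `predicted_nok_percent` is a hypothetical helper, and the assumption that a missing limit contributes nothing to the prediction is made for this example only; it is not taken from the daspi source.

```python
from statistics import NormalDist, fmean, stdev


def predicted_nok_percent(samples, lsl=None, usl=None):
    """Predict the share of out-of-spec values in percent, assuming normality."""
    dist = NormalDist(fmean(samples), stdev(samples))
    # Probability mass below the lower limit and above the upper limit;
    # a missing limit contributes nothing (assumption of this sketch).
    below = dist.cdf(lsl) if lsl is not None else 0.0
    above = 1.0 - dist.cdf(usl) if usl is not None else 0.0
    return 100.0 * (below + above)


samples = [9.0, 10.0, 11.0]  # mean 10, sample standard deviation 1
two_sided = predicted_nok_percent(samples, lsl=7.0, usl=13.0)
one_sided = predicted_nok_percent(samples, usl=13.0)
```

With both limits three standard deviations away from the mean, the prediction is about 0.27 %, the complement of the 99.73 % coverage.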

`n_errors` `property`

Get the number of error values (read-only).

`errors` `property`

Get the amount of error values (read-only).

`lsl` `property`

Get the lower specification limit (read-only).

`usl` `property`

Get the upper specification limit (read-only).

`spec_limits` `property` `writable`

Get and set the specification limits.

`control_limits` `property`

Get lower and upper control limits (read-only).

`control_range` `property`

Get the range (span) of the control limits (read-only).
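For the `norm` strategy, where the spread is agreement * standard deviation, the control limits and their range can be sketched as follows. The function `control_limits_norm` is a hypothetical helper written for this illustration. It centers the spread on the mean and does not remove error values or NaN, which the estimator handles before any calculation.

```python
from statistics import fmean, stdev


def control_limits_norm(samples, agreement=6):
    """Return (lower, upper) control limits under a normality assumption."""
    mean = fmean(samples)
    # Half of the agreed spread lies on each side of the mean.
    half_spread = agreement / 2 * stdev(samples)
    return mean - half_spread, mean + half_spread


lcl, ucl = control_limits_norm([9.0, 10.0, 11.0])
control_range = ucl - lcl  # equals agreement * standard deviation
```

For the other strategies the limits come from a fitted distribution or directly from data quantiles, so they are not necessarily symmetric around the mean.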

`tolerance` `property`

Get the tolerance range. If one of the specification limits is not specified, inf is returned (read-only).

`cp` `property`

Cp is a measure of process capability. Cp is the ratio of the specification width (usl - lsl) to the process variation (agreement*σ). The location is not taken into account by the Cp value, so it only indicates the potential for the Cpk value. Cp can only be calculated if both an upper and a lower specification limit are given; otherwise None is returned.
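The ratio described above can be written out as a short sketch for the normal-distribution case. The function `cp_norm` is a hypothetical helper for illustration, not the daspi implementation.

```python
from statistics import stdev


def cp_norm(samples, lsl=None, usl=None, agreement=6):
    """Return Cp = (usl - lsl) / (agreement * sigma), or None if a limit is missing."""
    if lsl is None or usl is None:
        return None
    return (usl - lsl) / (agreement * stdev(samples))


samples = [9.0, 10.0, 11.0]  # sample standard deviation 1
cp_two_sided = cp_norm(samples, lsl=4.0, usl=16.0)
cp_one_sided = cp_norm(samples, usl=16.0)
```

Here the specification width of 12 is twice the process variation of 6, so Cp is 2. Shifting the samples would not change the result, because the location does not enter the formula.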

`cpl` `property`

Cpl is a measure of process capability. It is the ratio of the distance between the process mean and the lower specification limit to the lower spread of the process. Returns inf if no lower specification limit is specified.

`cpu` `property`

Cpu is a measure of process capability. It is the ratio of the distance between the process mean and the upper specification limit to the upper spread of the process. Returns inf if no upper specification limit is specified.

`cpk` `property`

Estimates what the process is capable of producing, considering that the process mean may not be centered between the specification limits. It is calculated as the minimum of Cpl and Cpu. In general, higher Cpk values indicate a more capable process. Lower Cpk values indicate that the process may need improvement.
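The relationship between Cpl, Cpu and Cpk can be sketched for the normal-distribution case, where the lower and upper spread are each half of agreement * σ. The function `cpl_cpu_cpk` is a hypothetical helper for illustration; with a fitted distribution the two spreads may differ.

```python
import math
from statistics import fmean, stdev


def cpl_cpu_cpk(samples, lsl=None, usl=None, agreement=6):
    """Return (Cpl, Cpu, Cpk) under a normality assumption."""
    mean = fmean(samples)
    half_spread = agreement / 2 * stdev(samples)
    # A missing limit cannot be violated, so its index is infinite.
    cpl = math.inf if lsl is None else (mean - lsl) / half_spread
    cpu = math.inf if usl is None else (usl - mean) / half_spread
    return cpl, cpu, min(cpl, cpu)


samples = [9.0, 10.0, 11.0]  # mean 10, sample standard deviation 1
cpl, cpu, cpk = cpl_cpu_cpk(samples, lsl=4.0, usl=13.0)
```

The mean sits closer to the upper limit in this example, so Cpu is the smaller value and determines Cpk.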

`Z` `property`

The sigma level Z is another process capability indicator alongside Cp and Cpk. It describes how many standard deviations fit between the mean value and the nearest tolerance limit of a process.

`Z_lt` `property`

Statements about long-term capability can be derived from short-term capability using the σ level. To obtain the long-term σ level, the empirically determined value of 1.5 is subtracted from the short-term σ level.
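A minimal sketch of both sigma levels, assuming the plain sample mean and standard deviation are used. The function `sigma_levels` is a hypothetical helper, and raising an error when no limit is given is a choice made for this example only.

```python
from statistics import fmean, stdev


def sigma_levels(samples, lsl=None, usl=None):
    """Return (Z, Z_lt): short-term sigma level and its long-term estimate."""
    mean = fmean(samples)
    sigma = stdev(samples)
    distances = []
    if lsl is not None:
        distances.append(mean - lsl)
    if usl is not None:
        distances.append(usl - mean)
    if not distances:
        raise ValueError("at least one specification limit is required")
    z = min(distances) / sigma  # standard deviations up to the nearest limit
    return z, z - 1.5           # subtract the empirical 1.5 shift


z, z_lt = sigma_levels([9.0, 10.0, 11.0], lsl=4.0, usl=13.0)
```

With the upper limit three standard deviations above the mean, Z is 3 and the long-term estimate is 1.5.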

`mask_error()`

Returns a boolean mask indicating which samples are considered errors.

A sample is marked as an error if its value is found in the predefined set of error values (_error_values).

RETURNS	DESCRIPTION
`Series[bool]`	A boolean Series where True indicates an erroneous sample.

`mask_nok()`

Returns a boolean mask indicating which filtered samples are out of specification.

A sample is considered not OK (NOK) if it is less than or equal to the lower specification limit (lsl) or greater than or equal to the upper specification limit (usl).

RETURNS	DESCRIPTION
`Series[bool]`	A boolean Series where True indicates a sample that is out of spec.

Notes

This mask only checks for out-of-spec values and does not consider missing or erroneous samples. Therefore, it is not the exact inverse of mask_ok(), which also excludes missing and error values. To get the full set of invalid samples, combine this mask with mask_missing() and mask_error().

`mask_ok()`

Returns a boolean mask indicating which samples are valid (OK).

A sample is considered OK if it is:

- Not missing (i.e., not NaN)
- Not an error (i.e., not in _error_values)
- Within specification limits (lsl < value < usl)

RETURNS	DESCRIPTION
`Series[bool]`	A boolean Series where True indicates a valid sample.

Notes

This mask is the logical inverse of the union of mask_missing(), mask_error(), and mask_nok(). It ensures that only fully valid samples are marked as OK, whereas mask_nok() alone does not account for missing or erroneous values.
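How the masks relate to each other can be sketched with plain Python lists standing in for the boolean Series. The sample values, error code and limits below are made up for this illustration, and the variables only mirror the method names; they are not daspi calls.

```python
import math

samples = [5.0, float("nan"), -999.0, 0.0, 12.0, 7.5]
error_values = (-999.0,)  # hypothetical code for a failed measurement
lsl, usl = 0.0, 10.0

mask_missing = [math.isnan(x) for x in samples]
mask_error = [x in error_values for x in samples]
# Comparisons with NaN are always False, so missing values are never NOK here.
mask_nok = [x <= lsl or x >= usl for x in samples]
# OK means: neither missing, nor an error, nor out of specification.
mask_ok = [
    not (missing or error or nok)
    for missing, error, nok in zip(mask_missing, mask_error, mask_nok)
]

inverse_of_nok = [not nok for nok in mask_nok]
```

The NaN entry is neither NOK nor OK, which is why `inverse_of_nok` differs from `mask_ok`. The value 0.0 lies exactly on the lower limit and therefore counts as NOK.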
