Process estimator

daspi.statistics.estimation.ProcessEstimator(samples, spec_limits, error_values=(), strategy='norm', agreement=6, possible_dists=DIST.COMMON, evaluate=None, nan_policy='omit')

Bases: LocationDispersionEstimator

An object for various statistical estimators. This class extends the estimator with process-specific statistics such as specification limits, Cp and Cpk values.

The attributes are calculated lazily. After the class is instantiated, all attributes are set to None. As soon as an attribute (actually a property) is accessed, its value is calculated and stored, so the calculation is performed only once.
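The lazy pattern described above can be illustrated with a minimal sketch. The class `LazyStats` and its counter `n_computations` are hypothetical and not part of daspi; they only show how a property can compute a value on first access and reuse the stored result afterwards.

```python
from statistics import fmean


class LazyStats:
    """Hypothetical stand-in that mimics lazy, cached attributes."""

    def __init__(self, samples):
        self.samples = list(samples)
        self._mean = None          # nothing is computed at construction time
        self.n_computations = 0    # counts how often the mean is really computed

    @property
    def mean(self):
        # Compute only on first access, then reuse the stored value
        if self._mean is None:
            self.n_computations += 1
            self._mean = fmean(self.samples)
        return self._mean


stats = LazyStats([1.0, 2.0, 3.0])
first = stats.mean    # triggers the calculation
second = stats.mean   # served from the stored value
```

After both accesses, `n_computations` is still 1, which is the behavior the paragraph above describes.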

PARAMETER DESCRIPTION
samples

1D array of process data.

TYPE: NumericSample1D

spec_limits

Specification limits for process data.

TYPE: SpecLimits

error_values

If the process data may contain coded values for measurement errors or similar, they can be specified here, by default ().

TYPE: tuple of float DEFAULT: ()

strategy

Which strategy should be used to determine the control limits (process spread):
- eval: The strategy is determined according to the given evaluate function. If none is given, the internal evaluate method is used.
- fit: First, the distribution that best represents the process data is searched for, and then the agreed process spread is calculated.
- norm: It is assumed that the data is normally distributed. The variation tolerance is then calculated as agreement * standard deviation.
- data: The quantiles for the process variation tolerance are read directly from the data.

By default 'norm'.

TYPE: ('eval', 'fit', 'norm', 'data') DEFAULT: 'norm'
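To make the difference between the 'norm' and 'data' strategies concrete, here is a rough sketch using only the standard library. The functions `quantile` and `control_limits` are hypothetical helpers, not daspi's implementation. In particular, the assumption that 'data' reads the quantiles at Φ(-agreement/2) and Φ(+agreement/2), and the linear interpolation used, are illustrative choices.

```python
from statistics import NormalDist, fmean, stdev


def quantile(sorted_values, q):
    """Linear-interpolation quantile of pre-sorted data, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def control_limits(samples, strategy="norm", agreement=6):
    """Hypothetical sketch of lower and upper control limits."""
    k = agreement / 2  # number of standard deviations on each side
    if strategy == "norm":
        # Assume normality: mean +/- k * sigma
        mean, sigma = fmean(samples), stdev(samples)
        return mean - k * sigma, mean + k * sigma
    if strategy == "data":
        # Assumption: read the matching quantiles directly from the data
        q_low, q_upp = NormalDist().cdf(-k), NormalDist().cdf(k)
        ordered = sorted(samples)
        return quantile(ordered, q_low), quantile(ordered, q_upp)
    raise ValueError(f"Strategy {strategy!r} is not covered by this sketch")


samples = [9.8, 10.1, 10.0, 9.9, 10.2]
lcl_norm, ucl_norm = control_limits(samples, "norm")
lcl_data, ucl_data = control_limits(samples, "data")
```

With 'norm' the limits are symmetric around the mean, whereas with 'data' they can never lie outside the observed minimum and maximum.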

agreement

Specify the tolerated process variation for which the control limits are to be calculated.
- If int, the spread is determined using the normal distribution as agreement * σ, e.g. agreement = 6 -> 6σ covers ~99.73 % of the data. The upper and lower permissible quantiles are then calculated from this.
- If float, the value must be between 0 and 1. This value is then interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6σ).

By default 6.

TYPE: float or int DEFAULT: 6
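The relationship between an integer agreement and the equivalent proportion can be checked with the standard normal distribution. The two helper functions below are hypothetical names introduced for illustration and are not part of daspi.

```python
from statistics import NormalDist


def agreement_to_proportion(agreement):
    """Share of a normal distribution inside +/- agreement/2 sigma."""
    k = agreement / 2
    std_norm = NormalDist()
    return std_norm.cdf(k) - std_norm.cdf(-k)


def proportion_to_agreement(proportion):
    """Width in sigma of the centered interval covering `proportion`."""
    upper_quantile = (1 + proportion) / 2
    return 2 * NormalDist().inv_cdf(upper_quantile)


coverage_6_sigma = agreement_to_proportion(6)      # roughly 0.9973
sigma_width = proportion_to_agreement(0.9973)      # roughly 6
```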

possible_dists

Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed, by default DIST.COMMON

TYPE: tuple of strings or rv_continuous DEFAULT: DIST.COMMON

evaluate

Provide a function that evaluates the strategy. If strategy is set to 'eval', this function is used to determine the strategy. The function should take the samples as input and return a string that corresponds to a valid strategy 'fit', 'norm', or 'data'. If not provided, the internal evaluate method is used. For more information on the evaluate method, see the class documentation. Default is None.

TYPE: Callable | None DEFAULT: None
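A custom evaluate function only needs to accept the samples and return one of the strategy names. The function `evaluate_by_size` below is a hypothetical example; its sample-size thresholds are arbitrary and are not taken from daspi's internal evaluate method.

```python
def evaluate_by_size(samples):
    """Hypothetical rule: choose a strategy from the sample size alone."""
    n = len(samples)
    if n < 30:
        # Too few points to fit or read quantiles reliably: assume normality
        return "norm"
    if n < 300:
        # Enough points to search for a suitable distribution
        return "fit"
    # Plenty of data: read the quantiles directly
    return "data"


small = evaluate_by_size(range(10))
medium = evaluate_by_size(range(100))
large = evaluate_by_size(range(1000))
```

Going by the signature above, such a function would be passed as `evaluate=evaluate_by_size` together with `strategy='eval'`.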

nan_policy

How to handle NaN values in the samples.
- 'propagate': NaN values are preserved in the analysis.
- 'raise': Raises an error if NaN values are found.
- 'omit': Omits NaN values from the analysis.

By default 'omit'.

TYPE: ('propagate', 'raise', 'omit') DEFAULT: 'omit'
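The three policies, together with the errors and warnings listed under RAISES, could be sketched as follows. The function `apply_nan_policy` is a hypothetical helper working on plain lists; it is not how daspi implements the policy internally.

```python
import math
import warnings


def apply_nan_policy(samples, nan_policy="omit"):
    """Hypothetical sketch of the three NaN policies on a list of floats."""
    if nan_policy not in ("propagate", "raise", "omit"):
        raise ValueError(f"Unknown nan_policy: {nan_policy!r}")
    if not any(math.isnan(x) for x in samples):
        return list(samples)  # nothing to handle
    if nan_policy == "raise":
        raise ValueError("NaN values found in samples")
    if nan_policy == "omit":
        warnings.warn("NaN values are omitted from the analysis", UserWarning)
        return [x for x in samples if not math.isnan(x)]
    # 'propagate': keep the NaN values but tell the user about the risk
    warnings.warn("NaN values may lead to unexpected results", UserWarning)
    return list(samples)


clean = apply_nan_policy([1.0, 2.0, 3.0])  # no NaN, so no warning is issued
```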

Examples:

You can get a comprehensive analysis of your process using the describe() method, which returns a pandas.DataFrame. This contains all important metrics, such as the Cp (if possible) and Cpk values:

import daspi as dsp

df = dsp.load_dataset('drop_card')
spec_limits = dsp.SpecLimits(0, float(df.loc[0, 'usl']))
target = 'distance'

drop_process = dsp.ProcessEstimator(
    samples=df[target],
    spec_limits=spec_limits)
print(drop_process.describe())

However, in this dataset, the cards were dropped in two different ways. To compare both methods, a DataFrame containing both process analyses can be created as follows:

import pandas as pd

# Keys match the positional column index of the concatenated frame
method_mapping = {0: 'parallel', 1: 'perpendicular'}
samples_parallel = df[df['method']==method_mapping[0]][target]
samples_perpendicular = df[df['method']==method_mapping[1]][target]

# Place both analyses side by side, then label the columns by method
drop_analysis = pd.concat([
    dsp.ProcessEstimator(samples_parallel, spec_limits).describe(),
    dsp.ProcessEstimator(samples_perpendicular, spec_limits).describe()],
    axis=1,
    ignore_index=True,
).rename(
    columns=method_mapping
)
print(drop_analysis)

You can get a detailed visual analysis with the precast chart daspi.plotlib.precast.ProcessCapabilityAnalysisCharts.

RAISES DESCRIPTION
ValueError

If NaN values are found in the samples and nan_policy is set to 'raise'.

UserWarning

If NaN values are found in the samples and nan_policy is set to 'omit' or 'propagate'. The warning indicates that NaN values will be omitted from the analysis or may lead to unexpected results.

attrs_describe property

Get attribute names used for describe method (read-only).

filtered property

Get the data without error values and without missing values (read-only).

n_ok property

Get the number of OK-values (read-only).

n_nok property

Get the number of NOK-values (read-only).

ok property

Get amount of OK-values as percent (read-only).

nok property

Get amount of NOK-values as percent (read-only).

nok_norm property

Predict the amount of NOK-values as percent based on the normal distribution (read-only).
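As a rough illustration of such a prediction, the sketch below sums the two tail probabilities of a normal distribution fitted by mean and standard deviation. The function `predicted_nok_norm` is hypothetical, and returning the result on a 0 to 100 scale is an assumption about what "as percent" means here.

```python
from statistics import NormalDist, fmean, stdev


def predicted_nok_norm(samples, lsl=None, usl=None):
    """Hypothetical sketch: expected out-of-spec share under normality."""
    dist = NormalDist(fmean(samples), stdev(samples))
    # A missing limit contributes no out-of-spec probability
    below = dist.cdf(lsl) if lsl is not None else 0.0
    above = 1.0 - dist.cdf(usl) if usl is not None else 0.0
    return 100.0 * (below + above)  # assumption: scaled to 0..100


samples = [9.8, 10.1, 10.0, 9.9, 10.2]
mean, sigma = fmean(samples), stdev(samples)
# Limits at +/- 3 sigma leave roughly 0.27 % outside
nok_percent = predicted_nok_norm(samples, mean - 3 * sigma, mean + 3 * sigma)
```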

nok_fit property

Predict the amount of NOK-values as percent based on the fitted distribution (read-only).

n_errors property

Get the number of error values (read-only).

errors property

Get the amount of error values (read-only).

lsl property

Get the lower specification limit (read-only).

usl property

Get the upper specification limit (read-only).

spec_limits property writable

Get and set the specification limits.

control_limits property

Get lower and upper control limits (read-only).

control_range property

Get the range (span) of the control limits (read-only).

tolerance property

Get tolerance range. If one of the specification limits is not specified, inf is returned (read-only).

cp property

Cp is a measure of process capability. Cp is the ratio of the specification width (usl - lsl) to the process variation (agreement*σ). The location is not taken into account by the Cp value. This value therefore only indicates the potential for the Cpk value. Cp can only be calculated if both an upper and a lower specification limit are given; otherwise, None is returned.

cpl property

Cpl is a measure of process capability. It is the ratio of the distance between the process mean and the lower specification limit to the lower spread of the process. Returns inf if no lower specification limit is specified.

cpu property

Cpu is a measure of process capability. It is the ratio of the distance between the process mean and the upper specification limit to the upper spread of the process. Returns inf if no upper specification limit is specified.

cpk property

Estimates what the process is capable of producing, considering that the process mean may not be centered between the specification limits. It's calculated as the minimum of Cpl and Cpu. In general, higher Cpk values indicate a more capable process. Lower Cpk values indicate that the process may need improvement.
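The definitions of Cp, Cpl, Cpu and Cpk can be put together in a short sketch. It assumes the 'norm' strategy, so the lower and upper spread are both (agreement / 2) * σ. The function `capability_indices` is a hypothetical helper, not daspi's implementation.

```python
from math import inf
from statistics import fmean, stdev


def capability_indices(samples, lsl=None, usl=None, agreement=6):
    """Hypothetical sketch of Cp, Cpl, Cpu and Cpk under normality."""
    mean, sigma = fmean(samples), stdev(samples)
    half_spread = agreement / 2 * sigma  # spread on each side of the mean
    # Cp needs both limits, otherwise it is undefined
    cp = None if lsl is None or usl is None else (usl - lsl) / (agreement * sigma)
    # A missing limit cannot be violated, hence inf
    cpl = inf if lsl is None else (mean - lsl) / half_spread
    cpu = inf if usl is None else (usl - mean) / half_spread
    return {"cp": cp, "cpl": cpl, "cpu": cpu, "cpk": min(cpl, cpu)}


samples = [9.8, 10.1, 10.0, 9.9, 10.2]
two_sided = capability_indices(samples, lsl=9.0, usl=10.5)
one_sided = capability_indices(samples, usl=10.5)
```

In the two-sided case the mean sits closer to the upper limit, so Cpk equals Cpu and is smaller than Cp, which ignores the location.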

Z property

The Sigma level Z is another process capability indicator alongside cp and cpk. It describes how many standard deviations can be placed between the mean value and the nearest tolerance limit of a process.

Z_lt property

Statements about long-term capabilities can be derived from short-term capabilities using the σ level. The empirically determined value of 1.5 is subtracted from the σ level.
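A minimal sketch of both sigma levels, following the two descriptions above: Z is the distance from the mean to the nearest limit expressed in standard deviations, and Z_lt subtracts the 1.5 shift. The function `sigma_levels` is a hypothetical helper and uses the sample standard deviation as the estimate of σ.

```python
from statistics import fmean, stdev


def sigma_levels(samples, lsl=None, usl=None):
    """Hypothetical sketch returning (Z, Z_lt) for the given limits."""
    mean, sigma = fmean(samples), stdev(samples)
    distances = []
    if lsl is not None:
        distances.append(mean - lsl)
    if usl is not None:
        distances.append(usl - mean)
    if not distances:
        raise ValueError("At least one specification limit is required")
    z = min(distances) / sigma   # standard deviations to the nearest limit
    z_lt = z - 1.5               # long-term level after the empirical shift
    return z, z_lt


samples = [9.8, 10.1, 10.0, 9.9, 10.2]
z, z_lt = sigma_levels(samples, lsl=9.0, usl=10.5)
```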

mask_error()

Returns a boolean mask indicating which samples are considered errors.

A sample is marked as an error if its value is found in the predefined set of error values (_error_values).

RETURNS DESCRIPTION
Series[bool]

A boolean Series where True indicates an erroneous sample.

mask_nok()

Returns a boolean mask indicating which filtered samples are out of specification.

A sample is considered not OK (NOK) if it is less than or equal to the lower specification limit (lsl) or greater than or equal to the upper specification limit (usl).

RETURNS DESCRIPTION
Series[bool]

A boolean Series where True indicates a sample that is out of spec.

Notes

This mask only checks for out-of-spec values and does not consider missing or erroneous samples. Therefore, it is not the exact inverse of mask_ok(), which also excludes missing and error values. To get the full set of invalid samples, combine this mask with mask_missing() and mask_error().

mask_ok()

Returns a boolean mask indicating which samples are valid (OK).

A sample is considered OK if it is:
- Not missing (i.e., not NaN)
- Not an error (i.e., not in _error_values)
- Within specification limits (lsl < value < usl)

RETURNS DESCRIPTION
Series[bool]

A boolean Series where True indicates a valid sample.

Notes

This mask is the logical inverse of the union of mask_missing(), mask_error(), and mask_nok(). It ensures that only fully valid samples are marked as OK, whereas mask_nok() alone does not account for missing or erroneous values.
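The relationship between the masks can be sketched with plain Python lists. The functions below reuse the method names for readability but are simplified, hypothetical stand-ins: they take the samples as an argument and return lists instead of a boolean Series.

```python
import math


def mask_missing(samples):
    return [math.isnan(x) for x in samples]


def mask_error(samples, error_values):
    return [x in error_values for x in samples]


def mask_nok(samples, lsl=None, usl=None):
    # A missing limit can never be violated by a finite value
    lower = -math.inf if lsl is None else lsl
    upper = math.inf if usl is None else usl
    return [x <= lower or x >= upper for x in samples]


def mask_ok(samples, error_values=(), lsl=None, usl=None):
    # Logical inverse of the union of the three masks above
    combined = zip(mask_missing(samples),
                   mask_error(samples, error_values),
                   mask_nok(samples, lsl, usl))
    return [not (missing or error or nok) for missing, error, nok in combined]


samples = [1.0, 5.0, float("nan"), 999.0, 0.0, 10.0]
ok = mask_ok(samples, error_values=(999.0,), lsl=0.0, usl=10.0)
```

Only the first two samples are OK: the third is missing, the fourth is a coded error value, and the last two lie exactly on a specification limit, which counts as out of spec under the strict inequality lsl < value < usl.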