Process estimator
daspi.statistics.estimation.ProcessEstimator(samples, spec_limits, error_values=(), strategy='norm', agreement=6, possible_dists=DIST.COMMON, evaluate=None, nan_policy='omit')
Bases: LocationDispersionEstimator
An object for various statistical estimators. This class extends the estimator with process-specific statistics such as specification limits, Cp and Cpk values.
The attributes are calculated lazily. After the class is instantiated, all attributes are set to None. As soon as an attribute (actually Property) is called, the value is calculated and stored so that the calculation is only performed once.
| PARAMETER | DESCRIPTION |
|---|---|
| `samples` | 1D array of process data. |
| `spec_limits` | Specification limits for the process data. |
| `error_values` | If the process data may contain coded values for measurement errors or similar, they can be specified here. Default `()`. |
| `strategy` | Which strategy should be used to determine the control limits (process spread). Default `'norm'`. |
| `agreement` | The tolerated process variation for which the control limits are calculated. If int, the spread is determined using the normal distribution as `agreement`·σ, e.g. `agreement=6` → 6σ, which covers ~99.73 % of the data; the upper and lower permissible quantiles are then calculated from this. If float, the value must be between 0 and 1 and is interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6σ). Default `6`. |
| `possible_dists` | Distributions to which the data may be subject. Only continuous distributions of `scipy.stats` are allowed. Default `DIST.COMMON`. |
| `evaluate` | Provide a function that evaluates the … Default `None`. |
| `nan_policy` | How to handle NaN values in the samples: `'propagate'` preserves NaN values in the analysis; `'raise'` raises an error if NaN values are found; `'omit'` omits NaN values from the analysis. Default `'omit'`. |
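The relationship between an integer `agreement` and the equivalent coverage proportion can be verified with `scipy.stats`. This is a minimal sketch of the underlying math only; it does not call DaSPi itself:

```python
from scipy import stats

# agreement = 6 means a 6-sigma spread, i.e. mean +/- 3 sigma.
# Proportion of a normal distribution that falls inside +/- 3 sigma:
coverage = stats.norm.cdf(3) - stats.norm.cdf(-3)
print(round(coverage, 4))  # -> 0.9973
```

So passing `agreement=6` (int) or `agreement=0.9973` (float) describes the same tolerated spread.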
Examples:
You can get a comprehensive analysis of your process using the
describe() method, which returns a pandas.DataFrame. This
contains all important metrics, such as the Cp (if possible) and
Cpk values:
```python
import daspi as dsp

df = dsp.load_dataset('drop_card')
spec_limits = dsp.SpecLimits(0, float(df.loc[0, 'usl']))
target = 'distance'
drop_process = dsp.ProcessEstimator(
    samples=df[target],
    spec_limits=spec_limits)
print(drop_process.describe())
```
However, in this dataset, the cards were dropped in two different ways. To compare both methods, a DataFrame containing both process analyses can be created as follows:
```python
import pandas as pd

method_mapping = {0: 'parallel', 1: 'perpendicular'}
samples_parallel = df[df['method'] == method_mapping[0]][target]
samples_series = df[df['method'] == method_mapping[1]][target]
drop_analysis = pd.concat(
    [dsp.ProcessEstimator(samples_parallel, spec_limits).describe(),
     dsp.ProcessEstimator(samples_series, spec_limits).describe()],
    axis=1,
    ignore_index=True,
).rename(
    columns=method_mapping
)
print(drop_analysis)
```
You can get a detailed visual analysis with the precast chart `daspi.plotlib.precast.ProcessCapabilityAnalysisCharts`.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If NaN values are found in the samples and `nan_policy` is `'raise'`. |
| UserWarning | If NaN values are found in the samples and … |
attrs_describe
property
Get attribute names used for describe method (read-only).
filtered
property
Get the data without error values and missing values (read-only).
n_ok
property
Get amount of OK-values (read-only).
n_nok
property
Get amount of NOK-values (read-only).
ok
property
Get amount of OK-values as percent (read-only).
nok
property
Get amount of NOK-values as percent (read-only).
nok_norm
property
Predict the amount of NOK-values as percent based on the normal distribution (read-only).
nok_fit
property
Predict the amount of NOK-values as percent based on the fitted distribution (read-only).
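The idea behind `nok_norm` can be sketched with plain `scipy.stats`: fit a normal model to the sample and sum the tail probabilities beyond the specification limits. The data and limits below are hypothetical, and the actual property may differ in details such as how the distribution is fitted:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
samples = rng.normal(loc=10.0, scale=1.0, size=500)  # hypothetical process data
lsl, usl = 7.0, 13.0                                 # hypothetical spec limits

mu, sigma = samples.mean(), samples.std(ddof=1)
# Predicted out-of-spec proportion under a normal model:
nok_norm = stats.norm.cdf(lsl, mu, sigma) + stats.norm.sf(usl, mu, sigma)
print(f'{100 * nok_norm:.3f} %')  # a small percentage for ~3-sigma limits
```

`nok_fit` follows the same idea but uses the best-fitting distribution instead of assuming normality.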
n_errors
property
Get amount of error values (read-only).
errors
property
Get the error values (read-only).
lsl
property
Get the lower specification limit (read-only).
usl
property
Get the upper specification limit (read-only).
spec_limits
property
writable
Get and set the specification limits.
control_limits
property
Get lower and upper control limits (read-only).
control_range
property
Get the range (span) of the control limits (read-only).
tolerance
property
Get tolerance range. If one of the specification limits is not specified, inf is returned (read-only).
cp
property
Cp is a measure of process capability: the ratio of the specification width (usl - lsl) to the process variation (agreement·σ). The location of the process is not taken into account, so Cp only indicates the potential for the Cpk value. It can only be calculated if both an upper and a lower specification limit are given; otherwise None is returned.
cpl
property
Cpl is a measure of process capability. It is the ratio of the distance between the process mean and the lower specification limit to the lower spread of the process. Returns inf if no lower specification limit is specified.
cpu
property
Cpu is a measure of process capability. It is the ratio of the distance between the process mean and the upper specification limit to the upper spread of the process. Returns inf if no upper specification limit is specified.
cpk
property
Estimates what the process is capable of producing, considering that the process mean may not be centered between the specification limits. It's calculated as the minimum of Cpl and Cpu. In general, higher Cpk values indicate a more capable process. Lower Cpk values indicate that the process may need improvement.
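The textbook formulas behind these capability indices, assuming the default `agreement=6` (i.e. a ±3σ spread), can be sketched as follows with hypothetical data. This is a sketch of the standard definitions, not DaSPi's internal implementation:

```python
import numpy as np

samples = np.array([9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7])  # hypothetical
lsl, usl = 9.0, 11.0                                               # hypothetical limits

mu, sigma = samples.mean(), samples.std(ddof=1)

# Textbook definitions for agreement = 6 (i.e. +/- 3 sigma):
cp = (usl - lsl) / (6 * sigma)   # potential capability, ignores centering
cpl = (mu - lsl) / (3 * sigma)   # capability w.r.t. the lower limit
cpu = (usl - mu) / (3 * sigma)   # capability w.r.t. the upper limit
cpk = min(cpl, cpu)              # actual capability, accounts for centering

assert cpk <= cp                 # Cp is always an upper bound for Cpk
```

Because this sample happens to be perfectly centered (mean 10.0 between 9 and 11), Cp and Cpk coincide here; any off-center process pushes Cpk below Cp.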
Z
property
The Sigma level Z is another process capability indicator alongside cp and cpk. It describes how many standard deviations can be placed between the mean value and the nearest tolerance limit of a process.
Z_lt
property
Statements about long-term capabilities can be derived from short-term capabilities using the σ level. The empirically determined value of 1.5 is subtracted from the σ level.
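A minimal numeric sketch of Z and Z_lt under the usual definitions, with hypothetical mean, spread, and limits:

```python
mu, sigma = 10.0, 0.2   # hypothetical process mean and standard deviation
lsl, usl = 9.0, 11.0    # hypothetical specification limits

# Sigma level: standard deviations between the mean and the nearest limit.
z = min(usl - mu, mu - lsl) / sigma
# Long-term estimate: subtract the empirically determined 1.5-sigma shift.
z_lt = z - 1.5

print(z, z_lt)  # -> 5.0 3.5
```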
mask_error()
Returns a boolean mask indicating which samples are considered errors.
A sample is marked as an error if its value is found in the
predefined set of error values (_error_values).
| RETURNS | DESCRIPTION |
|---|---|
| Series[bool] | A boolean Series where True indicates an erroneous sample. |
mask_nok()
Returns a boolean mask indicating which filtered samples are out of specification.
A sample is considered not OK (NOK) if it is less than or equal
to the lower specification limit (lsl) or greater than or
equal to the upper specification limit (usl).
| RETURNS | DESCRIPTION |
|---|---|
| Series[bool] | A boolean Series where True indicates a sample that is out of spec. |
Notes
This mask only checks for out-of-spec values and does not
consider missing or erroneous samples. Therefore, it is not the
exact inverse of mask_ok(), which also excludes missing and
error values. To get the full set of invalid samples, combine
this mask with mask_missing() and mask_error().
mask_ok()
Returns a boolean mask indicating which samples are valid (OK).
A sample is considered OK if it is:
- Not missing (i.e., not NaN)
- Not an error (i.e., not in _error_values)
- Within specification limits (lsl < value < usl)
| RETURNS | DESCRIPTION |
|---|---|
| Series[bool] | A boolean Series where True indicates a valid sample. |
Notes
This mask is the logical inverse of the union of
mask_missing(), mask_error(), and mask_nok(). It ensures
that only fully valid samples are marked as OK, whereas
mask_nok() alone does not account for missing or erroneous
values.