Proportion test
daspi.plotlib.plotter.ProportionTest(source, target, events, observations, n_groups=1, feature='', method='sum', show_center=True, bars_same_color=False, target_on_y=True, confidence_level=0.95, color=None, marker=None, ax=None, visible_spines=None, hide_axis=None, **kwds)
Bases: ConfidenceInterval
Class for creating plotters with error bars representing confidence intervals for proportions (events / observations).
This class is specifically designed for testing the statistical significance of differences between proportions. It uses confidence intervals to visually represent the uncertainty in the proportion estimates and allows for a quick assessment of whether the intervals overlap or not.
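To make the overlap idea concrete, here is a minimal sketch that computes a confidence interval for a proportion and checks two intervals for overlap. It assumes a Wilson score interval; which interval method `ProportionTest` uses internally is not stated here, and the helper names `wilson_interval` and `intervals_overlap` are hypothetical and not part of daspi.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, observations, confidence_level=0.95):
    """Return (lower, upper) Wilson score bounds for events / observations.

    Assumption: a Wilson interval is used for illustration only.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    p = events / observations
    denom = 1 + z**2 / observations
    center = (p + z**2 / (2 * observations)) / denom
    half_width = z * sqrt(
        p * (1 - p) / observations + z**2 / (4 * observations**2)
    ) / denom
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


first = wilson_interval(events=10, observations=100)
second = wilson_interval(events=30, observations=100)
print(first, second, intervals_overlap(first, second))
```

Non-overlapping intervals are a conservative indication of a difference; overlapping intervals alone do not prove that the proportions are equal.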
| PARAMETER | DESCRIPTION |
|---|---|
| `source` | Pandas long format DataFrame containing the data source for the plot. |
| `target` | Column name to use for the target variable. If falsy, the name will be formed from the specified |
| `n_groups` | Number of groups (variable combinations) for the Bonferroni adjustment. A good way to do this is to pass |
| `events` | Column name containing the values of counted events for each feature. |
| `observations` | Column name containing the values of counted observations for each feature. |
| `feature` | Column name of the feature variable for the plot, by default ''. |
| `method` | A pandas Series method to use for aggregating target values within each feature level. This method is only required if there is more than one value for number of observations and number of events for each factor level. Otherwise it is ignored, by default 'sum'. |
| `show_center` | Flag indicating whether to show the center points, by default True. |
| `bars_same_color` | Flag indicating whether to use same color for error bars as markers for center. If False, the error bars are black, by default False. |
| `target_on_y` | Flag indicating whether the target variable is plotted on the y-axis, by default True. |
| `confidence_level` | Confidence level for the confidence intervals, by default 0.95. |
| `color` | Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None. |
| `marker` | The marker style for the center points. Available markers see: https://matplotlib.org/stable/api/markers_api.html, by default None. |
| `ax` | The axes object for the plot. If None, the current axes is fetched using |
| `visible_spines` | Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. |
| `hide_axis` | Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. |
| `**kwds` | Additional keyword arguments that have no effect and are only used to catch further arguments that have no use here (occurs when this class is used within chart objects). |
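The `n_groups` parameter refers to a Bonferroni adjustment. As a rough sketch of what such an adjustment usually means, the snippet below divides the error probability `1 - confidence_level` evenly across the groups. How daspi applies `n_groups` internally is not stated here, so treat this as an assumption; the function name `bonferroni_confidence_level` is hypothetical.

```python
from statistics import NormalDist


def bonferroni_confidence_level(confidence_level, n_groups):
    """Per-interval confidence level after splitting alpha across n_groups.

    Assumption: the classic Bonferroni rule alpha / n_groups.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    alpha = 1 - confidence_level
    return 1 - alpha / n_groups


adjusted = bonferroni_confidence_level(0.95, n_groups=3)
# Critical values of the standard normal, unadjusted vs. adjusted
z_plain = NormalDist().inv_cdf(1 - (1 - 0.95) / 2)
z_adjusted = NormalDist().inv_cdf(1 - (1 - adjusted) / 2)
print(adjusted, z_plain, z_adjusted)
```

With more groups the per-interval confidence level rises, so each error bar becomes wider and non-overlap is harder to reach.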
Examples:
Apply to an existing Axes object:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import ProportionTest, ProcessEstimator, SpecLimits, Bar

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = ['first'] * 100 + ['second'] * 100 + ['third'] * 100,
    y = (list(np.random.normal(loc=3, scale=1, size=100))
         + list(np.random.normal(loc=4, scale=1, size=100))
         + list(np.random.normal(loc=2, scale=1, size=100)))))
spec_limits = SpecLimits(upper=4.3)

# Create data that records how many are out of specification
data = pd.DataFrame()
for name, group in df.groupby('x'):
    y = ProcessEstimator(group['y'], spec_limits=spec_limits)
    temp = pd.DataFrame(dict(
        x = [name],
        proportion = y.n_nok / y.n_samples,
        events = y.n_nok,
        observations = y.n_samples))
    data = pd.concat([data, temp], ignore_index=True, axis=0)

# Now plot it in combination with a bar chart
bar = Bar(source=data, target='proportion', feature='x', ax=ax)
bar()
test = ProportionTest(
    source=data, target='proportion', feature='x', events='events',
    observations='observations', show_center=False, n_groups=data.x.nunique(),
    confidence_level=0.95, bars_same_color=True,)
test()
test.label_feature_ticks()
test.target_as_percentage()
Apply using the plot method of a DaSPi Chart object:
import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 100 + ['second'] * 100 + ['third'] * 100,
    y = (list(np.random.normal(loc=3, scale=1, size=100))
         + list(np.random.normal(loc=4, scale=1, size=100))
         + list(np.random.normal(loc=2, scale=1, size=100)))))
spec_limits = dsp.SpecLimits(upper=4.3)

# Create data that records how many are out of specification
data = pd.DataFrame()
for name, group in df.groupby('x'):
    # ProcessEstimator is accessed through the dsp namespace imported above
    y = dsp.ProcessEstimator(group['y'], spec_limits=spec_limits)
    temp = pd.DataFrame(dict(
        x = [name],
        proportion = y.n_nok / y.n_samples,
        events = y.n_nok,
        observations = y.n_samples))
    data = pd.concat([data, temp], ignore_index=True, axis=0)

# Now plot it in combination with a bar chart
chart = dsp.SingleChart(
        source=data,
        target='proportion',
        feature='x',
        categorical_feature=True, # needed to label the feature tick labels
    ).plot(
        dsp.ProportionTest,
        events='events',
        observations='observations',
        show_center=False,
        n_groups=data.x.nunique(),
        confidence_level=0.95,
        bars_same_color=True,
    ).plot(
        dsp.Bar,
    ).label() # needed to label the feature tick labels
Notes
This class is a bit of a hack, as it creates its own target variable using the ratio of events / observations. This allows it to visualize proportions directly, even if a target column is not explicitly provided.
A recommended and robust approach is to precompute the proportion yourself and pass it as the target variable. This gives you full control over how the proportion is calculated and ensures compatibility with other plotters or axes.
See the Examples section for a demonstration of this approach, where the proportion is computed manually and passed to both the Bar and ProportionTest plotters. This method is especially useful when combining multiple visualizations or when working with pre-aggregated data.
method = method
instance-attribute
The provided Pandas Series method for aggregating events and observations (if there are multiple) per feature level.
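As an illustration of this kind of aggregation, the sketch below sums several count records per feature level with plain pandas and then derives the proportion. The data and column names are made up for the example, and this is not necessarily how daspi performs the aggregation internally.

```python
import pandas as pd

# Hypothetical raw data: several count records per feature level
raw = pd.DataFrame({
    "x": ["first", "first", "second", "second"],
    "events": [3, 2, 10, 8],
    "observations": [50, 50, 60, 40],
})

# Aggregate both count columns per feature level with the Series method 'sum'
aggregated = (
    raw.groupby("x")[["events", "observations"]]
    .agg("sum")
    .reset_index()
)

# Precompute the proportion so it can be passed as the target column
aggregated["proportion"] = aggregated["events"] / aggregated["observations"]
print(aggregated)
```

Summing is the natural choice for counts; a method such as 'mean' would change the meaning of both the events and the observations columns.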
transform(feature_data, target_data)
Perform the transformation on the target data by using the given function `ci_func` and return the transformed data.
| PARAMETER | DESCRIPTION |
|---|---|
| `feature_data` | Base location (offset) of feature axis coming from |
| `target_data` | Feature grouped target data used for transformation, coming from |

| RETURNS | DESCRIPTION |
|---|---|
| `data` | The transformed data source for the plot. |