Proportion test

`daspi.plotlib.plotter.ProportionTest(source, target, events, observations, n_groups=1, feature='', method='sum', show_center=True, bars_same_color=False, target_on_y=True, confidence_level=0.95, color=None, marker=None, ax=None, visible_spines=None, hide_axis=None, **kwds)`

Bases: ConfidenceInterval

Plotter class that draws error bars representing confidence intervals for proportions (events / observations).

This class is specifically designed for testing the statistical significance of differences between proportions. It uses confidence intervals to visually represent the uncertainty in the proportion estimates, so you can quickly assess whether the intervals overlap.
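
To make the overlap check concrete, the following minimal sketch computes a confidence interval for a proportion and tests whether two intervals overlap. It uses the Wilson score interval, which is an assumption chosen for illustration; nothing on this page states which interval method `ProportionTest` uses internally. The helper names `wilson_interval` and `intervals_overlap` are hypothetical and not part of daspi.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, observations, confidence_level=0.95):
    """Return the (low, high) Wilson score interval for events / observations.

    Illustrative helper only, not the daspi implementation.
    """
    if observations <= 0:
        raise ValueError("observations must be positive")
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    p = events / observations
    denom = 1 + z**2 / observations
    center = (p + z**2 / (2 * observations)) / denom
    half_width = z * sqrt(
        p * (1 - p) / observations + z**2 / (4 * observations**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


def intervals_overlap(a, b):
    """Return True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


first = wilson_interval(events=10, observations=100)
second = wilson_interval(events=40, observations=100)
print(first, second, intervals_overlap(first, second))
```

If the intervals of two groups do not overlap, the plot suggests that their proportions differ at the chosen confidence level.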

PARAMETER	DESCRIPTION
`source`	Pandas long format DataFrame containing the data source for the plot. TYPE: `pandas DataFrame`
`target`	Column name to use for the target variable. If falsy, the name will be formed from the specified `events` and `observations` with a "/" character in between. TYPE: `str`
`n_groups`	Number of groups (variable combinations) used for the Bonferroni adjustment. A convenient way to obtain it is `df.groupby(list_of_variates).ngroups`, where `list_of_variates` lists all categorical columns in the source that split the data into groups in the chart (hue, categorical features, etc.). Pass 1 to skip the Bonferroni adjustment. TYPE: `int` DEFAULT: `1`
`events`	Column name containing the values of counted events for each feature. TYPE: `str`
`observations`	Column name containing the values of counted observations for each feature. TYPE: `str`
`feature`	Column name of the feature variable for the plot, by default ''. TYPE: `str` DEFAULT: `''`
`method`	A pandas Series method used to aggregate the values within each feature level. It is only needed if a feature level has more than one value for the number of observations and the number of events; otherwise it is ignored. By default 'sum'. TYPE: `Literal['sum', 'mean', 'median']` DEFAULT: `'sum'`
`show_center`	Flag indicating whether to show the center points, by default True. TYPE: `bool` DEFAULT: `True`
`bars_same_color`	Flag indicating whether the error bars use the same color as the center markers. If False, the error bars are black. By default False. TYPE: `bool` DEFAULT: `False`
`target_on_y`	Flag indicating whether the target variable is plotted on the y-axis, by default True. TYPE: `bool` DEFAULT: `True`
`confidence_level`	Confidence level for the confidence intervals, by default 0.95. TYPE: `float` DEFAULT: `0.95`
`color`	Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None. TYPE: `str \| None` DEFAULT: `None`
`marker`	The marker style for the center points. For the available markers, see https://matplotlib.org/stable/api/markers_api.html. By default None. TYPE: `str \| None` DEFAULT: `None`
`ax`	The axes object for the plot. If None, the current axes is fetched using `plt.gca()`. If no axes are available, a new one is created. Defaults to None. TYPE: `Axes \| None` DEFAULT: `None`
`visible_spines`	Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. TYPE: `Literal['target', 'feature', 'none'] \| None` DEFAULT: `None`
`hide_axis`	Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. TYPE: `Literal['target', 'feature', 'both'] \| None` DEFAULT: `None`
`**kwds`	Additional keyword arguments that have no effect and are only used to catch further arguments that have no use here (occurs when this class is used within chart objects). DEFAULT: `{}`
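
The `n_groups` recommendation above can be sketched as follows. The column names `line` and `shift` and the counts are made up for illustration. The adjusted level is computed with the textbook Bonferroni correction (dividing the error probability by the number of groups); that `ProportionTest` applies it in exactly this form is an assumption.

```python
import pandas as pd

# Hypothetical pre-aggregated counts; column names are illustrative only
df = pd.DataFrame({
    "line": ["A", "A", "B", "B", "C", "C"],
    "shift": ["early", "late", "early", "late", "early", "late"],
    "events": [3, 5, 2, 9, 4, 4],
    "observations": [120, 118, 130, 125, 110, 115],
})

# All categorical columns that split the chart into groups
list_of_variates = ["line", "shift"]
n_groups = df.groupby(list_of_variates).ngroups

# Textbook Bonferroni correction: share the error probability across groups
confidence_level = 0.95
alpha_adjusted = (1 - confidence_level) / n_groups
adjusted_level = 1 - alpha_adjusted

print(n_groups, round(adjusted_level, 4))
```

The resulting `n_groups` is the value you would pass to the plotter; with more groups, each individual interval becomes wider.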

Examples:

Apply to an existing Axes object:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import ProportionTest, ProcessEstimator, SpecLimits, Bar

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = ['first'] * 100 + ['second'] * 100 + ['third'] * 100,
    y = (list(np.random.normal(loc=3, scale=1, size=100))
        + list(np.random.normal(loc=4, scale=1, size=100))
        + list(np.random.normal(loc=2, scale=1, size=100)))))
spec_limits = SpecLimits(upper=4.3)

# Create data that records how many are out of specification
data = pd.DataFrame()
for name, group in df.groupby('x'):
    y = ProcessEstimator(group['y'], spec_limits=spec_limits)
    temp = pd.DataFrame(dict(
        x = [name],
        proportion = y.n_nok / y.n_samples,
        events = y.n_nok,
        observations = y.n_samples))
    data = pd.concat([data, temp], ignore_index=True, axis=0)

# Now plot it in combination with a bar chart
bar = Bar(source=data, target='proportion', feature='x', ax=ax)
bar()
test = ProportionTest(
    source=data, target='proportion', feature='x', events='events', 
    observations='observations', show_center=False, n_groups=data.x.nunique(), 
    confidence_level=0.95, bars_same_color=True,)
test()
test.label_feature_ticks()
test.target_as_percentage()

Apply using the plot method of a DaSPi Chart object:

import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 100 + ['second'] * 100 + ['third'] * 100,
    y = (list(np.random.normal(loc=3, scale=1, size=100))
        + list(np.random.normal(loc=4, scale=1, size=100))
        + list(np.random.normal(loc=2, scale=1, size=100)))))
spec_limits = dsp.SpecLimits(upper=4.3)

# Create data that records how many are out of specification
data = pd.DataFrame()
for name, group in df.groupby('x'):
    y = dsp.ProcessEstimator(group['y'], spec_limits=spec_limits)
    temp = pd.DataFrame(dict(
        x = [name],
        proportion = y.n_nok / y.n_samples,
        events = y.n_nok,
        observations = y.n_samples))
    data = pd.concat([data, temp], ignore_index=True, axis=0)

# Now plot it in combination with a bar chart
chart = dsp.SingleChart(
        source=data,
        target='proportion',
        feature='x',
        categorical_feature=True, # needed to label the feature tick labels
    ).plot(
        dsp.ProportionTest,
        events='events', 
        observations='observations',
        show_center=False,
        n_groups=data.x.nunique(), 
        confidence_level=0.95,
        bars_same_color=True,
    ).plot(
        dsp.Bar,
    ).label() # needed to label the feature tick labels

Notes

This class is a bit of a hack, as it creates its own target variable using the ratio of events / observations. This allows it to visualize proportions directly, even if a target column is not explicitly provided.

The recommended, more robust approach is to precompute the proportion yourself and pass it as the target variable. This gives you full control over how the proportion is calculated and ensures compatibility with other plotters or axes.

See the Examples section for a demonstration of this approach, where the proportion is computed manually and passed to both the Bar and ProportionTest plotters. This method is especially useful when combining multiple visualizations or when working with pre-aggregated data.
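
As a minimal sketch of precomputing the proportion, the counts can also be derived with plain pandas instead of the `ProcessEstimator` loop used in the Examples section. Counting values strictly above the limit as events is an assumption made here for illustration, and the names `upper_limit` and `nok` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
df = pd.DataFrame({
    "x": ["first"] * 100 + ["second"] * 100 + ["third"] * 100,
    "y": np.concatenate([
        rng.normal(loc=3, scale=1, size=100),
        rng.normal(loc=4, scale=1, size=100),
        rng.normal(loc=2, scale=1, size=100)]),
})

# Assumption: a value counts as an event if it lies above the upper limit
upper_limit = 4.3

# One row per feature level with the counted events and observations
data = (
    df.assign(nok=df["y"] > upper_limit)
    .groupby("x", as_index=False)
    .agg(events=("nok", "sum"), observations=("nok", "count"))
)
# Precomputed target column that both plotters can share
data["proportion"] = data["events"] / data["observations"]
print(data)
```

The resulting frame has the columns `x`, `events`, `observations` and `proportion`, matching the names used in the examples above.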

`method = method` `instance-attribute`

The provided Pandas Series method for aggregating events and observations (if there are multiple) per feature level.

`transform(feature_data, target_data)`

Transform the target data using the given function `ci_func` and return the transformed data.

PARAMETER	DESCRIPTION
`feature_data`	Base location (offset) of feature axis coming from `feature_grouped` generator. TYPE: `float \| int`
`target_data`	Feature grouped target data used for transformation, coming from `feature_grouped` generator. TYPE: `pandas Series`

RETURNS	DESCRIPTION
`data`	The transformed data source for the plot. TYPE: `pandas DataFrame`
