Spread width

`daspi.plotlib.plotter.SpreadWidth(source, target, feature='', strategy='norm', agreement=6, possible_dists=DIST.COMMON, show_center=True, kind='mean', bars_same_color=False, skip_na=None, target_on_y=True, color=None, marker=None, ax=None, visible_spines=None, hide_axis=None, **kwds)` ¶

Bases: Errorbar

Class for creating plotters with error bars representing the spread width.

PARAMETER	DESCRIPTION
`source`	Pandas long format DataFrame containing the data source for the plot. TYPE: `pandas DataFrame`
`target`	Column name of the target variable for the plot. TYPE: `str`
`feature`	Column name of the feature variable for the plot, by default ''. TYPE: `str` DEFAULT: `''`
`strategy`	Strategy used to determine the control limits (process spread): - `eval`: The strategy is determined by the given evaluate function. If none is given, the internal `evaluate` method is used. - `fit`: The distribution that best represents the process data is searched for first, then the agreed process spread is calculated. - `norm`: The data is assumed to be normally distributed. The variation tolerance is then calculated as agreement * standard deviation. - `data`: The quantiles for the process variation tolerance are read directly from the data. Default is 'norm'. TYPE: `(eval, fit, norm, data)` DEFAULT: `'norm'`
`agreement`	Specifies the tolerated process variation for which the control limits are calculated. - If int, the spread is determined using the normal distribution as agreement * sigma, e.g. agreement = 6 -> 6 sigma (±3 sigma) covers ~99.73 % of the data. The upper and lower permissible quantiles are then calculated from this. - If float, the value must be between 0 and 1. It is then interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6 sigma). Default is 6 because SixSigma ;-) TYPE: `int or float` DEFAULT: `6`
`possible_dists`	Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed, by default `DIST.COMMON`. TYPE: `tuple of strings or rv_continuous` DEFAULT: `COMMON`
`show_center`	Flag indicating whether to show the center points (see `kind` option). Default is True. TYPE: `bool` DEFAULT: `True`
`kind`	The type of center to plot ('mean' or 'median'), by default 'mean'. TYPE: `Literal['mean', 'median']` DEFAULT: `'mean'`
`bars_same_color`	Flag indicating whether to use the same color for the error bars as for the center markers. If False, the error bars are black, by default False. TYPE: `bool` DEFAULT: `False`
`skip_na`	Flag indicating whether to skip missing values in the feature grouped data, by default None. - None: no missing values are skipped - 'all': grouped data is skipped if all values are missing - 'any': grouped data is skipped if any value is missing TYPE: `Literal['none', 'all', 'any']` DEFAULT: `None`
`target_on_y`	Flag indicating whether the target variable is plotted on the y-axis, by default True. TYPE: `bool` DEFAULT: `True`
`color`	Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None. TYPE: `str \| None` DEFAULT: `None`
`marker`	The marker style for the center points. For available markers, see https://matplotlib.org/stable/api/markers_api.html. By default None. TYPE: `str \| None` DEFAULT: `None`
`ax`	The axes object for the plot. If None, the current axes is fetched using `plt.gca()`. If no axes are available, a new one is created. Defaults to None. TYPE: `Axes \| None` DEFAULT: `None`
`visible_spines`	Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. TYPE: `Literal['target', 'feature', 'none'] \| None` DEFAULT: `None`
`hide_axis`	Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. TYPE: `Literal['target', 'feature', 'both'] \| None` DEFAULT: `None`
`**kwds`	Additional keyword arguments that have no effect and are only used to catch further arguments that have no use here (occurs when this class is used within chart objects). DEFAULT: `{}`
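
How an integer `agreement` relates to the covered proportion can be checked with the standard library alone. The following is a minimal sketch, not DaSPi's implementation: the helper names `coverage_from_agreement` and `norm_limits` are hypothetical, and it assumes the spread is centered on the mean, so that `agreement = 6` spans ±3 standard deviations.

```python
from statistics import NormalDist, mean, stdev


def coverage_from_agreement(agreement):
    """Proportion of a normal distribution inside ±(agreement / 2) sigma."""
    k = agreement / 2
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)


def norm_limits(data, agreement=6):
    """Hypothetical 'norm'-style limits: mean ± (agreement / 2) * std."""
    center = mean(data)
    half_width = agreement / 2 * stdev(data)  # sample standard deviation
    return center - half_width, center + half_width


print(round(coverage_from_agreement(6), 4))  # 0.9973
lcl, ucl = norm_limits([2.1, 2.9, 3.0, 3.4, 3.6, 4.0])
print(lcl, ucl)
```

Passing the float `0.9973` as `agreement` should therefore describe roughly the same spread as the integer `6`, provided the data really is close to normal.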

Examples:

Apply to an existing Axes object:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import SpreadWidth, Beeswarm

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
swarm = Beeswarm(source=df, target='y', feature='x')
swarm(color=(0.3, )*4)
spread = SpreadWidth(
    source=df, target='y', feature='x', strategy='data', agreement=1.0, 
    kind='median', show_center=True, bars_same_color=True,
    ax=ax)
spread(kw_center=dict(s=30, marker='_'))
spread.label_feature_ticks()

Apply using the plot method of a DaSPi Chart object:

import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
chart = dsp.SingleChart(
        source=df,
        target='y',
        feature='x',
        categorical_feature=True, # needed to label the feature tick labels
    ).plot(
        dsp.SpreadWidth,
        strategy='data',
        agreement=1.0,
        show_center=True,
        kind='median',
        bars_same_color=True,
        kw_call=dict(kw_center=dict(s=30, marker='_'))
    ).plot(
        dsp.Beeswarm,
        color=(0.3, ) * 4
    ).label() # needed to label the feature tick labels

Notes

Under the hood, the class `daspi.statistics.estimation.Estimator` is used. The error bar then corresponds to the control limits `lcl` and `ucl` calculated with it.

If you want to display the minimum and maximum values (the range), set agreement to 1.0 (important: it must be a float) or to float('inf') and strategy to 'data'. This way, the control limits correspond to the minimum and maximum of the data.
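
Why a float `agreement` of 1.0 yields the range follows from the quantile arithmetic. The sketch below is an illustration under stated assumptions, not DaSPi's code: it assumes the 'data' strategy splits the uncovered proportion evenly between both tails, it only handles floats in (0, 1] (not `float('inf')`), and the helper names `tail_quantiles` and `empirical_quantile` are hypothetical.

```python
def tail_quantiles(agreement):
    """Lower and upper quantile levels for a covered proportion."""
    if not 0 < agreement <= 1:
        raise ValueError("agreement as float must be in (0, 1]")
    tail = (1 - agreement) / 2  # uncovered share per tail (assumption)
    return tail, 1 - tail


def empirical_quantile(data, q):
    """Quantile by linear interpolation between order statistics."""
    values = sorted(data)
    pos = q * (len(values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(values) - 1)
    frac = pos - lower
    return values[lower] + frac * (values[upper] - values[lower])


data = [2.4, 3.1, 1.7, 4.2, 3.3]
q_low, q_upp = tail_quantiles(1.0)  # quantile levels 0.0 and 1.0
lcl = empirical_quantile(data, q_low)
ucl = empirical_quantile(data, q_upp)
print(lcl == min(data), ucl == max(data))  # True True
```

An integer `1` would instead be read as a sigma multiple, which is why the float type matters here.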

`estimation` `instance-attribute` ¶

Estimator instance used for spread width and center estimation.

`strategy = strategy` `instance-attribute` ¶

Strategy for estimating the spread width.

`agreement = agreement` `instance-attribute` ¶

Agreement value for the spread width estimation.

`possible_dists = possible_dists` `instance-attribute` ¶

Tuple of possible distributions for the spread width estimation.

`marker` `property` ¶

Get the marker style for the center points if show_center is True, otherwise '' is returned. By default the marker is '_' if target_on_y is True, '|' otherwise (read-only).

`kind` `property` `writable` ¶

Get and set the type of location ('mean' or 'median') to plot.

RAISES	DESCRIPTION
`AssertionError`	If neither 'mean' nor 'median' is given when setting `kind`.

`transform(feature_data, target_data)` ¶

Perform the transformation on the target data using the Estimator class and return the transformed data.

PARAMETER	DESCRIPTION
`feature_data`	Base location (offset) of feature axis coming from `feature_grouped` generator. TYPE: `float \| int`
`target_data`	Feature grouped target data used for transformation, coming from `feature_grouped` generator. TYPE: `pandas Series`

RETURNS	DESCRIPTION
`data`	The transformed data source for the plot. TYPE: `pandas DataFrame`
