Spread width

daspi.plotlib.plotter.SpreadWidth(source, target, feature='', strategy='norm', agreement=6, possible_dists=DIST.COMMON, show_center=True, kind='mean', bars_same_color=False, skip_na=None, target_on_y=True, color=None, marker=None, ax=None, visible_spines=None, hide_axis=None, **kwds)

Bases: Errorbar

Class for creating plotters with error bars representing the spread width.

PARAMETER DESCRIPTION
source

Pandas long format DataFrame containing the data source for the plot.

TYPE: pandas DataFrame

target

Column name of the target variable for the plot.

TYPE: str

feature

Column name of the feature variable for the plot, by default ''.

TYPE: str DEFAULT: ''

strategy

Which strategy should be used to determine the control limits (process spread):
- eval: The strategy is determined according to the given evaluate function. If none is given, the internal evaluate method is used.
- fit: First, the distribution that best represents the process data is searched for, and then the agreed process spread is calculated.
- norm: The data is assumed to follow a normal distribution. The variation tolerance is then calculated as agreement * standard deviation.
- data: The quantiles for the process variation tolerance are read directly from the data.

Default is 'norm'.

TYPE: Literal['eval', 'fit', 'norm', 'data'] DEFAULT: 'norm'

agreement

Specify the tolerated process variation for which the control limits are to be calculated.
- If int, the spread is determined using the normal distribution as agreement * sigma, e.g. agreement = 6 -> 6 sigma, which covers ~ 99.73 % of the data. The upper and lower permissible quantiles are then calculated from this.
- If float, the value must be between 0 and 1. This value is then interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~ 6 sigma).

Default is 6 because SixSigma ;-)

TYPE: int or float DEFAULT: 6

possible_dists

Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed, by default DIST.COMMON.

TYPE: tuple of strings or rv_continuous DEFAULT: DIST.COMMON

show_center

Flag indicating whether to show the center points (see kind option). Default is True.

TYPE: bool DEFAULT: True

kind

The type of center to plot ('mean' or 'median'), by default 'mean'.

TYPE: Literal['mean', 'median'] DEFAULT: 'mean'

bars_same_color

Flag indicating whether to use the same color for the error bars as for the center markers. If False, the error bars are black, by default False.

TYPE: bool DEFAULT: False

skip_na

Flag indicating whether to skip missing values in the feature grouped data, by default None.
- None: no missing values are skipped.
- 'all': grouped data is skipped if all values are missing.
- 'any': grouped data is skipped if any value is missing.

TYPE: Literal['none', 'all', 'any'] DEFAULT: None

target_on_y

Flag indicating whether the target variable is plotted on the y-axis, by default True.

TYPE: bool DEFAULT: True

color

Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None.

TYPE: str | None DEFAULT: None

marker

The marker style for the center points, by default None. For available markers, see: https://matplotlib.org/stable/api/markers_api.html

TYPE: str | None DEFAULT: None

ax

The axes object for the plot. If None, the current axes is fetched using plt.gca(). If no axes are available, a new one is created. Defaults to None.

TYPE: Axes | None DEFAULT: None

visible_spines

Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None.

TYPE: Literal['target', 'feature', 'none'] | None DEFAULT: None

hide_axis

Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None.

TYPE: Literal['target', 'feature', 'both'] | None DEFAULT: None

**kwds

Additional keyword arguments that have no effect and are only used to catch further arguments that have no use here (occurs when this class is used within chart objects).

DEFAULT: {}
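
The two readings of the agreement parameter described above can be sketched with the standard library alone. This is only an illustration of the arithmetic, not DaSPi code: the helper name agreement_to_quantiles is hypothetical, and it assumes that an int agreement is a total width in sigma, split symmetrically around the center.

```python
from statistics import NormalDist


def agreement_to_quantiles(agreement):
    """Return the (lower, upper) quantiles implied by an agreement value.

    Hypothetical helper for illustration only, not part of DaSPi.
    """
    if isinstance(agreement, int):
        # Assumption: an int is a total width in sigma, i.e. the limits
        # sit at +/- agreement / 2 standard deviations.
        upper = NormalDist().cdf(agreement / 2)
    else:
        # A float between 0 and 1 is the covered proportion; the
        # remainder is split evenly between both tails.
        upper = 1 - (1 - agreement) / 2
    return 1 - upper, upper


lower, upper = agreement_to_quantiles(6)
coverage = upper - lower  # proportion covered by 6 sigma
print(f"6 sigma covers {coverage:.4%} of a normal distribution")

lower_f, upper_f = agreement_to_quantiles(0.9973)
print(f"0.9973 -> quantiles {lower_f:.5f} and {upper_f:.5f}")
```

Both calls lead to nearly the same quantiles, which is why agreement=6 and agreement=0.9973 are described as roughly equivalent.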

Examples:

Apply to an existing Axes object:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import SpreadWidth, Beeswarm

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
swarm = Beeswarm(source=df, target='y', feature='x')
swarm(color=(0.3, )*4)
spread = SpreadWidth(
    source=df, target='y', feature='x', strategy='data', agreement=1.0, 
    kind='median', show_center=True, bars_same_color=True,
    ax=ax)
spread(kw_center=dict(s=30, marker='_'))
spread.label_feature_ticks()

Apply using the plot method of a DaSPi Chart object:

import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
chart = dsp.SingleChart(
        source=df,
        target='y',
        feature='x',
        categorical_feature=True, # needed to label the feature tick labels
    ).plot(
        dsp.SpreadWidth,
        strategy='data',
        agreement=1.0,
        show_center=True,
        kind='median',
        bars_same_color=True,
        kw_call=dict(kw_center=dict(s=30, marker='_'))
    ).plot(
        dsp.Beeswarm,
        color=(0.3, ) * 4
    ).label() # needed to label the feature tick labels

Notes

Under the hood, the class daspi.statistics.estimation.Estimator is used. The error bars then correspond to the control limits lcl and ucl calculated with it.
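
For orientation, the control limits under the 'norm' strategy can be approximated by hand. The following is a minimal sketch, assuming the limits are placed symmetrically around the mean and that the sample standard deviation is used. The function norm_limits is hypothetical and does not reproduce the Estimator implementation.

```python
from statistics import fmean, stdev


def norm_limits(values, agreement=6):
    """Approximate lcl and ucl for normally distributed data.

    Hypothetical helper for illustration only. Assumes the total spread
    agreement * std is centered on the mean.
    """
    center = fmean(values)
    half_width = agreement / 2 * stdev(values)
    return center - half_width, center + half_width


values = [2.9, 3.4, 2.7, 3.1, 3.6, 2.8, 3.0, 3.3]
lcl, ucl = norm_limits(values, agreement=6)
print(f"lcl={lcl:.3f}, ucl={ucl:.3f}, width={ucl - lcl:.3f}")
```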

If you want to display the minimum and maximum values (the range), set agreement to 1.0 (important: it must be a float) or to float('inf') and strategy to 'data'. This way, the control limits correspond to the minimum and maximum of the data.
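
Why agreement=1.0 with strategy='data' yields the range can be seen in a small sketch of empirical limits. The helper data_limits is hypothetical, and its simple index-based quantile rule is an assumption that may differ from the interpolation DaSPi uses for proportions below 1.

```python
import random


def data_limits(values, agreement):
    """Empirical limits covering the proportion `agreement` of the data.

    Hypothetical helper for illustration only, not part of DaSPi.
    """
    ordered = sorted(values)
    if agreement >= 1.0:
        # Full coverage (1.0 or float('inf')): the limits are the extremes.
        return ordered[0], ordered[-1]
    # Otherwise cut the same share of observations from both tails.
    tail = (1 - agreement) / 2
    low_index = int(tail * (len(ordered) - 1))
    high_index = len(ordered) - 1 - low_index
    return ordered[low_index], ordered[high_index]


random.seed(1)
values = [random.gauss(3, 1) for _ in range(50)]
lcl, ucl = data_limits(values, 1.0)
lcl_90, ucl_90 = data_limits(values, 0.9)
print(f"range: {lcl:.2f} to {ucl:.2f}; 90 %: {lcl_90:.2f} to {ucl_90:.2f}")
```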

estimation instance-attribute

Estimator instance used for spread width and center estimation.

strategy = strategy instance-attribute

Strategy for estimating the spread width.

agreement = agreement instance-attribute

Agreement value for the spread width estimation.

possible_dists = possible_dists instance-attribute

Tuple of possible distributions for the spread width estimation.

marker property

Get the marker style for the center points if show_center is True, otherwise '' is returned. By default the marker is '_' if target_on_y is True, '|' otherwise (read-only).

kind property writable

Get and set the type of location ('mean' or 'median') to plot.

RAISES DESCRIPTION
AssertionError

If neither 'mean' nor 'median' is given when setting kind.
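
A writable property that validates its value with an assertion might look like the sketch below. The class CenterKind is a hypothetical stand-in used only to show the pattern. It is not the SpreadWidth implementation.

```python
class CenterKind:
    """Hypothetical stand-in showing a validated, writable property."""

    def __init__(self, kind='mean'):
        self.kind = kind  # runs the setter, so the default is checked too

    @property
    def kind(self):
        """Type of location to plot, either 'mean' or 'median'."""
        return self._kind

    @kind.setter
    def kind(self, kind):
        # Note: assert statements are skipped when Python runs with -O.
        assert kind in ('mean', 'median'), (
            f"kind must be 'mean' or 'median', got {kind!r}")
        self._kind = kind


center = CenterKind()
center.kind = 'median'  # accepted
try:
    center.kind = 'mode'  # rejected
except AssertionError as error:
    print(error)
```

Code that sets kind from user input may therefore want to catch AssertionError or validate the value beforehand.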

transform(feature_data, target_data)

Perform the transformation on the target data using the Estimator class and return the transformed data.

PARAMETER DESCRIPTION
feature_data

Base location (offset) of feature axis coming from feature_grouped generator.

TYPE: float | int

target_data

Feature grouped target data used for transformation, coming from feature_grouped generator.

TYPE: pandas Series

RETURNS DESCRIPTION
data

The transformed data source for the plot.

TYPE: pandas DataFrame