Spread width
daspi.plotlib.plotter.SpreadWidth(source, target, feature='', strategy='norm', agreement=6, possible_dists=DIST.COMMON, show_center=True, kind='mean', bars_same_color=False, skip_na=None, target_on_y=True, color=None, marker=None, ax=None, visible_spines=None, hide_axis=None, **kwds)
Bases: Errorbar
Class for creating plotters with error bars representing the spread width.
| PARAMETER | DESCRIPTION |
|---|---|
| source | Pandas long format DataFrame containing the data source for the plot. |
| target | Column name of the target variable for the plot. |
| feature | Column name of the feature variable for the plot, by default ''. |
| strategy | Strategy used to determine the control limits (process spread). Default is 'norm'. |
| agreement | Specifies the tolerated process variation for which the control limits are calculated. If int, the spread is determined using the normal distribution as agreement * sigma, e.g. agreement = 6 -> 6 sigma, which covers ~99.73 % of the data; the upper and lower permissible quantiles are then calculated from this. If float, the value must be between 0 and 1 and is interpreted as the acceptable proportion for the spread, e.g. 0.9973 (which corresponds to ~6 sigma). Default is 6 because SixSigma ;-) |
| possible_dists | Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed, by default DIST.COMMON. |
| show_center | Flag indicating whether to show the center points, by default True. |
| kind | The type of center to plot ('mean' or 'median'), by default 'mean'. |
| bars_same_color | Flag indicating whether the error bars use the same color as the center markers. If False, the error bars are black, by default False. |
| skip_na | Flag indicating whether to skip missing values in the feature grouped data, by default None. None: no missing values are skipped. 'all': grouped data is skipped if all values are missing. 'any': grouped data is skipped if any value is missing. |
| target_on_y | Flag indicating whether the target variable is plotted on the y-axis, by default True. |
| color | Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None. |
| marker | The marker style for the center points. For available markers, see https://matplotlib.org/stable/api/markers_api.html, by default None. |
| ax | The axes object for the plot. If None, the current axes is fetched. |
| visible_spines | Specifies which spines are visible; the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. |
| hide_axis | Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. |
| **kwds | Additional keyword arguments that have no effect; they only catch arguments that have no use here (this occurs when the class is used within chart objects). |
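The relationship between an integer `agreement` (a sigma width) and a float `agreement` (a proportion) can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the standard library and assumes a normal distribution; the helper names `coverage_from_sigma` and `quantiles_from_coverage` are hypothetical and not part of DaSPi.

```python
from math import erf, sqrt


def coverage_from_sigma(agreement: int) -> float:
    """Proportion of a normal distribution within +/- agreement/2 sigma.

    Hypothetical helper for illustration; not part of DaSPi.
    """
    half_width = agreement / 2  # e.g. 6 sigma total -> +/- 3 sigma
    return erf(half_width / sqrt(2))


def quantiles_from_coverage(coverage: float) -> tuple[float, float]:
    """Lower and upper quantiles that enclose the given proportion symmetrically."""
    tail = (1 - coverage) / 2
    return tail, 1 - tail


for k in (4, 6, 8):
    p = coverage_from_sigma(k)
    q_low, q_upp = quantiles_from_coverage(p)
    print(f"{k} sigma: coverage={p:.6f}, quantiles=({q_low:.6f}, {q_upp:.6f})")
```

For `agreement = 6` this gives a coverage of roughly 0.9973, which matches the float example in the parameter description.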
Examples:
Apply to an existing Axes object:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from daspi import SpreadWidth, Beeswarm

    fig, ax = plt.subplots()
    df = pd.DataFrame(dict(
        x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
        y = (
            list(np.random.normal(loc=3, scale=1, size=50))
            + list(np.random.normal(loc=4, scale=1, size=50))
            + list(np.random.normal(loc=2, scale=1, size=50)))))
    swarm = Beeswarm(source=df, target='y', feature='x')
    swarm(color=(0.3, )*4)
    spread = SpreadWidth(
        source=df, target='y', feature='x', strategy='data', agreement=1.0,
        kind='median', show_center=True, bars_same_color=True,
        ax=ax)
    spread(kw_center=dict(s=30, marker='_'))
    spread.label_feature_ticks()

Apply using the plot method of a DaSPi Chart object:

    import numpy as np
    import daspi as dsp
    import pandas as pd

    df = pd.DataFrame(dict(
        x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
        y = (
            list(np.random.normal(loc=3, scale=1, size=50))
            + list(np.random.normal(loc=4, scale=1, size=50))
            + list(np.random.normal(loc=2, scale=1, size=50)))))
    chart = dsp.SingleChart(
            source=df,
            target='y',
            feature='x',
            categorical_feature=True, # needed to label the feature tick labels
        ).plot(
            dsp.SpreadWidth,
            strategy='data',
            agreement=1.0,
            show_center=True,
            kind='median',
            bars_same_color=True,
            kw_call=dict(kw_center=dict(s=30, marker='_'))
        ).plot(
            dsp.Beeswarm,
            color=(0.3, ) * 4
        ).label() # needed to label the feature tick labels

Notes
Under the hood, the class daspi.statistics.estimation.Estimator is
used. The error bars then correspond to the control limits lcl and
ucl calculated with it.
If you want to display the minimum and maximum values (the range),
set agreement to 1.0 (important: it must be a float) or to
float('inf') and strategy to 'data'. This way, the control limits
correspond to the minimum and maximum of the data.
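To make the range case concrete, the sketch below computes by hand the values that the error bars would be expected to span per feature group according to the note above (minimum and maximum, with the median as center). It deliberately does not call DaSPi; the data values and the helper `range_bar` are made up for illustration.

```python
from statistics import median

# Made-up illustration data, grouped by feature level.
groups = {
    'first': [2.1, 3.4, 2.9, 3.8, 3.0],
    'second': [4.2, 3.9, 5.1, 4.4, 3.6],
    'third': [1.5, 2.2, 2.8, 1.9, 2.0],
}


def range_bar(values):
    """Return (lower, center, upper) for a range-style error bar.

    Hypothetical helper: lower/upper are the data minimum and maximum,
    the center is the median (as with kind='median').
    """
    return min(values), median(values), max(values)


bars = {name: range_bar(values) for name, values in groups.items()}
for name, (lower, center, upper) in bars.items():
    print(f"{name}: lower={lower}, center={center}, upper={upper}")
```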
estimation
instance-attribute
Estimator instance used for spread width and center estimation.
strategy = strategy
instance-attribute
Strategy for estimating the spread width.
agreement = agreement
instance-attribute
Agreement value for the spread width estimation.
possible_dists = possible_dists
instance-attribute
Tuple of possible distributions for the spread width estimation.
marker
property
Get the marker style for the center points if show_center
is True, otherwise '' is returned. By default the marker is '_'
if target_on_y is True, '|' otherwise (read-only).
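The selection rule described for the marker can be written down as a small function. This is only a sketch of the documented behaviour, not the library's code; the function name `resolve_marker` is hypothetical.

```python
def resolve_marker(show_center, target_on_y, marker=None):
    """Sketch of the documented marker rule (hypothetical helper).

    - No center points shown: empty string.
    - Explicit marker given: use it.
    - Otherwise: '_' when the target is on the y-axis, '|' when it is on the x-axis.
    """
    if not show_center:
        return ''
    if marker is not None:
        return marker
    return '_' if target_on_y else '|'


print(repr(resolve_marker(show_center=True, target_on_y=True)))
print(repr(resolve_marker(show_center=True, target_on_y=False)))
print(repr(resolve_marker(show_center=False, target_on_y=True, marker='o')))
```

The default markers are short strokes perpendicular to the error bar, so the center reads as a tick across the bar in either orientation.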
kind
property
writable
Get and set the type of location ('mean' or 'median') to plot.
| RAISES | DESCRIPTION |
|---|---|
| AssertionError | If neither 'mean' nor 'median' is given when setting `kind`. |
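A writable property that validates its value with an assertion could look like the following minimal sketch. The class `CenterKind` is hypothetical and only mirrors the documented behaviour of `kind`; it is not DaSPi's implementation.

```python
class CenterKind:
    """Hypothetical stand-in that mirrors the documented `kind` property."""

    def __init__(self, kind='mean'):
        self.kind = kind  # goes through the validating setter

    @property
    def kind(self):
        """Type of location to plot ('mean' or 'median')."""
        return self._kind

    @kind.setter
    def kind(self, kind):
        # Documented behaviour: anything else raises an AssertionError.
        assert kind in ('mean', 'median'), (
            f"kind must be 'mean' or 'median', got {kind!r}")
        self._kind = kind


center = CenterKind()
center.kind = 'median'  # accepted
try:
    center.kind = 'mode'  # rejected
except AssertionError as error:
    print(f'rejected: {error}')
```

Note that assertions are skipped when Python runs with the `-O` flag, so a check of this kind only guards against invalid values in normal (non-optimized) runs.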
transform(feature_data, target_data)
Perform the transformation on the target data using the
Estimator class and return the transformed data.
| PARAMETER | DESCRIPTION |
|---|---|
| feature_data | Base location (offset) of the feature axis. |
| target_data | Feature grouped target data used for the transformation. |
| RETURNS | DESCRIPTION |
|---|---|
| data | The transformed data source for the plot. |