Quantile boxes

`daspi.plotlib.plotter.QuantileBoxes(source, target, feature='', strategy='data', agreements=DEFAULT.AGREEMENTS, possible_dists=DIST.COMMON, vary_width=True, width=CATEGORY.FEATURE_SPACE, skip_na=None, target_on_y=True, color=None, ax=None, visible_spines=None, hide_axis=None, **kwds)`

Bases: SpreadOpacity, TransformPlotter

TransformPlotter for visualizing quantiles through box plots.

This class creates box plots that represent several quantile ranges of the data. Each range given in `agreements` yields a lower and an upper quantile, which define the boundaries of one box.

PARAMETER	DESCRIPTION
`source`	A long-format DataFrame containing the data source for the plot. TYPE: `pandas DataFrame`
`target`	The column name of the target variable to be plotted. TYPE: `str`
`feature`	The column name of the feature variable for the plot, by default an empty string. TYPE: `str` DEFAULT: `''`
`strategy`	The strategy used to determine the quantiles: - `eval`: The strategy is determined by the given evaluate function. If none is given, the internal `evaluate` method is used. - `fit`: The distribution that best represents the process data is searched for first, then the agreed process spread is calculated. - `norm`: The data is assumed to be normally distributed. The variation tolerance is then calculated as agreement * standard deviation. - `data`: The quantiles for the process variation tolerance are read directly from the data. Default is `'data'`. TYPE: `Literal['eval', 'fit', 'norm', 'data']` DEFAULT: `'data'`
`agreements`	Specifies the tolerated process variation for calculating quantiles. These quantiles are used to represent the filled area with different opacity, thus highlighting the quantiles. The agreements can be either integers or floats, determining the process variation tolerance in the following ways: - If integers, the quantiles are determined using the normal distribution (agreement * σ), e.g., agreement = 6 covers ~99.73% of the data. - If floats, values must be between 0 and 1 and are interpreted as acceptable proportions for the quantiles, e.g., 0.9973 corresponds to ~6σ. Default is `DEFAULT.AGREEMENTS` = (2, 4, 6), corresponding to (±1σ, ±2σ, ±3σ). TYPE: `Tuple[float, ...] or Tuple[int, ...]` DEFAULT: `AGREEMENTS`
`possible_dists`	Distributions to which the data may be subject. Only continuous distributions of scipy.stats are allowed. Defaults to `DIST.COMMON`. TYPE: `tuple of strings or rv_continuous` DEFAULT: `COMMON`
`vary_width`	If True, the center box is the widest, while the outer boxes are progressively narrower, reflecting the distribution of the data. Defaults to True. TYPE: `bool` DEFAULT: `True`
`width`	The width of the boxes. If vary_width is set to True, the central box has this width, all others are narrower. Defaults to `CATEGORY.FEATURE_SPACE`. TYPE: `float` DEFAULT: `FEATURE_SPACE`
`skip_na`	A flag indicating how to handle missing values in the feature-grouped data: - 'none': No missing values are skipped. - 'all': Grouped data is skipped if all values are missing. - 'any': Grouped data is skipped if any value is missing. TYPE: `Literal['none', 'all', 'any'] | None` DEFAULT: `None`
`target_on_y`	A flag indicating whether the target variable is plotted on the y-axis, by default True. TYPE: `bool` DEFAULT: `True`
`color`	The color used to draw the box plots. If None, the first color from the color cycle is used. Defaults to None. TYPE: `str | None` DEFAULT: `None`
`ax`	The axes object for the plot. If None, the current axes is fetched using `plt.gca()`. If no axes are available, a new one is created. Defaults to None. TYPE: `Axes | None` DEFAULT: `None`
`visible_spines`	Specifies which spines are visible; the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. TYPE: `Literal['target', 'feature', 'none'] | None` DEFAULT: `None`
`hide_axis`	Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. TYPE: `Literal['target', 'feature', 'both'] | None` DEFAULT: `None`
`**kwds`	Additional keyword arguments that are ignored in this context, primarily serving to capture any extra arguments when this class is used within chart objects. DEFAULT: `{}`
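
The relationship between integer and float agreements described for `agreements` can be checked with a few lines of standard-library Python. This is a minimal sketch and not DaSPi code: the helper names `coverage_from_sigma_width` and `sigma_width_from_coverage` are hypothetical, and a normal distribution is assumed throughout.

```python
from math import erf, sqrt
from statistics import NormalDist


def coverage_from_sigma_width(agreement: int) -> float:
    """Proportion of a normal distribution that lies within a total
    width of `agreement` standard deviations (mean ± agreement/2 * σ).

    Hypothetical helper for illustration, not part of DaSPi.
    """
    half_width = agreement / 2
    return erf(half_width / sqrt(2))


def sigma_width_from_coverage(proportion: float) -> float:
    """Total width in standard deviations that covers `proportion`
    of a normal distribution, centered on the mean.

    Hypothetical helper for illustration, not part of DaSPi.
    """
    upper_probability = (1 + proportion) / 2
    return 2 * NormalDist().inv_cdf(upper_probability)


# Integer agreements: total width in multiples of σ
for agreement in (2, 4, 6):
    coverage = coverage_from_sigma_width(agreement)
    print(f"agreement={agreement}: {coverage:.4%} of the data")

# Float agreement: proportion converted back to a width in σ
print(f"0.9973 -> {sigma_width_from_coverage(0.9973):.2f} σ")
```

Under these assumptions, an agreement of 6 and a proportion of about 0.9973 describe the same ±3σ range, which is why the two notations can be used interchangeably for normally distributed data.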

Examples:

Apply to an existing Axes object:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import QuantileBoxes

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
boxes = QuantileBoxes(
    source=df, target='y', feature='x', strategy='norm',
    agreements=(0.25, 0.5, 0.75, 0.95), vary_width=False, width=0.2,
    ax=ax)
boxes()
boxes.label_feature_ticks()

Apply using the plot method of a DaSPi Chart object:

import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
chart = dsp.SingleChart(
        source=df,
        target='y',
        feature='x',
        categorical_feature=True  # needed to label feature ticks
    ).plot(
        dsp.QuantileBoxes,
        strategy='norm',
        agreements=(0.25, 0.5, 0.75, 0.95),
        vary_width=False,
        width=0.2
    ).label()  # needed to label feature ticks

`vary_width = vary_width` `instance-attribute`

Flag that indicates whether the width of the boxes should vary, with the widest box in the middle and narrower boxes towards the outside. If False, all boxes have the same width.

`width = width / len(agreements) if vary_width else width` `instance-attribute`

The maximum width of the center box in the plot.

`kw_default` `property`

Default keyword arguments for plotting (read-only).

`width_values()`

Returns the widths of the boxes in the plot.
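
How the individual widths could follow from `vary_width` and `width` is sketched below. This is an illustration only, not the DaSPi implementation: the function name `sketch_width_values` is hypothetical, and the linear narrowing from the innermost to the outermost box is an assumption. Only the step size `width / len(agreements)` is taken from the attribute definition above.

```python
def sketch_width_values(width, agreements, vary_width=True):
    """Return one box width per agreement, innermost box first.

    Hypothetical sketch: the linear narrowing is an assumption and
    may differ from what `QuantileBoxes.width_values()` returns.
    """
    n_boxes = len(agreements)
    if not vary_width:
        # All boxes share the same width
        return [width] * n_boxes
    # Step size mirrors `width / len(agreements)` from the attribute
    step = width / n_boxes
    # The innermost box gets the full width, each further box one step less
    return [step * (n_boxes - i) for i in range(n_boxes)]


widths = sketch_width_values(0.6, agreements=(2, 4, 6))
for agreement, box_width in zip((2, 4, 6), widths):
    print(f"agreement={agreement}: width={box_width:.2f}")
```

With `vary_width=False` the same call returns the given width for every box, which corresponds to the setting used in the examples above.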

`transform(feature_data, target_data)`

Calculates the lower and upper quantiles for the quantile boxes from the feature-grouped target data.

For each agreement, the method determines the lower and upper quantile according to the chosen strategy. These quantiles define the boundaries of the boxes.

PARAMETER	DESCRIPTION
`feature_data`	The position on the feature axis at which the boxes are centered. TYPE: `float`
`target_data`	Feature-grouped target data, coming from the `feature_grouped` generator. TYPE: `pandas Series`

RETURNS	DESCRIPTION
`data`	The transformed data source for the plot. TYPE: `pandas DataFrame`
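
Since the class description states that the lower and upper quantiles define the box boundaries, the calculation for one feature group can be sketched as follows. This is a simplified illustration under stated assumptions, not the DaSPi implementation: the function names `norm_bounds`, `empirical_quantile`, and `data_bounds` are hypothetical, integer agreements are only handled in the `norm`-style helper, float agreements only in the `data`-style helper, and plain tuples are returned instead of a DataFrame.

```python
from statistics import mean, stdev


def norm_bounds(values, agreement):
    """'norm'-style bounds: the total spread is agreement * standard
    deviation, centered on the mean (hypothetical helper)."""
    center = mean(values)
    half_width = agreement / 2 * stdev(values)
    return center - half_width, center + half_width


def empirical_quantile(sorted_values, q):
    """Quantile of already sorted values using linear interpolation
    between neighboring observations (hypothetical helper)."""
    position = q * (len(sorted_values) - 1)
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    lower_value = sorted_values[lower_index]
    upper_value = sorted_values[upper_index]
    return lower_value + fraction * (upper_value - lower_value)


def data_bounds(values, proportion):
    """'data'-style bounds: the central `proportion` of the observed
    values, read directly from the data (hypothetical helper)."""
    ordered = sorted(values)
    tail = (1 - proportion) / 2
    return (
        empirical_quantile(ordered, tail),
        empirical_quantile(ordered, 1 - tail),
    )


sample = [2.1, 2.9, 3.0, 3.2, 3.8, 4.1, 4.4, 5.0]
for agreement in (2, 4, 6):
    print("norm", agreement, norm_bounds(sample, agreement))
for proportion in (0.5, 0.9):
    print("data", proportion, data_bounds(sample, proportion))
```

Each pair of bounds corresponds to one box. Larger agreements produce wider ranges, so the boxes nest inside each other.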
