Gaussian KDE

`daspi.plotlib.plotter.GaussianKDE(source, target, stretch=1, height=None, skip_na=None, ignore_feature=True, margin=0, fill=True, agreements=DEFAULT.AGREEMENTS, target_on_y=True, color=None, n_points=DEFAULT.KD_SEQUENCE_LEN, ax=None, visible_spines=None, hide_axis=None, **kwds)`

Bases: SpreadOpacity, TransformPlotter

Class for creating Gaussian Kernel Density Estimation (KDE) plotters. Use this class for univariate plots.

Kernel density estimation is a non-parametric way to estimate the probability density function (PDF) of a random variable. This class uses the `gaussian_kde` function of `scipy.stats`, which works for both univariate and multivariate data and includes automatic bandwidth determination. The estimation works best for a unimodal distribution; bimodal or multimodal distributions tend to be oversmoothed.
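
To make the idea concrete, the following is a minimal, dependency-free sketch of a univariate Gaussian KDE. It is not the DaSPi or SciPy implementation; the helper `gaussian_kde_1d` is hypothetical, and the bandwidth follows Scott's rule (`n**(-1/5)` times the sample standard deviation), which is assumed here to match the automatic bandwidth that `scipy.stats.gaussian_kde` selects by default.

```python
import math
import random
import statistics


def gaussian_kde_1d(data, points):
    """Evaluate a Gaussian KDE of `data` at each value in `points`.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    # Scott's rule in one dimension: bandwidth = n**(-1/5) * sample std.
    bandwidth = n ** (-1 / 5) * statistics.stdev(data)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))
    # Sum one Gaussian kernel per observation at every evaluation point.
    return [
        norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
        for x in points
    ]


random.seed(0)
sample = [random.gauss(3, 1) for _ in range(200)]

# Evaluate on an evenly spaced sequence that extends beyond the data range,
# so that the tails of the estimate are included.
lo, hi = min(sample) - 3, max(sample) + 3
n_points = 300
step = (hi - lo) / (n_points - 1)
sequence = [lo + i * step for i in range(n_points)]
density = gaussian_kde_1d(sample, sequence)

# A density estimate should integrate to roughly 1 (simple Riemann sum).
area = sum(density) * step
```

Because every observation contributes a smooth bump of the same width, two nearby modes can merge into one broad peak, which is the oversmoothing mentioned above.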

PARAMETER	DESCRIPTION
`source`	Pandas long format DataFrame containing the data source for the plot. TYPE: `pandas DataFrame`
`target`	Column name of the target variable for the plot. TYPE: `str`
`stretch`	Factor by which the curve is stretched in height, by default 1. TYPE: `float` DEFAULT: `1`
`height`	Height of the KDE curve at its maximum, by default None. TYPE: `float \| None` DEFAULT: `None`
`skip_na`	Flag indicating whether to skip missing values in the feature grouped data, by default None. `None`: no missing values are skipped; `'all'`: grouped data is skipped if all values are missing; `'any'`: grouped data is skipped if any value is missing. TYPE: `Literal['none', 'all', 'any']` DEFAULT: `None`
`ignore_feature`	Flag indicating whether the feature axis should be ignored. If True, all curves have base 0 on the feature axis, by default True TYPE: `bool` DEFAULT: `True`
`margin`	Margin for the sequence as a factor of the data range (max - min). If margin is 0, the two ends of the estimated density curve show the minimum and maximum values. Default is 0. TYPE: `float` DEFAULT: `0`
`fill`	Flag whether to fill in the curves, by default True TYPE: `bool` DEFAULT: `True`
`agreements`	Specifies the tolerated process variation for calculating quantiles. These quantiles are used to draw the filled area with different opacities, thus highlighting the quantiles. This argument is only taken into account if `fill` is set to True. The agreements can be either integers or floats: if integers, the quantiles are determined using the normal distribution (agreement * σ), e.g., agreement = 6 (±3σ) covers ~99.73% of the data; if floats, values must be between 0 and 1 and are interpreted as acceptable proportions for the quantiles, e.g., 0.9973 corresponds to ~6σ; if an empty tuple, the filled area is uniform without highlighting the quantiles. Default is `DEFAULT.AGREEMENTS` = (2, 4, 6), corresponding to (±1σ, ±2σ, ±3σ). TYPE: `Tuple[float, ...] or Tuple[int, ...]` DEFAULT: `AGREEMENTS`
`target_on_y`	Flag indicating whether the target variable is plotted on the y-axis, by default True. TYPE: `bool` DEFAULT: `True`
`color`	Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None. TYPE: `str \| None` DEFAULT: `None`
`n_points`	Number of points the kernel density estimation and sequence should have, by default KD_SEQUENCE_LEN (defined in constants.py). TYPE: `int` DEFAULT: `KD_SEQUENCE_LEN`
`ax`	The axes object for the plot. If None, the current axes is fetched using `plt.gca()`. If no axes are available, a new one is created. Defaults to None. TYPE: `Axes \| None` DEFAULT: `None`
`visible_spines`	Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None. TYPE: `Literal['target', 'feature', 'none'] \| None` DEFAULT: `None`
`hide_axis`	Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None. TYPE: `Literal['target', 'feature', 'both'] \| None` DEFAULT: `None`
`**kwds`	Additional keyword arguments that have no effect and are only used to catch further arguments that have no use here (occurs when this class is used within chart objects). DEFAULT: `{}`
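
The relationship between integer and float values of `agreements` can be checked with the standard normal distribution. The sketch below is only an illustration of the rule described in the table; `agreement_to_proportion` and `quantile_bounds` are hypothetical helpers, not part of DaSPi, and the symmetric split of the remaining probability into two tails is an assumption.

```python
from statistics import NormalDist


def agreement_to_proportion(agreement):
    """Return the proportion of data covered by an agreement (hypothetical helper)."""
    if isinstance(agreement, int):
        # Integer: total width in standard deviations, centred on the mean,
        # so agreement=6 means the interval from -3 sigma to +3 sigma.
        half_width = agreement / 2
        std_normal = NormalDist()
        return std_normal.cdf(half_width) - std_normal.cdf(-half_width)
    if 0 < agreement < 1:
        # Float: already an acceptable proportion.
        return float(agreement)
    raise ValueError('agreement must be an int or a float between 0 and 1')


def quantile_bounds(agreement):
    """Lower and upper quantile levels, assuming symmetric tails."""
    tail = (1 - agreement_to_proportion(agreement)) / 2
    return tail, 1 - tail


# Proportions covered by the default agreements (2, 4, 6).
proportions = {k: round(agreement_to_proportion(k), 4) for k in (2, 4, 6)}
lower, upper = quantile_bounds(6)
```

With these definitions, `proportions` maps 2, 4 and 6 to about 0.6827, 0.9545 and 0.9973, which matches the ±1σ, ±2σ and ±3σ reading of the default.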

Examples:

Apply to an existing Axes object:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from daspi import GaussianKDE

fig, ax = plt.subplots()
colors = ('#1f77b4', '#ff7f0e', '#2ca02c')
df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
for color, (name, group) in zip(colors, df.groupby('x')):
    kde = GaussianKDE(
        source=group, target='y', strategy='norm', agreements=(2, 4, 6),
        target_on_y=False, color=color, margin=0.3, ax=ax)
    kde(label=name)
handles, labels = ax.get_legend_handles_labels()
agreements = kde.agreements[::-1] * df.x.nunique()
labels = [f'{l} {a}σ' for l, a in zip(labels, agreements)]
ax.legend(handles, labels)

Apply using the plot method of a DaSPi Chart object:

import numpy as np
import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = ['first'] * 50 + ['second'] * 50 + ['third'] * 50,
    y = (
        list(np.random.normal(loc=3, scale=1, size=50))
        + list(np.random.normal(loc=4, scale=1, size=50))
        + list(np.random.normal(loc=2, scale=1, size=50)))))
chart = dsp.SingleChart(
        source=df,
        target='y',
        hue='x',
        target_on_y=False,
    ).plot(
        dsp.GaussianKDE,
        strategy='norm',
        agreements=(2, 4, 6), # std for area color opacities
        visible_spines='target',
        hide_axis='feature',
    ).label() # needed to label feature ticks

`n_points = n_points` `instance-attribute`

Number of points in the KDE curve and its sequence.

`margin = margin` `instance-attribute`

Margin for the sequence as a factor of the data range (max - min). If margin is 0, the two ends of the estimated density curve show the minimum and maximum values.
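
As a rough illustration of how `margin` and `n_points` could shape the evaluation sequence, the sketch below builds an evenly spaced sequence over the data range. The helper `sequence_with_margin` is hypothetical, and it assumes the margin is added on each side of the range; the description above does not state whether DaSPi applies it per side or in total.

```python
def sequence_with_margin(values, n_points, margin=0.0):
    """Evenly spaced sequence over the data range (hypothetical helper).

    Assumption: `margin * (max - min)` is added on each side of the range.
    """
    lo, hi = min(values), max(values)
    pad = margin * (hi - lo)
    start, stop = lo - pad, hi + pad
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


data = [2.0, 5.0, 3.0]

# margin=0: the sequence starts at the minimum and ends at the maximum.
tight = sequence_with_margin(data, n_points=5, margin=0)

# margin=0.5: the sequence extends half a data range beyond both ends.
padded = sequence_with_margin(data, n_points=5, margin=0.5)
```

A non-zero margin lets the curve taper off beyond the observed extremes instead of being cut off at the minimum and maximum.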

`fill = fill` `instance-attribute`

Flag indicating whether to fill in the curves.

`kw_default` `property`

Default keyword arguments for plotting (read-only).

`height` `property`

Height of the KDE curve at its maximum.

`stretch` `property`

Factor by which the curve is stretched in height.

`highlight_quantiles` `property`

Flag indicating whether the quantiles should be highlighted.

`transform(feature_data, target_data)`

Perform the transformation on the target data by estimating its kernel density. To obtain a uniform curve, a sequence is generated with a specific number of points in the same range (min to max) as the target data.

PARAMETER	DESCRIPTION
`feature_data`	Base location (offset) of feature axis coming from `feature_grouped` generator. TYPE: `float \| int`
`target_data`	Feature grouped target data used for transformation, coming from `feature_grouped` generator. TYPE: `pandas Series`

RETURNS	DESCRIPTION
`data`	The transformed data source for the plot. Contains the generated sequence as target data and the estimation as feature data. TYPE: `pandas DataFrame`
