Pareto

daspi.plotlib.plotter.Pareto(source, target, feature, highlight=None, highlight_color=COLOR.BAD, highlighted_as_last=True, no_percentage_line=False, width=CATEGORY.FEATURE_SPACE, method=None, kw_method={}, skip_na=None, target_on_y=True, color=None, ax=None, visible_spines=None, hide_axis=None, **kwds)

Bases: Bar

A plotter for drawing a Pareto chart. It extends the Bar plotter class.

A Pareto chart is a type of chart that combines a bar graph and a line graph. It is used to display and analyze data in order to prioritize and identify the most significant factors contributing to a particular phenomenon or problem. The line graph in a Pareto chart shows the cumulative percentage of the total, which helps identify the point at which a significant portion of the cumulative total is reached.

Pareto charts are commonly used in quality control, process improvement, and decision-making processes. They allow users to visually identify and focus on the most significant factors that contribute to a problem or outcome, enabling them to allocate resources and address the most critical issues first.
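The cumulative percentage line described above can be illustrated with a few lines of plain Python. This is a minimal sketch of the arithmetic only, not DaSPi's implementation; the helper name `cumulative_percentages` is hypothetical and it assumes the values sum to a positive total.

```python
from itertools import accumulate


def cumulative_percentages(values):
    """Return the running share of the total (in percent) for the
    values sorted in descending order, as a Pareto line would show.

    Hypothetical helper for illustration; assumes sum(values) > 0.
    """
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    return [100 * partial / total for partial in accumulate(ordered)]


# Same target values as in the examples on this page: 100/1 ... 100/15
values = [100 / x for x in range(1, 16)]
percentages = cumulative_percentages(values)

# Number of bars needed to reach at least 80 % of the total
bars_for_80 = next(
    i for i, p in enumerate(percentages, start=1) if p >= 80)
print(bars_for_80, round(percentages[-1], 1))
```

Reading off where the line crosses a chosen threshold (80 % here) gives the set of categories to address first.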

PARAMETER DESCRIPTION
source

Pandas long format DataFrame containing the data source for the plot.

TYPE: pandas DataFrame

target

Column name of the target variable for the plot.

TYPE: str

feature

Column name of the feature variable for the plot.

TYPE: str

highlight

The feature value whose bar should be highlighted in the diagram, by default None.

TYPE: Any DEFAULT: None

highlight_color

The color to use for highlighting, by default COLOR.BAD.

TYPE: str DEFAULT: BAD

highlighted_as_last

Whether the highlighted bar should be at the end, by default True.

TYPE: bool DEFAULT: True

no_percentage_line

Whether to omit the cumulative percentage line. If False, the line is drawn, by default False.

TYPE: bool DEFAULT: False

width

Width of the bars, by default CATEGORY.FEATURE_SPACE.

TYPE: float DEFAULT: FEATURE_SPACE

method

The name of a pandas Series method used to aggregate the target values within each feature level, such as 'sum', 'count' or any similar method that returns a scalar, by default None.

TYPE: str DEFAULT: None

kw_method

Additional keyword arguments to be passed to the method, by default {}.

TYPE: dict DEFAULT: {}

skip_na

Flag indicating whether to skip missing values in the feature grouped data, by default None.

- None: no missing values are skipped
- 'all': grouped data is skipped if all values are missing
- 'any': grouped data is skipped if any value is missing

TYPE: Literal['none', 'all', 'any'] DEFAULT: None

target_on_y

Flag indicating whether the target variable is plotted on the y-axis, by default True.

TYPE: bool DEFAULT: True

color

Color to be used to draw the artists. If None, the first color is taken from the color cycle, by default None.

TYPE: str | None DEFAULT: None

ax

The axes object for the plot. If None, the current axes is fetched using plt.gca(). If no axes are available, a new one is created. Defaults to None.

TYPE: Axes | None DEFAULT: None

visible_spines

Specifies which spines are visible, the others are hidden. If 'none', no spines are visible. If None, the spines are drawn according to the stylesheet. Defaults to None.

TYPE: Literal['target', 'feature', 'none'] | None DEFAULT: None

hide_axis

Specifies which axes should be hidden. If None, both axes are displayed. Defaults to None.

TYPE: Literal['target', 'feature', 'both'] | None DEFAULT: None

**kwds

These arguments have no effect. They only serve to catch further arguments that have no use here (this occurs when the class is used within chart objects).

DEFAULT: {}
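The skip_na options listed above can be read as a filter on the feature groups. The following is a sketch of that reading in plain Python, assuming a whole group is dropped when the condition holds; the function `group_and_skip` is hypothetical and does not reproduce DaSPi's internal code.

```python
import math
from collections import defaultdict


def group_and_skip(rows, skip_na=None):
    """Group (feature, target) pairs by feature and drop groups
    according to skip_na (None, 'all' or 'any').

    Hypothetical helper; missing values are represented as NaN.
    """
    groups = defaultdict(list)
    for feature, target in rows:
        groups[feature].append(target)

    kept = {}
    for feature, targets in groups.items():
        missing = [math.isnan(t) for t in targets]
        if skip_na == 'all' and all(missing):
            continue  # every value in this group is missing
        if skip_na == 'any' and any(missing):
            continue  # at least one value in this group is missing
        kept[feature] = targets
    return kept


rows = [
    ('a', 1.0), ('a', math.nan),       # partly missing
    ('b', math.nan), ('b', math.nan),  # completely missing
    ('c', 2.0),                        # complete
]
print(sorted(group_and_skip(rows, skip_na='all')))
print(sorted(group_and_skip(rows, skip_na='any')))
```

With None all three groups are kept, so any missing values are passed on to the aggregation method unchanged.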

Examples:

Apply to an existing Axes object:

import pandas as pd
import matplotlib.pyplot as plt
from daspi import Pareto

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = list('abcdefghijklmno'),
    y = list(100/x for x in range(1, 16))))
pareto = Pareto(
    source=df, target='y', feature='x', ax=ax)
pareto()

You can also combine and highlight small frequencies:

import pandas as pd
import matplotlib.pyplot as plt
from daspi import Pareto

fig, ax = plt.subplots()
df = pd.DataFrame(dict(
    x = list('abcdefghijklmno'),
    y = list(100/x for x in range(1, 16))))
low_values = df.y <= 10
df2 = df[~low_values].copy()
df2.loc[len(df)-sum(low_values)] = ('rest', df[low_values].y.sum())
pareto = Pareto(
    source=df2, target='y', feature='x', highlight='rest', 
    highlight_color='#ff000090', highlighted_as_last=True)
pareto()

Apply using the plot method of a DaSPi Chart object:

import daspi as dsp
import pandas as pd

df = pd.DataFrame(dict(
    x = list('abcdefghijklmno'),
    y = list(100/x for x in range(1, 16))))
low_values = df.y <= 10
df2 = df[~low_values].copy()
df2.loc[len(df)-sum(low_values)] = ('rest', df[low_values].y.sum())
chart = dsp.SingleChart(
        source=df2,
        target='y',
        feature='x',
    ).plot(
        dsp.Pareto,
        highlight='rest',
        highlight_color=dsp.COLOR.BAD,
        no_percentage_line=False
    )

RAISES DESCRIPTION
AssertionError

If 'categorical_feature' is True; this argument comes from Chart objects.

AssertionError

If another Axes object in this Figure instance shares the feature axis.

no_percentage_line = no_percentage_line instance-attribute

Whether to omit the percentage line and the percentage texts.

highlight = highlight instance-attribute

The feature value whose bar should be highlighted in the chart.

highlight_color = highlight_color instance-attribute

The color to use for highlighting.

highlighted_as_last = highlighted_as_last instance-attribute

Whether the highlighted bar should be at the end.

shared_feature_axes property

True if any other Axes object in this figure shares the feature axis.

indices property

Get arranged index values to access the target data (from source data) in the order to be plotted.

x property

Get the values used for the x-axis so that the target is displayed in descending order and the highlighted bar is at the end (if so specified).

y property

Get the values used for the y-axis so that the target is displayed in descending order and the highlighted bar is at the end (if so specified).
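The ordering that the indices, x and y properties describe (descending target values, with the highlighted bar optionally moved to the end) can be sketched in plain Python. The function `pareto_order` is a hypothetical illustration of that rule and is not taken from the DaSPi source.

```python
def pareto_order(data, highlight=None, highlighted_as_last=True):
    """Return (feature, target) pairs sorted by descending target.

    If highlight is given and highlighted_as_last is True, the
    highlighted feature is moved to the end regardless of its value.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(data.items(), key=lambda item: item[1], reverse=True)
    if highlight is not None and highlighted_as_last:
        others = [item for item in ordered if item[0] != highlight]
        marked = [item for item in ordered if item[0] == highlight]
        ordered = others + marked
    return ordered


data = {'a': 100, 'b': 50, 'rest': 60, 'c': 33}
print(pareto_order(data, highlight='rest'))
print(pareto_order(data, highlight='rest', highlighted_as_last=False))
```

Placing a combined 'rest' bar last keeps the individually named categories in strict descending order, even when the combined bar is larger than some of them.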

add_percentage_texts()

Add percentage texts on top of the major grid lines.