Loess

daspi.statistics.estimation.Loess(source, target, feature, *, fit_at_init=True, **kwds)

Smooth the data using Locally Estimated Scatterplot Smoothing (LOESS).

LOESS is non-parametric and computationally intensive, so it is not suitable for every regression problem. Nonetheless, it is an effective method for modeling a non-linear relationship between two variables that do not adhere to a predefined distribution.

PARAMETER DESCRIPTION
source

Pandas long format DataFrame containing the data source for the plot.

TYPE: pandas DataFrame

target

Column name of the target variable for the plot.

TYPE: str

feature

Column name of the feature variable for the plot, by default ''

TYPE: str

fit_at_init

Whether to fit the model at initialization, by default True

TYPE: bool DEFAULT: True

**kwds

Keyword arguments passed to the fit method. They are only taken into account if fit_at_init is True.

DEFAULT: {}

Examples:

import numpy as np
import daspi as dsp
import pandas as pd
import matplotlib.pyplot as plt

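# Simulate a noisy, non-linear relationship between x and y
x = 5*np.random.random(100)
data = pd.DataFrame(dict(
    x = x,
    y = np.sin(x) * 3*np.exp(-x) + np.random.normal(0, 0.2, 100)))

# Fit the model at initialization (target 'y', feature 'x')
model = dsp.Loess(data, 'y', 'x')
# Fitted curve with 95 % confidence bands
sequence, prediction, lower, upper = model.fitted_line(0.95)

# Plot the raw data, the fitted curve and the confidence band
fig, ax = plt.subplots()
ax.scatter(data.x, data.y)
ax.plot(sequence, prediction)
ax.fill_between(sequence, lower, upper, alpha=0.2)
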
Sources

[1] James Brennan (2020), Confidence intervals for LOWESS models in python https://james-brennan.github.io

[2] Saul Dobilas (2020), LOWESS Regression in Python: How to Discover Clear Patterns in Your Data?, Towards Data Science

[3] Btyner (2006), Local Regression, Wikipedia

smoothed instance-attribute

Smoothed target data as a pandas Series.

std_errors instance-attribute

Standard errors of the smoothed target data as a pandas Series.

fraction instance-attribute

The fraction of data points used in each local regression.

kernel instance-attribute

The kernel function used to calculate the weights in the LOESS smoothing.
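
For orientation, the three kernel names accepted by fit ('tricube', 'gaussian', 'epanechnikov') correspond to well-known weight functions. The following is a minimal sketch of their textbook definitions, where u is assumed to be the distance to the target point divided by the bandwidth. The function names are hypothetical, and the scaling used inside daspi may differ.

```python
import math


def tricube(u):
    """Tricube weight: (1 - |u|^3)^3 inside |u| < 1, zero outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def epanechnikov(u):
    """Epanechnikov weight: 0.75 * (1 - u^2) inside |u| <= 1, zero outside."""
    return 0.75 * (1 - u ** 2) if abs(u) <= 1 else 0.0


def gaussian(u):
    """Unnormalised Gaussian weight: strictly positive for every u."""
    return math.exp(-0.5 * u ** 2)


# Compare the weights at the target point, inside and outside the bandwidth
for u in (0.0, 0.5, 1.0, 2.0):
    print(u, tricube(u), epanechnikov(u), gaussian(u))
```

The tricube and Epanechnikov kernels drop to zero outside the bandwidth, whereas the Gaussian kernel still assigns a small positive weight to distant points.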

order instance-attribute

The order of local regression:

  • 0: No smoothing (interpolation)
  • 1: Linear regression
  • 2: Quadratic regression
  • 3: Cubic regression

The order determines the degree of the polynomial used in the local regression and thus the flexibility of the fitted curve. High-degree polynomials tend to overfit the data in each subset and are numerically unstable, which makes accurate computation difficult.
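
To make the idea of a local regression concrete, here is a minimal sketch of the order 1 case: a weighted least-squares line fitted around one point and evaluated there. The helper name local_linear_fit and the plain-list inputs are hypothetical and not part of daspi; the library's own implementation may differ.

```python
def local_linear_fit(xs, ys, weights, x0):
    """Evaluate a weighted least-squares line at x0 (hypothetical helper)."""
    total = sum(weights)
    # Weighted means of the feature and the target
    x_mean = sum(w * x for w, x in zip(weights, xs)) / total
    y_mean = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted spread of x and weighted co-variation of x and y
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean)
              for w, x, y in zip(weights, xs, ys))
    if sxx == 0:
        raise ValueError("all weighted x-values are identical")
    slope = sxy / sxx
    return y_mean + slope * (x0 - x_mean)


# Points on the line y = 2x + 1, all with equal weight
print(local_linear_fit([0, 1, 2, 3], [1, 3, 5, 7], [1, 1, 1, 1], 3))
```

Orders 2 and 3 add squared and cubed terms to the same weighted least-squares problem, which increases flexibility but also the number of coefficients that each local subset has to support.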

source = source instance-attribute

The data source for the plot.

target = target instance-attribute

The column name of the target variable.

feature = feature instance-attribute

The column name of the feature variable.

x property

The feature variable as exogenous variable (read-only).

y property

The target variable as endogenous variable (read-only).

fitted property

True if the data has been fitted (read-only).

n_samples property

Number of samples in source after removing missing values (read-only).

residuals property

Residuals as the difference between the target values and the smoothed values (read-only).

dof_resid property

Degrees of freedom of the residuals (read-only).

available_kernels property

Available kernels for smoothing (read-only).

bandwidth(fraction)

Get the bandwidth parameter that determines the locality of the smoothing. It controls how far from the target point observations are still weighted. A larger bandwidth results in a smoother fit, while a smaller bandwidth lets the fitted curve capture more local variation.
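
One common way to turn a fraction into a bandwidth is to take the distance from the target point to its k-th nearest neighbor, with k equal to the fraction of the sample size rounded up. The sketch below illustrates that convention only; the helper name neighborhood_bandwidth is hypothetical, and it is an assumption that daspi derives its bandwidth in a comparable way.

```python
import math


def neighborhood_bandwidth(xs, x0, fraction):
    """Distance from x0 to its k-th nearest neighbor, k = ceil(fraction * n)."""
    k = max(1, math.ceil(fraction * len(xs)))
    distances = sorted(abs(x - x0) for x in xs)
    return distances[k - 1]


xs = list(range(10))
# A larger fraction widens the neighborhood around the target point x0 = 0
print(neighborhood_bandwidth(xs, 0, 0.5))
print(neighborhood_bandwidth(xs, 0, 1.0))
```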

fit(fraction=0.6, order=3, kernel='tricube')

Fits the model using the statsmodels.nonparametric.lowess method.

PARAMETER DESCRIPTION
fraction

The fraction of the data used for each local regression. A good starting value is above 1/2 (the statsmodels default is 2/3). Reduce the value to avoid underfitting. A value below 0.2 usually leads to overfitting, except with Gaussian weights. Default is 0.6.

TYPE: float DEFAULT: 0.6

order

The order of the local regression to be fitted. This determines the degree of the polynomial used in the local regression:

  • 0: No smoothing (interpolation)
  • 1: Linear regression
  • 2: Quadratic regression
  • 3: Cubic regression

Default is 3.

TYPE: Literal[0, 1, 2, 3] DEFAULT: 3

kernel

The kernel function used to calculate the weights. Available kernels are:

  • 'tricube': Tricube kernel function
  • 'gaussian': Gaussian kernel function
  • 'epanechnikov': Epanechnikov kernel function

Default is 'tricube'.

TYPE: Literal['tricube', 'gaussian', 'epanechnikov'] DEFAULT: 'tricube'

RETURNS DESCRIPTION
Lowess

The instance itself, holding the fitted values.

TYPE: Self

Notes

When using this method, a LinAlgError may be raised. This error usually occurs in two main scenarios:

  1. Singular Matrix: When the input data creates a singular matrix (determinant = 0). This often occurs when:

    • You have perfectly correlated features
    • You have duplicate data points
    • There's not enough variation in your data
  2. Ill-Conditioned Matrix: When the matrix is nearly singular. Common causes:

    • Features with very different scales
    • Multicollinearity between features
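
A simple pre-check can catch the duplicate-data case before fitting: a polynomial of a given order needs at least order + 1 distinct feature values in every local subset, otherwise the least-squares problem is singular. The sketch below assumes that each subset consists of the k nearest neighbors with k = ceil(fraction * n). The helper name windows_are_fittable is hypothetical and not part of daspi, and passing the check is a necessary but not a sufficient condition for a stable fit.

```python
import math


def windows_are_fittable(xs, fraction, order):
    """Return True if every local window holds at least order + 1 distinct x-values."""
    k = max(1, math.ceil(fraction * len(xs)))
    for x0 in xs:
        # The k observations closest to the current target point
        nearest = sorted(xs, key=lambda x: abs(x - x0))[:k]
        if len(set(nearest)) < order + 1:
            return False
    return True


# Many duplicates: the window around x = 1 contains a single distinct value
print(windows_are_fittable([1, 1, 1, 1, 2, 3, 4, 5], fraction=0.5, order=3))
# Evenly spread data: every window has enough distinct values
print(windows_are_fittable(list(range(10)), fraction=0.5, order=3))
```

If the check fails, increasing fraction or lowering order are the obvious adjustments to try.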

predict(x, kind='linear')

Predict the target value(s) for the given feature value(s).

PARAMETER DESCRIPTION
x

The feature value(s) for which to predict the target value(s).

TYPE: int | float | NumericSample1D

kind

Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of 'linear', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', or 'next'.

  • 'zero', 'slinear', 'quadratic' and 'cubic': refer to a spline interpolation of zeroth, first, second or third order.
  • 'previous' and 'next': simply return the previous or next value of the point.
  • 'nearest-up' and 'nearest': differ when interpolating half-integers (e.g. 0.5, 1.5) in that 'nearest-up' rounds up and 'nearest' rounds down.

Default is 'linear'.

TYPE: str or int DEFAULT: 'linear'

RETURNS DESCRIPTION
NDArray

The predicted target value(s) for the given feature value(s) as a one-dimensional numpy array.
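
The kind values listed above are the same ones accepted by scipy.interpolate.interp1d; whether predict delegates to that function is an assumption. As a rough illustration of the default 'linear' case, the following sketch interpolates between already smoothed points using only the standard library. The helper name interpolate_linear is hypothetical.

```python
from bisect import bisect_right


def interpolate_linear(x_known, y_known, x_new):
    """Linear interpolation; x_known must be strictly increasing."""
    if not x_known[0] <= x_new <= x_known[-1]:
        raise ValueError("x_new lies outside the fitted feature range")
    # Index of the right-hand neighbor, clipped so the last point works too
    i = min(bisect_right(x_known, x_new), len(x_known) - 1)
    x0, x1 = x_known[i - 1], x_known[i]
    y0, y1 = y_known[i - 1], y_known[i]
    return y0 + (y1 - y0) * (x_new - x0) / (x1 - x0)


# Smoothed values at three feature positions (made-up numbers)
x_known = [0.0, 1.0, 2.0]
y_known = [0.0, 10.0, 40.0]
print(interpolate_linear(x_known, y_known, 0.5))
print(interpolate_linear(x_known, y_known, 1.5))
```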

fitted_line(confidence_level=None, n_points=DEFAULT.LOWESS_SEQUENCE_LEN)

fitted_line() -> Tuple[NDArray, NDArray]
fitted_line(confidence_level: None, n_points: int = ...) -> Tuple[NDArray, NDArray]
fitted_line(confidence_level: float, n_points: int = ...) -> Tuple[NDArray, NDArray, NDArray, NDArray]

Generate a smooth sequence of predictions from the fitted LOWESS model.

This method creates an evenly spaced sequence of feature values and predicts their corresponding target values using linear interpolation. It's particularly useful for:

  1. Plotting smooth trend lines
  2. Visualizing the LOWESS fit
  3. Generating continuous predictions across the feature range

PARAMETER DESCRIPTION
confidence_level

If provided, calculate confidence bands at this level (0 to 1). Example: 0.95 for 95% confidence bands. If None, no confidence bands are calculated. Default is None.

TYPE: float | None DEFAULT: None

n_points

Number of points to generate in the sequence. More points create a smoother visualization but increase computation time. Default is defined in DEFAULT.LOWESS_SEQUENCE_LEN.

TYPE: int DEFAULT: LOWESS_SEQUENCE_LEN

RETURNS DESCRIPTION
sequence

Evenly spaced feature values

TYPE: NDArray

prediction

Predicted target values

TYPE: NDArray

lower_band

Lower confidence band. Only returned if confidence_level is provided

TYPE: (NDArray, optional)

upper_band

Upper confidence band. Only returned if confidence_level is provided

TYPE: (NDArray, optional)
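
Confidence bands of this kind are typically built as prediction plus or minus a quantile times the standard error. The sketch below uses the standard normal quantile, which is an assumption: daspi may use a different distribution (for example Student's t with the residual degrees of freedom). The helper name confidence_band and the example numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_band(predictions, std_errors, confidence_level):
    """Return (lower, upper) bands around the predictions (normal approximation)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    # Two-sided quantile, e.g. roughly 1.96 for a 95 % level
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    lower = [p - z * s for p, s in zip(predictions, std_errors)]
    upper = [p + z * s for p, s in zip(predictions, std_errors)]
    return lower, upper


lower, upper = confidence_band([1.0, 2.0], [0.1, 0.2], 0.95)
print(lower, upper)
```

A higher confidence level or larger standard errors widen the band symmetrically around the predicted values.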