Loess

`daspi.statistics.estimation.Loess(source, target, feature, *, fit_at_init=True, **kwds)` ¶

Smooth the data using Locally Estimated Scatterplot Smoothing (LOESS).

LOESS does not suit every regression problem, because it is non-parametric and computationally intensive. It is nonetheless an effective way to model a non-linear relationship between two variables that do not adhere to a predefined distribution.

PARAMETER	DESCRIPTION
`source`	Pandas long format DataFrame containing the data source for the plot. TYPE: `pandas DataFrame`
`target`	Column name of the target variable for the plot. TYPE: `str`
`feature`	Column name of the feature variable for the plot, by default '' TYPE: `str`
`fit_at_init`	Whether to fit the model at initialization, by default True TYPE: `bool` DEFAULT: `True`
`**kwds`	Keyword arguments passed to the `fit` method. Only taken into account if `fit_at_init` is True. DEFAULT: `{}`

Examples:

import numpy as np
import daspi as dsp
import pandas as pd
import matplotlib.pyplot as plt

x = 5*np.random.random(100)
data = pd.DataFrame(dict(
    x = x,
    y = np.sin(x) * 3*np.exp(-x) + np.random.normal(0, 0.2, 100)))
model = dsp.Loess(data, 'y', 'x')
sequence, prediction, lower, upper = model.fitted_line(0.95)

fig, ax = plt.subplots()
ax.scatter(data.x, data.y)
ax.plot(sequence, prediction)
ax.fill_between(sequence, lower, upper, alpha=0.2)

Sources

[1] James Brennan (2020), Confidence intervals for LOWESS models in python, https://james-brennan.github.io

[2] Saul Dobilas (2020), LOWESS Regression in Python: How to Discover Clear Patterns in Your Data?, Towards Data Science

[3] Btyner (2006), Local Regression, Wikipedia

`smoothed` `instance-attribute` ¶

Smoothed target data as a pandas Series.

`std_errors` `instance-attribute` ¶

Standard errors of the smoothed target data as a pandas Series.

`fraction` `instance-attribute` ¶

The fraction of data points used in each local regression.

`kernel` `instance-attribute` ¶

The kernel function used to compute the weights in the LOESS smoothing.

`order` `instance-attribute` ¶

The order of local regression:

0: No smoothing (interpolation)
1: Linear regression
2: Quadratic regression
3: Cubic regression

The order determines the degree of the polynomial used in the local regression, affecting the flexibility of the fitted curve. High-degree polynomials would tend to overfit the data in each subset and are numerically unstable, making accurate computations difficult.
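The effect of the order can be illustrated with a minimal sketch of a single local fit. It is not daspi's implementation: the helper `local_poly_value` is hypothetical, uniform weights are assumed for simplicity, and `np.polyfit` stands in for whatever solver the library uses.

```python
import numpy as np

def local_poly_value(x, y, x0, weights, order):
    """Weighted least-squares polynomial fit around x0, evaluated at x0."""
    # np.polyfit multiplies the residuals by w before squaring, so the
    # square root is passed to minimise sum(weights * residual**2).
    coeffs = np.polyfit(x - x0, y, deg=order, w=np.sqrt(weights))
    # After centring on x0, the fitted value at x0 is the constant term.
    return coeffs[-1]

x = np.linspace(0.0, 4.0, 41)
y = x**2                       # noise-free quadratic, true value at 2.0 is 4.0
weights = np.ones_like(x)      # uniform weights (assumption of this sketch)

# Fitted value at x0 = 2.0 for each order from 0 to 3.
fits = {order: local_poly_value(x, y, 2.0, weights, order) for order in range(4)}
```

On this curved data, orders 0 and 1 miss the true value at the centre, while orders 2 and 3 reproduce it.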

`source = source` `instance-attribute` ¶

The data source for the plot

`target = target` `instance-attribute` ¶

The column name of the target variable.

`feature = feature` `instance-attribute` ¶

The column name of the feature variable.

`x` `property` ¶

The feature variable as exogenous variable (read-only).

`y` `property` ¶

The target variable as endogenous variable (read-only).

`fitted` `property` ¶

True if the data has been fitted (read-only).

`n_samples` `property` ¶

Number of samples in source after removing missing values (read-only).

`residuals` `property` ¶

Residuals as the difference between the target and the smoothed values (read-only).

`dof_resid` `property` ¶

Degrees of freedom of the residuals (read-only).

`available_kernels` `property` ¶

Available kernels for smoothing (read-only).

`bandwidth(fraction)` ¶

Get the bandwidth parameter that determines the locality of the smoothing. It controls how far from the target point the weights will be considered. A larger bandwidth results in smoother fits, while a smaller bandwidth allows for more local variation to be captured in the fitted curve.
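One common way to turn a fraction into a bandwidth is the distance to the k-th nearest neighbour, with k being the fraction of the sample size rounded up. The sketch below assumes that definition; the helper `knn_bandwidth` is hypothetical and the rule actually used by `bandwidth` may differ.

```python
import numpy as np

def knn_bandwidth(x, x0, fraction):
    """Distance from x0 to its k-th nearest neighbour, k = ceil(fraction * n)."""
    x = np.asarray(x, dtype=float)
    k = max(1, int(np.ceil(fraction * x.size)))
    distances = np.sort(np.abs(x - x0))
    return distances[k - 1]

x = np.arange(10.0)                   # feature values 0, 1, ..., 9
narrow = knn_bandwidth(x, 4.0, 0.5)   # window holds the 5 nearest points
wide = knn_bandwidth(x, 4.0, 1.0)     # window holds all 10 points
```

A larger fraction widens the window around the target point, which is why the fit becomes smoother.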

`fit(fraction=0.6, order=3, kernel='tricube')` ¶

Fits the model using the statsmodels.nonparametric.lowess method.

PARAMETER	DESCRIPTION
`fraction`	The fraction of the data used for each local regression. A good value to start with is > 1/2 (the statsmodels default is 2/3). Reduce the value to avoid underfitting. A value below 0.2 usually leads to overfitting, except for Gaussian weights. Default is 0.6. TYPE: `float` DEFAULT: `0.6`
`order`	The order of the local regression to be fitted. This determines the degree of the polynomial used in the local regression: - 0: No smoothing (interpolation) - 1: Linear regression - 2: Quadratic regression - 3: Cubic regression Default is 3. TYPE: `Literal[0, 1, 2, 3]` DEFAULT: `3`
`kernel`	The kernel function used to calculate the weights. Available kernels are: - 'tricube': Tricube kernel function - 'gaussian': Gaussian kernel function - 'epanechnikov': Epanechnikov kernel function Default is 'tricube'. TYPE: `Literal['tricube', 'gaussian', 'epanechnikov']` DEFAULT: `'tricube'`

RETURNS	DESCRIPTION
`Lowess`	The instance itself, with the fitted values. TYPE: `Self`
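For orientation, the three kernel names correspond to well-known weight functions. The sketch below uses their textbook definitions on a scaled distance `u` (distance divided by bandwidth). The scaling and normalisation used inside daspi are not stated here, so treat the exact constants as assumptions.

```python
import numpy as np

def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = np.abs(u)
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) inside the window, 0 outside."""
    return np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)

def gaussian(u):
    """Standard normal density; never exactly zero, so no hard window edge."""
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

u = np.array([0.0, 0.5, 1.0, 1.5])   # scaled distances from the target point
w_tricube = tricube(u)
w_epanechnikov = epanechnikov(u)
w_gaussian = gaussian(u)
```

Tricube and Epanechnikov weights drop to zero at the edge of the window, whereas Gaussian weights still give distant points a small influence.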

Notes

This method can raise a LinAlgError. The error usually occurs in two main scenarios:

Singular Matrix: When the input data creates a singular matrix (determinant = 0). This often occurs when:
- You have perfectly correlated features
- You have duplicate data points
- There's not enough variation in your data
Ill-Conditioned Matrix: When the matrix is nearly singular. Common causes:
- Features with very different scales
- Multicollinearity between features
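The singular case can be reproduced with plain NumPy. This is only an illustration of the failure mode, not the code path used by `fit`: it builds the design matrix of a linear fit on identical feature values and tries to solve the normal equations.

```python
import numpy as np

x = np.full(5, 2.0)                       # five identical feature values
y = np.array([1.0, 1.2, 0.9, 1.1, 1.0])

# Design matrix of a linear fit: columns are 1 and x.
X = np.vander(x, N=2, increasing=True)
gram = X.T @ X                            # normal-equations matrix
rank = np.linalg.matrix_rank(gram)        # rank 1 instead of 2

try:
    np.linalg.solve(gram, X.T @ y)
    raised = False
except np.linalg.LinAlgError:
    # No unique solution exists because both columns carry the same information.
    raised = True
```

Without any variation in the feature, the intercept and slope cannot be separated, so the solver fails.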

`predict(x, kind='linear')` ¶

Predict the target value(s) for the given feature value(s).

PARAMETER	DESCRIPTION
`x`	The feature value(s) for which to predict the target value(s). TYPE: `int \| float \| NumericSample1D`
`kind`	Specifies the kind of interpolation as a string or as an integer specifying the order of the spline interpolator to use. The string has to be one of 'linear', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', or 'next'. - 'zero', 'slinear', 'quadratic' and 'cubic': refer to a spline interpolation of zeroth, first, second or third order. - 'previous' and 'next': simply return the previous or next value of the point. - 'nearest-up' and 'nearest': differ when interpolating half-integers (e.g. 0.5, 1.5) in that 'nearest-up' rounds up and 'nearest' rounds down. Default is 'linear'. TYPE: `str or int` DEFAULT: `'linear'`

RETURNS	DESCRIPTION
`NDArray`	The predicted target value(s) for the given feature value(s) as a one-dimensional numpy array.
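The `kind` options match those of `scipy.interpolate.interp1d`. Assuming `predict` interpolates between the smoothed values in a comparable way (an assumption, not confirmed here), the difference between the kinds can be sketched with made-up values:

```python
import numpy as np
from scipy.interpolate import interp1d

# Made-up fitted feature values and smoothed targets.
x_fit = np.array([0.0, 1.0, 2.0])
y_smoothed = np.array([10.0, 20.0, 30.0])

kinds = ("linear", "nearest", "nearest-up", "previous", "next")
# Evaluate each interpolator halfway between the first two points.
results = {
    kind: float(interp1d(x_fit, y_smoothed, kind=kind)(0.5))
    for kind in kinds
}
```

At the half-way point, 'nearest' and 'previous' return the left value, 'nearest-up' and 'next' return the right value, and 'linear' returns their mean.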

`fitted_line(confidence_level=None, n_points=DEFAULT.LOWESS_SEQUENCE_LEN)` ¶

fitted_line() -> Tuple[NDArray, NDArray]

fitted_line(confidence_level: None, n_points: int = ...) -> Tuple[NDArray, NDArray]

fitted_line(confidence_level: float, n_points: int = ...) -> Tuple[NDArray, NDArray, NDArray, NDArray]

Generate a smooth sequence of predictions from the fitted LOWESS model.

This method creates an evenly spaced sequence of feature values and predicts their corresponding target values using linear interpolation. It's particularly useful for:

1. Plotting smooth trend lines
2. Visualizing the LOWESS fit
3. Generating continuous predictions across the feature range

PARAMETER	DESCRIPTION
`confidence_level`	If provided, calculate confidence bands at this level (0 to 1). Example: 0.95 for 95% confidence bands. If None, no confidence bands are calculated. Default is None. TYPE: `float \| None` DEFAULT: `None`
`n_points`	Number of points to generate in the sequence. More points create a smoother visualization but increase computation time. Default is defined in DEFAULT.LOWESS_SEQUENCE_LEN. TYPE: `int` DEFAULT: `LOWESS_SEQUENCE_LEN`

RETURNS	DESCRIPTION
`sequence`	Evenly spaced feature values TYPE: `NDArray`
`prediction`	Predicted target values TYPE: `NDArray`
`lower_band`	Lower confidence band. Only returned if confidence_level is provided TYPE: `(NDArray, optional)`
`upper_band`	Upper confidence band. Only returned if confidence_level is provided TYPE: `(NDArray, optional)`
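A confidence band of this kind is typically built from the predictions and their standard errors. The sketch below assumes a symmetric band based on a standard normal quantile. The helper `confidence_band` and the input values are hypothetical, and the library may use a different quantile (for example one based on `dof_resid`).

```python
from statistics import NormalDist

import numpy as np

def confidence_band(prediction, std_errors, confidence_level):
    """Symmetric band: prediction +/- z * standard error."""
    # Two-sided standard normal quantile, about 1.96 for a level of 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    prediction = np.asarray(prediction, dtype=float)
    margin = z * np.asarray(std_errors, dtype=float)
    return prediction - margin, prediction + margin

prediction = np.array([1.0, 1.5, 2.0])      # made-up smoothed values
std_errors = np.array([0.10, 0.05, 0.20])   # made-up standard errors
lower, upper = confidence_band(prediction, std_errors, 0.95)
```

The band is widest where the standard error is largest, which is usually where the data are sparse or noisy.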
