Loess
daspi.statistics.estimation.Loess(source, target, feature, *, fit_at_init=True, **kwds)
Smooth the data using Locally Estimated Scatterplot Smoothing (LOESS).
LOESS is not necessarily suitable for all regression models due to its non-parametric approach and high computational intensity. Nonetheless, it serves as an effective method for modeling the relationship between two variables that do not adhere to a predefined distribution and exhibit a non-linear relationship.
| PARAMETER | DESCRIPTION |
|---|---|
| `source` | Pandas long format DataFrame containing the data source for the plot. |
| `target` | Column name of the target variable for the plot. |
| `feature` | Column name of the feature variable for the plot, by default ''. |
| `fit_at_init` | Whether to fit the model at initialization, by default True. |
| `**kwds` | Keyword arguments for the |
Examples:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import daspi as dsp

# Noisy, non-linear sample data
x = 5 * np.random.random(100)
data = pd.DataFrame(dict(
    x=x,
    y=np.sin(x) * 3 * np.exp(-x) + np.random.normal(0, 0.2, 100)))

# Fit at initialization, then get the fitted line with 95 % confidence bands
model = dsp.Loess(data, 'y', 'x')
sequence, prediction, lower, upper = model.fitted_line(0.95)

# Plot the raw data, the smoothed line and the confidence band
fig, ax = plt.subplots()
ax.scatter(data.x, data.y)
ax.plot(sequence, prediction)
ax.fill_between(sequence, lower, upper, alpha=0.2)
```
Sources
[1] James Brennan (2020), Confidence intervals for LOWESS models in python https://james-brennan.github.io
[2] Saul Dobilas (2020), LOWESS Regression in Python: How to Discover Clear Patterns in Your Data?, Towards Data Science
[3] Btyner (2006), Local Regression Wikipedia
smoothed
instance-attribute
Smoothed target data as a pandas Series.
std_errors
instance-attribute
Standard errors of the smoothed target data as a pandas Series.
fraction
instance-attribute
The fraction of data points used in each local regression.
kernel
instance-attribute
The kernel for weights function used in the LOESS smoothing.
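For orientation, the kernel names accepted by this class ('tricube', 'gaussian', 'epanechnikov') correspond to standard weight functions. The sketch below writes out their textbook forms in NumPy. The function names and the unit Gaussian scale are illustrative assumptions and are not taken from daspi's internal code.

```python
import numpy as np

def tricube(u):
    """Textbook tricube weights: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1, (1 - u**3) ** 3, 0.0)

def epanechnikov(u):
    """Textbook Epanechnikov weights: 0.75 * (1 - u^2) for |u| < 1, else 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u < 1, 0.75 * (1 - u**2), 0.0)

def gaussian(u):
    """Unnormalised Gaussian weights: exp(-u^2 / 2); never exactly zero."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * u**2)

# Scaled distances from the point being smoothed
# (0 = at the point, 1 = at the edge of the local window).
u = np.array([0.0, 0.5, 1.0, 1.5])
weights = {k.__name__: k(u) for k in (tricube, epanechnikov, gaussian)}
```

The tricube and Epanechnikov kernels drop to zero at the window edge, whereas the Gaussian kernel still gives distant points a small weight, which is consistent with the remark that small fractions behave differently with Gaussian weights.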
order
instance-attribute
The order of local regression:
- 0: No smoothing (interpolation)
- 1: Linear regression
- 2: Quadratic regression
- 3: Cubic regression
The order determines the degree of the polynomial used in the local regression, affecting the flexibility of the fitted curve. High-degree polynomials would tend to overfit the data in each subset and are numerically unstable, making accurate computations difficult.
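To make the role of the order concrete, here is a minimal sketch of a single local regression step: a weighted polynomial of the given degree is fitted to the nearest neighbours of one point and evaluated there. The helper `local_fit`, the nearest-neighbour window and the tricube weighting are assumptions chosen for illustration, not a description of how daspi implements the fit.

```python
import numpy as np

def local_fit(x, y, x0, fraction=0.6, order=1):
    """Weighted polynomial fit around x0, evaluated at x0 (illustrative)."""
    n_local = max(int(np.ceil(fraction * len(x))), order + 1)
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:n_local]                 # nearest neighbours of x0
    h = max(dist[idx].max(), np.finfo(float).eps)    # window half-width
    w = (1 - (dist[idx] / h) ** 3) ** 3              # tricube weights
    # np.polyfit weights the unsquared residuals, so pass sqrt(w).
    coefs = np.polyfit(x[idx] - x0, y[idx], deg=order, w=np.sqrt(w))
    return coefs[-1]                                 # constant term = value at x0

rng = np.random.default_rng(0)
x = np.sort(5 * rng.random(100))
y = np.sin(x) + rng.normal(0, 0.2, 100)

# Estimate the smoothed value at x0 = 2.5 for each supported order.
estimates = {order: local_fit(x, y, 2.5, order=order) for order in (0, 1, 2, 3)}
```

Repeating this step for every point of interest yields the smoothed curve; higher orders follow curvature more closely at the cost of stability in each local fit.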
source = source
instance-attribute
The data source for the plot.
target = target
instance-attribute
The column name of the target variable.
feature = feature
instance-attribute
The column name of the feature variable.
x
property
The feature variable as exogenous variable (read-only).
y
property
The target variable as endogenous variable (read-only).
fitted
property
True if the data has been fitted (read-only).
n_samples
property
Number of samples in source after removing missing values (read-only).
residuals
property
Residuals as the difference between the target values and the smoothed values (read-only).
dof_resid
property
Degrees of freedom of the residuals (read-only).
available_kernels
property
Available kernels for smoothing (read-only).
bandwidth(fraction)
Get the bandwidth parameter that determines the locality of the smoothing. It controls how far from the point being estimated observations still receive weight. A larger bandwidth results in smoother fits, while a smaller bandwidth allows more local variation to be captured in the fitted curve.
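As a rough illustration of how a fraction can translate into a bandwidth, one common convention takes the distance from a point to its k-th nearest neighbour, with k = ceil(fraction * n). Whether daspi derives the bandwidth this way is not stated here, so the helper `knn_bandwidth` below is a hypothetical sketch.

```python
import numpy as np

def knn_bandwidth(x, x0, fraction):
    """Distance from x0 to its k-th nearest neighbour, k = ceil(fraction * n)."""
    k = int(np.ceil(fraction * len(x)))
    return np.sort(np.abs(x - x0))[k - 1]

x = np.linspace(0.0, 10.0, 101)          # evenly spaced feature values
narrow = knn_bandwidth(x, 5.0, 0.2)      # small fraction: tight local window
wide = knn_bandwidth(x, 5.0, 0.6)        # large fraction: wide, smoother window
```

Under this convention the window widens with the fraction, which matches the description: more neighbours contribute to each local fit and the curve becomes smoother.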
fit(fraction=0.6, order=3, kernel='tricube')
Fits the model using the statsmodels.nonparametric.lowess method.
| PARAMETER | DESCRIPTION |
|---|---|
| `fraction` | The fraction of the data used for each local regression. A good starting value is above 1/2 (the statsmodels default is 2/3). Reduce the value to avoid underfitting. A value below 0.2 usually leads to overfitting, except with Gaussian weights. Default is 0.6. |
| `order` | The order of the local regression to be fitted, i.e. the degree of the polynomial used in each local regression: 0 = no smoothing (interpolation), 1 = linear regression, 2 = quadratic regression, 3 = cubic regression. Default is 3. |
| `kernel` | The kernel function used to calculate the weights. Available kernels are 'tricube' (tricube kernel function), 'gaussian' (Gaussian kernel function) and 'epanechnikov' (Epanechnikov kernel function). Default is 'tricube'. |
| RETURNS | DESCRIPTION |
|---|---|
| `Lowess` | The instance of the Lowess with the fitted values. |
Notes
This method can raise a LinAlgError. The error usually occurs in two scenarios:
- Singular matrix: the input data produces a singular matrix (determinant = 0). This often occurs when:
    - You have perfectly correlated features
    - You have duplicate data points
    - There is not enough variation in your data
- Ill-conditioned matrix: the matrix is nearly singular. Common causes:
    - Features with very different scales
    - Multicollinearity between features
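Both scenarios can be reproduced with plain NumPy, independent of daspi. The sketch below builds the design matrix of a cubic fit, first from identical x values (singular) and then from x values on a large scale (ill-conditioned), and shows that standardising the feature improves the condition number. The variable names and the sample values are illustrative assumptions.

```python
import numpy as np

# Scenario 1: no variation in x. All rows of the cubic design matrix are
# identical, so the normal-equations matrix X.T @ X has rank 1.
x_dup = np.full(8, 2.0)
X = np.vander(x_dup, 4)                  # columns: x^3, x^2, x, 1
rank = np.linalg.matrix_rank(X.T @ X)
try:
    np.linalg.solve(X.T @ X, X.T @ np.ones(8))
    solved = True
except np.linalg.LinAlgError:
    solved = False                       # "Singular matrix"

# Scenario 2: a feature far from zero makes the polynomial columns
# nearly collinear. Standardising x before fitting reduces the problem.
x_big = np.linspace(1000.0, 1010.0, 20)
cond_raw = np.linalg.cond(np.vander(x_big, 4))
x_std = (x_big - x_big.mean()) / x_big.std()
cond_scaled = np.linalg.cond(np.vander(x_std, 4))
```

If fitting fails on real data, checking for duplicated or constant feature values and rescaling the feature are reasonable first steps; lowering the order also reduces the number of columns that can become collinear.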
predict(x, kind='linear')
Predict the target value(s) for the given feature value(s).
| PARAMETER | DESCRIPTION |
|---|---|
| `x` | The feature value(s) for which to predict the target value(s). |
| `kind` | The kind of interpolation, given as a string or as an integer specifying the order of the spline interpolator to use. The string must be one of 'linear', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', or 'next'. 'zero', 'slinear', 'quadratic' and 'cubic' refer to a spline interpolation of zeroth, first, second or third order. 'previous' and 'next' simply return the previous or next value of the point. 'nearest-up' and 'nearest' differ when interpolating half-integers (e.g. 0.5, 1.5): 'nearest-up' rounds up and 'nearest' rounds down. Default is 'linear'. |

| RETURNS | DESCRIPTION |
|---|---|
| `NDArray` | The predicted target value(s) for the given feature value(s) as a one-dimensional numpy array. |
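The accepted values of `kind` are the same as those of `scipy.interpolate.interp1d`. Assuming that predictions are interpolated between already smoothed values, the effect of the different kinds can be sketched with SciPy alone. The grid `x_fit` and the values `y_fit` are hypothetical stand-ins for a fitted model.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical smoothed values on a coarse grid.
x_fit = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_fit = np.array([0.0, 0.8, 0.9, 0.1, -0.8])

# New feature values lying exactly halfway between grid points.
x_new = np.array([0.5, 2.5])
predictions = {
    kind: interp1d(x_fit, y_fit, kind=kind)(x_new)
    for kind in ("linear", "nearest", "nearest-up", "previous", "next", "cubic")
}
```

At these halfway points 'nearest' and 'previous' both pick the left neighbour while 'nearest-up' and 'next' pick the right one; 'linear' averages the two, and 'cubic' uses a third-order spline through all grid points.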
fitted_line(confidence_level=None, n_points=DEFAULT.LOWESS_SEQUENCE_LEN)
Generate a smooth sequence of predictions from the fitted LOWESS model.
This method creates an evenly spaced sequence of feature values and predicts their corresponding target values using linear interpolation. It's particularly useful for:
1. Plotting smooth trend lines
2. Visualizing the LOWESS fit
3. Generating continuous predictions across the feature range
| PARAMETER | DESCRIPTION |
|---|---|
| `confidence_level` | If provided, confidence bands are calculated at this level (between 0 and 1), e.g. 0.95 for 95% confidence bands. If None, no confidence bands are calculated. Default is None. |
| `n_points` | Number of points to generate in the sequence. More points create a smoother visualization but increase computation time. Default is defined in DEFAULT.LOWESS_SEQUENCE_LEN. |

| RETURNS | DESCRIPTION |
|---|---|
| `sequence` | Evenly spaced feature values. |
| `prediction` | Predicted target values. |
| `lower_band` | Lower confidence band. Only returned if confidence_level is provided. |
| `upper_band` | Upper confidence band. Only returned if confidence_level is provided. |
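For intuition about the confidence bands, a common construction is prediction ± z · standard error, where z is the normal quantile for the chosen level. The sketch below uses this normal approximation with made-up numbers; whether daspi uses a normal or a t quantile is not stated here, so `confidence_band` is a hypothetical helper rather than the library's method.

```python
from statistics import NormalDist

import numpy as np

def confidence_band(prediction, std_error, confidence_level=0.95):
    """Symmetric normal-approximation band around a prediction."""
    # Two-sided quantile, e.g. about 1.96 for a 95 % level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    prediction = np.asarray(prediction, dtype=float)
    std_error = np.asarray(std_error, dtype=float)
    return prediction - z * std_error, prediction + z * std_error

# Hypothetical predictions and standard errors.
prediction = np.array([1.0, 1.5, 1.2])
std_error = np.array([0.10, 0.05, 0.20])
lower, upper = confidence_band(prediction, std_error, 0.95)
```

The band is widest where the standard error is largest, typically near the edges of the feature range or in sparsely sampled regions.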