pybaselines.Baseline.loess
- Baseline.loess(data, fraction=0.2, total_points=None, poly_order=1, scale=3.0, tol=0.001, max_iter=10, symmetric_weights=False, use_threshold=False, num_std=1, use_original=False, weights=None, return_coef=False, conserve_memory=True, delta=None)[source]
Locally estimated scatterplot smoothing (LOESS).
Performs polynomial regression at each data point using the nearest points.
- Parameters:
- dataarray_like, shape (N,)
The y-values of the measured data, with N data points.
- fraction
float
, optional The fraction of N data points to include for the fitting on each point. Default is 0.2. Not used if total_points is not None.
- total_points
int
, optional The total number of points to include for the fitting on each point. Default is None, which will use fraction * N to determine the number of points.
- scale
float
, optional A scale factor applied to the weighted residuals to control the robustness of the fit. Default is 3.0, as used in [1]. Note that the original loess procedure for smoothing in [2] used a scale of ~4.05.
- poly_order
int
, optional The polynomial order for fitting the baseline. Default is 1.
- tol
float
, optional The exit criteria. Default is 1e-3.
- max_iter
int
, optional The maximum number of iterations. Default is 10.
- symmetric_weightsbool, optional
If False (default), will apply weighting asymmetrically, with residuals < 0 having a weight of 1, according to [1]. If True, will apply weighting the same for both positive and negative residuals, which is regular LOESS. If use_threshold is True, this parameter is ignored.
- use_thresholdbool, optional
If False (default), will compute weights each iteration to perform the robust fitting, which is regular LOESS. If True, will apply a threshold on the data being fit each iteration, based on the maximum values of the data and the fit baseline, as proposed by [3], similar to the modpoly and imodpoly techniques.
- num_std
float
, optional The number of standard deviations to include when thresholding. Default is 1, which is the value used for the imodpoly technique. Only used if use_threshold is True.
- use_originalbool, optional
If False (default), will compare the baseline of each iteration with the y-values of that iteration [4] when choosing minimum values for thresholding. If True, will compare the baseline with the original y-values given by data [5]. Only used if use_threshold is True.
- weightsarray_like, shape (N,), optional
The weighting array. If None (default), then will be an array with size equal to N and all values set to 1.
- return_coefbool, optional
If True, will convert the polynomial coefficients for the fit baseline to a form that fits the input x_data and return them in the params dictionary. Default is False, since the conversion takes time.
- conserve_memorybool, optional
If False, will cache the distance-weighted kernels for each value in x_data on the first iteration and reuse them on subsequent iterations to save time. The shape of the array of kernels is (len(x_data), total_points). If True (default), will recalculate the kernels each iteration, which uses very little memory, but is slower. Can usually set to False unless x_data and`total_points` are quite large and the function causes memory issues when cacheing the kernels. If numba is installed, there is no significant time difference since the calculations are sped up.
- delta
float
, optional If delta is > 0, will skip all but the last x-value in the range x_last + delta, where x_last is the last x-value to be fit using weighted least squares, and instead use linear interpolation to calculate the fit for those x-values, which can significantly reduce the calculation time (same behavior as in statsmodels [6] and Cleveland's original Fortran lowess implementation [7]). Fits all x-values if delta is <= 0. Default is None, which sets delta to 0.01 * (max(x_data) - min(x_data)).
- Returns:
- baseline
numpy.ndarray
, shape (N,) The calculated baseline.
- params
dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data. Does NOT contain the individual distance-weighted kernels for each x-value.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- 'coef': numpy.ndarray, shape (N, poly_order + 1)
Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using
numpy.polynomial.polynomial.Polynomial
. If delta is > 0, the coefficients for any skipped x-value will all be 0.
- baseline
- Raises:
ValueError
Raised if the number of points per window for the fitting is less than poly_order + 1 or greater than the total number of points, or if the values in self.x are not strictly increasing.
Notes
The iterative, robust, aspect of the fitting can be achieved either through reweighting based on the residuals (the typical usage), or thresholding the fit data based on the residuals, as proposed by [3], similar to the modpoly and imodpoly techniques.
In baseline literature, this procedure is sometimes called "rbe", meaning "robust baseline estimate".
References
[1] (1,2)Ruckstuhl, A.F., et al. Baseline subtraction using robust local regression estimation. J. Quantitative Spectroscopy and Radiative Transfer, 2001, 68, 179-193.
[2]Cleveland, W. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 1979, 74(368), 829-836.
[3] (1,2)Komsta, Ł. Comparison of Several Methods of Chromatographic Baseline Removal with a New Approach Based on Quantile Regression. Chromatographia, 2011, 73, 721-731.
[4]Gan, F., et al. Baseline correction by improved iterative polynomial fitting with automatic threshold. Chemometrics and Intelligent Laboratory Systems, 2006, 82, 59-65.
[5]Lieber, C., et al. Automated method for subtraction of fluorescence from biological raman spectra. Applied Spectroscopy, 2003, 57(11), 1363-1367.
[7]https://www.netlib.org/go (lowess.f is the file).