pybaselines.whittaker
Module Contents
Functions
- airpls(): Adaptive iteratively reweighted penalized least squares (airPLS) baseline.
- arpls(): Asymmetrically reweighted penalized least squares smoothing (arPLS).
- asls(): Fits the baseline using asymmetric least squares (AsLS) fitting.
- aspls(): Adaptive smoothness penalized least squares smoothing (asPLS).
- derpsalsa(): Derivative Peak-Screening Asymmetric Least Squares Algorithm (derpsalsa).
- drpls(): Doubly reweighted penalized least squares (drPLS) baseline.
- iarpls(): Improved asymmetrically reweighted penalized least squares smoothing (IarPLS).
- iasls(): Fits the baseline using the improved asymmetric least squares (IAsLS) algorithm.
- psalsa(): Peaked Signal's Asymmetric Least Squares Algorithm (psalsa).
- pybaselines.whittaker.airpls(data, lam=1000000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
Adaptive iteratively reweighted penalized least squares (airPLS) baseline.
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.
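The convergence rule described for 'tol_history' can be applied directly to the returned params dictionary. The sketch below is illustrative only: `did_converge` is a hypothetical helper that is not part of pybaselines, and the `params` dictionary is a hand-written stand-in for the second return value of airpls(). Treating an empty history as "not converged" is an assumption.

```python
import numpy as np


def did_converge(params, tol):
    """Return True if the last recorded tolerance is at or below tol.

    Hypothetical helper; not part of pybaselines.
    """
    history = np.asarray(params["tol_history"], dtype=float)
    if history.size == 0:
        # Assumption: no completed iterations means no evidence of convergence.
        return False
    return bool(history[-1] <= tol)


# Hand-written stand-in for the params dictionary returned with the baseline.
params = {"tol_history": np.array([0.5, 0.02, 0.0004])}
iterations_completed = len(params["tol_history"])
converged = did_converge(params, tol=1e-3)
print(iterations_completed, converged)
```

If the check fails, raising max_iter or loosening tol are the two parameters documented above that affect it.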
- pybaselines.whittaker.arpls(data, lam=100000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
Asymmetrically reweighted penalized least squares smoothing (arPLS).
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Baek, S.J., et al. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 2015, 140, 250-257.
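For intuition about the reweighting, here is a minimal dense-matrix sketch of an arPLS-style loop. It is not the pybaselines implementation: `arpls_sketch`, the synthetic data, and the smaller `lam` are choices made for the example, and the logistic weight formula and the relative weight-change exit test are assumptions based on how arPLS is commonly described.

```python
import numpy as np


def arpls_sketch(y, lam=1e4, diff_order=2, max_iter=50, tol=1e-3):
    """Illustrative dense arPLS-style loop (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Finite-difference matrix, shape (n - diff_order, n).
    D = np.diff(np.eye(n), n=diff_order, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    tol_history = []
    for _ in range(max_iter):
        # Solve (W + lam * D'D) z = W y for the current weights.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        d = y - z
        d_neg = d[d < 0]
        if d_neg.size < 2 or d_neg.std() == 0:
            break  # cannot estimate the spread of the negative residuals
        m, s = d_neg.mean(), d_neg.std()
        # Assumed logistic weights: near 1 at or below the baseline, near 0 on peaks.
        exponent = np.clip(2 * (d - (2 * s - m)) / s, -500, 500)
        w_new = 1 / (1 + np.exp(exponent))
        # Assumed exit test: relative change of the weight vector.
        change = np.linalg.norm(w - w_new) / np.linalg.norm(w)
        tol_history.append(change)
        w = w_new
        if change < tol:
            break
    return z, {"weights": w, "tol_history": np.array(tol_history)}


# Synthetic data: linear baseline, one Gaussian peak, small noise.
x = np.linspace(0, 1, 200)
true_baseline = 2 + 3 * x
rng = np.random.default_rng(0)
y = true_baseline + 10 * np.exp(-((x - 0.5) / 0.02) ** 2) + rng.normal(0, 0.05, x.size)
baseline, params = arpls_sketch(y)
```

The dense `np.linalg.solve` call keeps the example short; it is only practical for small N.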
- pybaselines.whittaker.asls(data, lam=1000000.0, p=0.01, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
Fits the baseline using asymmetric least squares (AsLS) fitting.
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.
- p : float, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Eilers, P. A Perfect Smoother. Analytical Chemistry, 2003, 75(14), 3631-3636.
Eilers, P., et al. Baseline correction with asymmetric least squares smoothing. Leiden University Medical Centre Report, 2005, 1(1).
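The roles of lam, p, and diff_order can be seen in a minimal dense-matrix sketch of the AsLS iteration. This is an illustration in plain NumPy, not the pybaselines implementation: `asls_sketch`, the synthetic data, the smaller `lam`, and the relative weight-change exit test are assumptions made for the example.

```python
import numpy as np


def asls_sketch(y, lam=1e4, p=0.01, diff_order=2, max_iter=50, tol=1e-3):
    """Illustrative dense AsLS loop (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Finite-difference matrix of the requested order, shape (n - diff_order, n).
    D = np.diff(np.eye(n), n=diff_order, axis=0)
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    for _ in range(max_iter):
        # Weighted, penalized least squares: (W + lam * D'D) z = W y.
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the baseline get weight p, all others get 1 - p.
        w_new = np.where(y > z, p, 1 - p)
        # Assumed exit test: relative change of the weight vector.
        change = np.linalg.norm(w_new - w) / np.linalg.norm(w)
        w = w_new
        if change < tol:
            break
    return z, w


# Synthetic data: linear baseline, one Gaussian peak, small noise.
x = np.linspace(0, 1, 200)
true_baseline = 2 + 3 * x
rng = np.random.default_rng(0)
y = true_baseline + 10 * np.exp(-((x - 0.5) / 0.02) ** 2) + rng.normal(0, 0.05, x.size)
baseline, weights = asls_sketch(y)
```

With a small p, points on the peak barely influence the fit, so the baseline passes underneath it; a larger lam makes the result stiffer.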
- pybaselines.whittaker.aspls(data, lam=100000.0, diff_order=2, max_iter=100, tol=0.001, weights=None, alpha=None, x_data=None)
Adaptive smoothness penalized least squares smoothing (asPLS).
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 100.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- alpha : array-like, shape (N,), optional
An array of values that control the local value of lam to better fit peak and non-peak regions. If None (default), then the initial values will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'alpha': numpy.ndarray, shape (N,)
The array of alpha values used for fitting the data in the final iteration.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if alpha and data do not have the same shape.
Notes
The weighting uses an asymmetric coefficient (k in the asPLS paper) of 0.5 instead of the 2 listed in the asPLS paper. pybaselines uses the factor of 0.5 since it matches the results in Table 2 and Figure 5 of the asPLS paper more closely than the factor of 2 and fits noisy data much better.
References
Zhang, F., et al. Baseline correction for infrared spectra using adaptive smoothness parameter penalized least squares method. Spectroscopy Letters, 2020, 53(3), 222-233.
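To see what the asymmetric coefficient in the Notes changes, the sketch below compares k = 0.5 with k = 2. It assumes the weights are a logistic function of the residual d = data - baseline, scaled by s, the standard deviation of the negative residuals. That functional form, the use of the population standard deviation, and the helper name `logistic_weights` are assumptions for illustration and are not taken from this page.

```python
import numpy as np


def logistic_weights(residual, k):
    """Assumed asPLS-style weights: 1 / (1 + exp(k * (d - s) / s)).

    s is the standard deviation of the negative residuals.
    Hypothetical helper; not part of pybaselines.
    """
    residual = np.asarray(residual, dtype=float)
    s = np.std(residual[residual < 0])
    return 1 / (1 + np.exp(k * (residual - s) / s))


# Negative residuals are -3 and -1, so s = 1 for this toy example.
residual = np.array([-3.0, -1.0, 1.0, 3.0, 9.0])
w_half = logistic_weights(residual, k=0.5)
w_two = logistic_weights(residual, k=2.0)
for d, a, b in zip(residual, w_half, w_two):
    print(f"d={d:5.1f}  k=0.5: {a:.4f}  k=2: {b:.4f}")
```

Under this assumed form, both coefficients give a weight of 0.5 at d = s, but k = 0.5 lowers the weights more gradually for residuals just above the baseline, which is consistent with the note that it handles noisy data better.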
- pybaselines.whittaker.derpsalsa(data, lam=1000000.0, p=0.01, k=None, diff_order=2, max_iter=50, tol=0.001, weights=None, smooth_half_window=None, num_smooths=16, x_data=None, **pad_kwargs)
Derivative Peak-Screening Asymmetric Least Squares Algorithm (derpsalsa).
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.
- p : float, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- k : float, optional
A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- smooth_half_window : int, optional
The half-window to use for smoothing the data before computing the first and second derivatives. Default is None, which will use len(data) / 200.
- num_smooths : int, optional
The number of times to smooth the data before computing the first and second derivatives. Default is 16.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from smoothing.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Korepanov, V. Asymmetric least-squares baseline algorithm with peak screening for automatic processing of the Raman spectra. Journal of Raman Spectroscopy, 2020, 51(10), 2061-2065.
- pybaselines.whittaker.drpls(data, lam=100000.0, eta=0.5, max_iter=50, tol=0.001, weights=None, diff_order=2, x_data=None)
Doubly reweighted penalized least squares (drPLS) baseline.
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- eta : float, optional
A term for controlling the value of lam; should be between 0 and 1. Low values will produce smoother baselines, while higher values will more aggressively fit peaks. Default is 0.5.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 1. Default is 2 (second order differential matrix). Typical values are 2 or 3.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if eta is not between 0 and 1 or if diff_order is less than 2.
References
Xu, D., et al. Baseline correction method based on doubly reweighted penalized least squares. Applied Optics, 2019, 58, 3913-3920.
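The diff_order parameter selects the order of the finite-difference ("differential") matrix used in the smoothness penalty. As a small illustration, the sketch below builds dense difference matrices with NumPy and shows which shapes they leave unpenalized. `difference_matrix` is a hypothetical helper written for this example and says nothing about how pybaselines constructs its matrices internally.

```python
import numpy as np


def difference_matrix(n, diff_order=2):
    """Dense finite-difference matrix of shape (n - diff_order, n)."""
    if diff_order < 1:
        raise ValueError("diff_order must be greater than 0")
    return np.diff(np.eye(n), n=diff_order, axis=0)


x = np.arange(6, dtype=float)
D2 = difference_matrix(6, 2)
D3 = difference_matrix(6, 3)
line_diffs = D2 @ (2 + 3 * x)   # second differences of a straight line
parabola_diffs = D3 @ (x ** 2)  # third differences of a parabola
print(D2[0])
print(line_diffs)
print(parabola_diffs)
```

A straight line has zero second differences and a parabola has zero third differences, so a higher diff_order lets more strongly curved baselines pass without penalty.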
- pybaselines.whittaker.iarpls(data, lam=100000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
Improved asymmetrically reweighted penalized least squares smoothing (IarPLS).
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Ye, J., et al. Baseline correction method based on improved asymmetrically reweighted penalized least squares for Raman spectrum. Applied Optics, 2020, 59, 10933-10943.
- pybaselines.whittaker.iasls(data, x_data=None, lam=1000000.0, p=0.01, lam_1=0.0001, max_iter=50, tol=0.001, weights=None, diff_order=2)
Fits the baseline using the improved asymmetric least squares (IAsLS) algorithm.
The algorithm considers both the first and second derivatives of the residual.
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- x_data : array-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.
- p : float, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- lam_1 : float, optional
The smoothing parameter for the first derivative of the residual. Default is 1e-4.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be set by fitting the data with a second order polynomial.
- diff_order : int, optional
The order of the differential matrix. Must be greater than 1. Default is 2 (second order differential matrix). Typical values are 2 or 3.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1 or if diff_order is less than 2.
References
He, S., et al. Baseline correction for Raman spectra using an improved asymmetric least squares method. Analytical Methods, 2014, 6(12), 4402-4407.
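How lam_1 and the polynomial-based initial weights enter the fit can be sketched with dense NumPy matrices. This is an illustration, not the pybaselines implementation: `iasls_sketch`, the synthetic data, and the smaller `lam` are choices made for the example, and the linear system shown in the comment (squared weights plus a first-derivative term on both sides) is an assumption about the IAsLS formulation.

```python
import numpy as np


def iasls_sketch(y, x=None, lam=1e4, p=0.01, lam_1=1e-4, max_iter=50, tol=1e-3):
    """Illustrative dense IAsLS-style loop (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if x is None:
        x = np.linspace(-1, 1, n)
    D1 = np.diff(np.eye(n), n=1, axis=0)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    first_term = lam_1 * (D1.T @ D1)  # first derivative of the residual
    second_term = lam * (D2.T @ D2)   # smoothness of the baseline
    # Initial weights from a second order polynomial fit of the data.
    poly_fit = np.polyval(np.polyfit(x, y, 2), x)
    w = np.where(y > poly_fit, p, 1 - p)
    for _ in range(max_iter):
        w_sq = np.diag(w ** 2)
        # Assumed system: (W^2 + lam_1 D1'D1 + lam D2'D2) z = (W^2 + lam_1 D1'D1) y
        z = np.linalg.solve(w_sq + first_term + second_term, (w_sq + first_term) @ y)
        w_new = np.where(y > z, p, 1 - p)
        # Assumed exit test: relative change of the weight vector.
        change = np.linalg.norm(w_new - w) / np.linalg.norm(w)
        w = w_new
        if change < tol:
            break
    return z, w


# Synthetic data: linear baseline, one Gaussian peak, small noise.
x = np.linspace(0, 1, 200)
true_baseline = 2 + 3 * x
rng = np.random.default_rng(0)
y = true_baseline + 10 * np.exp(-((x - 0.5) / 0.02) ** 2) + rng.normal(0, 0.05, x.size)
baseline, weights = iasls_sketch(y, x)
```

Starting from a polynomial fit means the first solve already down-weights the peak region instead of treating every point equally.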
- pybaselines.whittaker.psalsa(data, lam=100000.0, p=0.5, k=None, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
Peaked Signal's Asymmetric Least Squares Algorithm (psalsa).
Similar to the asymmetric least squares (AsLS) algorithm, but applies an exponential decay weighting to values greater than the baseline to allow using a higher p value to better fit noisy data.
- Parameters:
- data : array-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam : float, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- p : float, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 0.5.
- k : float, optional
A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().
- diff_order : int, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter : int, optional
The maximum number of fit iterations. Default is 50.
- tol : float, optional
The exit criterion. Default is 1e-3.
- weights : array-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
Notes
The exit criterion of the original algorithm checks whether the signs of the residuals stay unchanged between two iterations. pybaselines instead compares the l2 norms of the weight arrays between iterations, to be more comparable to other Whittaker-smoothing-based algorithms.
References
Oller-Moreno, S., et al. Adaptive Asymmetric Least Squares baseline estimation for analytical instruments. 2014 IEEE 11th International Multi-Conference on Systems, Signals, and Devices, 2014, 1-5.
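The effect of k can be illustrated with a small weight function. The sketch assumes the decayed weight for a point above the baseline is p * exp(-(y - baseline) / k), with 1 - p used elsewhere; that exact formula, the population standard deviation used for the default k, and the helper name `psalsa_weights` are assumptions for illustration, not code from pybaselines.

```python
import numpy as np


def psalsa_weights(y, baseline, p=0.5, k=None):
    """Assumed psalsa-style weights (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    if k is None:
        # Default described above: one-tenth of the standard deviation of the data.
        k = np.std(y) / 10
    residual = y - baseline
    # Only positive residuals are decayed; the rest are clipped to 0 to avoid overflow.
    decay = np.exp(-np.maximum(residual, 0) / k)
    return np.where(residual > 0, p * decay, 1 - p)


y = np.array([1.0, 3.0, 0.0])
baseline = np.ones(3)
w_small_k = psalsa_weights(y, baseline, p=0.25, k=2.0)
w_large_k = psalsa_weights(y, baseline, p=0.25, k=1e9)
asls_like = np.where(y > baseline, 0.25, 0.75)
print(w_small_k)
print(w_large_k)
```

With a very large k the decay factor is essentially 1, so the weights reduce to the plain p and 1 - p split, which matches the statement that a large k gives results similar to asls().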