pybaselines.spline
Module Contents
Functions
- corner_cutting: Iteratively removes corner points and creates a Bezier spline from the remaining points.
- irsqr: Iterative Reweighted Spline Quantile Regression (IRSQR).
- mixture_model: Considers the data as a mixture model composed of noise and peaks.
- pspline_airpls: A penalized spline version of the airPLS algorithm.
- pspline_arpls: A penalized spline version of the arPLS algorithm.
- pspline_asls: A penalized spline version of the asymmetric least squares (AsLS) algorithm.
- pspline_aspls: A penalized spline version of the asPLS algorithm.
- pspline_derpsalsa: A penalized spline version of the derpsalsa algorithm.
- pspline_drpls: A penalized spline version of the drPLS algorithm.
- pspline_iarpls: A penalized spline version of the IarPLS algorithm.
- pspline_iasls: A penalized spline version of the IAsLS algorithm.
- pspline_mpls: A penalized spline version of the morphological penalized least squares (MPLS) algorithm.
- pspline_psalsa: A penalized spline version of the psalsa algorithm.
- pybaselines.spline.corner_cutting(data, x_data=None, max_iter=100)
Iteratively removes corner points and creates a Bezier spline from the remaining points.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- max_iterint, optional
The maximum number of iterations to try to remove corner points. Default is 100. Typically all corner points are removed in 10 to 20 iterations.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- dict
An empty dictionary, just to match the output of all other algorithms.
References
Liu, Y.J., et al. A Concise Iterative Method with Bezier Technique for Baseline Construction. Analyst, 2015, 140(23), 7984-7996.
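The default for x_data described above can be pictured with a short NumPy sketch. It assumes the N points are evenly spaced between -1 and 1, which the description implies but does not state outright; the helper name default_x is hypothetical and not part of pybaselines.

```python
import numpy as np


def default_x(data):
    """Hypothetical helper: x-values from -1 to 1, one per y-value."""
    # Assumption: the N points are evenly spaced between the endpoints.
    return np.linspace(-1, 1, len(data))


y = [2.0, 2.5, 3.0, 2.5, 2.0]
x = default_x(y)
```

Passing measured x-values instead keeps the fit tied to the real axis, which matters when the spacing is uneven.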
- pybaselines.spline.irsqr(data, lam=100, quantile=0.05, num_knots=100, spline_degree=3, diff_order=3, max_iter=100, tol=1e-06, weights=None, eps=None, x_data=None)
Iterative Reweighted Spline Quantile Regression (IRSQR).
Fits the baseline using quantile regression with penalized splines.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e2.
- quantilefloat, optional
The quantile at which to fit the baseline. Default is 0.05.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 3 (third order differential matrix). Typical values are 3, 2, or 1.
- max_iterint, optional
The max number of fit iterations. Default is 100.
- tolfloat, optional
The exit criteria. Default is 1e-6.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- epsfloat, optional
A small value added to the square of the residual to prevent dividing by 0. Default is None, which uses the square of the maximum-absolute-value of the fit each iteration multiplied by 1e-6.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if quantile is not between 0 and 1.
References
Han, Q., et al. Iterative Reweighted Quantile Regression Using Augmented Lagrangian Optimization for Baseline Correction. 2018 5th International Conference on Information Science and Control Engineering (ICISCE), 2018, 280-284.
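The role of quantile and eps in the reweighting can be sketched as follows. This is a minimal illustration and not the library's code: the asymmetric form of the weights (quantile above the fit, 1 - quantile below) is an assumption, the default eps follows one reading of the description above, and the name quantile_weights is hypothetical.

```python
import numpy as np


def quantile_weights(y, fit, quantile=0.05, eps=None):
    """Sketch of asymmetric quantile-regression weights (assumed form)."""
    y = np.asarray(y, dtype=float)
    fit = np.asarray(fit, dtype=float)
    if not 0 < quantile < 1:
        raise ValueError('quantile must be between 0 and 1')
    if eps is None:
        # One reading of the documented default: (max |fit| * 1e-6) squared.
        eps = (np.abs(fit).max() * 1e-6) ** 2
    residual = y - fit
    # Assumption: points above the fit get `quantile`, the rest `1 - quantile`.
    numerator = np.where(residual > 0, quantile, 1 - quantile)
    # eps keeps the denominator nonzero when a residual is exactly 0.
    return numerator / np.sqrt(residual**2 + eps)


w = quantile_weights([1.0, 3.0, 1.0], [2.0, 2.0, 1.0])
```

With a small quantile, points above the current fit contribute little, so the fit settles near the lower envelope of the data.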
- pybaselines.spline.mixture_model(data, lam=100000.0, p=0.01, num_knots=100, spline_degree=3, diff_order=3, max_iter=50, tol=0.001, weights=None, symmetric=False, num_bins=None, x_data=None)
Considers the data as a mixture model composed of noise and peaks.
Weights are iteratively assigned by calculating the probability that each value in the residual belongs to a normal distribution representing the noise.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Used to set the initial weights before performing expectation-maximization. Default is 1e-2.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 3 (third order differential matrix). Typical values are 2 or 3.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1, and then two iterations of reweighted least-squares are performed to provide starting weights for the expectation-maximization of the mixture model.
- symmetricbool, optional
If False (default), the total mixture model will be composed of one normal distribution for the noise and one uniform distribution for positive non-noise residuals. If True, an additional uniform distribution will be added to the mixture model for negative non-noise residuals. symmetric only needs to be set to True when the data contains both positive and negative peaks.
- num_binsint, optional, deprecated
Deprecated since version 1.1.0: num_bins is deprecated since it is no longer necessary for performing the expectation-maximization and will be removed in pybaselines version 1.3.0.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
de Rooi, J., et al. Mixture models for baseline estimation. Chemometrics and Intelligent Laboratory Systems, 2012, 117, 56-60.
Ghojogh, B., et al. Fitting A Mixture Distribution to Data: Tutorial. arXiv preprint arXiv:1901.06708, 2019.
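The weighting idea for the default symmetric=False case can be sketched as a single expectation step. This is an illustration under stated assumptions rather than the library's implementation: the noise is taken as zero-mean, sigma and noise_fraction are supplied by hand instead of being estimated, at least one residual must be positive, and the name noise_posterior is hypothetical.

```python
import numpy as np


def noise_posterior(residual, sigma, noise_fraction):
    """Sketch: probability that each residual comes from the noise component."""
    residual = np.asarray(residual, dtype=float)
    # Zero-mean normal density for the noise (the zero mean is an assumption).
    normal = np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Uniform density over [0, max residual] for positive, non-noise residuals.
    upper = residual.max()
    uniform = np.where(residual >= 0, 1.0 / upper, 0.0)
    noise = noise_fraction * normal
    peaks = (1 - noise_fraction) * uniform
    return noise / (noise + peaks)


weights = noise_posterior([-0.1, 0.0, 0.1, 5.0], sigma=0.1, noise_fraction=0.9)
```

Residuals far above the noise level receive weights near 0 and so barely influence the next spline fit, while small residuals keep weights near 1.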
- pybaselines.spline.pspline_airpls(data, lam=1000.0, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the airPLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
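The convergence rule stated for 'tol_history' can be checked with a small helper. The sketch below uses hand-written dictionaries in place of a real fit result, and the name did_converge is hypothetical.

```python
import numpy as np


def did_converge(params, tol):
    """Return True if the last recorded tolerance is at or below `tol`."""
    history = np.asarray(params['tol_history'])
    # An empty history means no iterations ran, so treat it as not converged.
    return history.size > 0 and bool(history[-1] <= tol)


# Hand-written stand-ins for the `params` dict returned by a fit.
finished = {'tol_history': np.array([0.5, 0.02, 0.0004])}
stalled = {'tol_history': np.array([0.5, 0.3, 0.2])}
```

The same check applies to every function in this module whose params dictionary contains 'tol_history'. When it fails, raising max_iter or loosening tol are the usual first adjustments.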
- pybaselines.spline.pspline_arpls(data, lam=1000.0, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the arPLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Baek, S.J., et al. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 2015, 140, 250-257.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_asls(data, lam=1000.0, p=0.01, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the asymmetric least squares (AsLS) algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Eilers, P. A Perfect Smoother. Analytical Chemistry, 2003, 75(14), 3631-3636.
Eilers, P., et al. Baseline correction with asymmetric least squares smoothing. Leiden University Medical Centre Report, 2005, 1(1).
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
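The asymmetric weighting controlled by p can be sketched in a few lines. The sketch assumes that points at or below the baseline receive the complementary weight 1 - p, which keeps every weight non-negative; the name asls_weights is hypothetical and the code is not taken from pybaselines.

```python
import numpy as np


def asls_weights(y, baseline, p=0.01):
    """Sketch: weight p above the baseline, 1 - p at or below it."""
    y = np.asarray(y, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    return np.where(y > baseline, p, 1 - p)


w_asls = asls_weights([1.0, 5.0, 1.0], [2.0, 2.0, 2.0])
```

A small p means peaks above the baseline are nearly ignored in the next fit, which pulls the baseline under the peaks.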
- pybaselines.spline.pspline_aspls(data, lam=10000.0, num_knots=100, spline_degree=3, diff_order=2, max_iter=100, tol=0.001, weights=None, alpha=None, x_data=None)
A penalized spline version of the asPLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e4.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 100.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- alphaarray-like, shape (N,), optional
An array of values that control the local value of lam to better fit peak and non-peak regions. If None (default), then the initial values will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'alpha': numpy.ndarray, shape (N,)
The array of alpha values used for fitting the data in the final iteration.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
Notes
The weighting uses an asymmetric coefficient (k in the asPLS paper) of 0.5 instead of the 2 listed in the asPLS paper. pybaselines uses the factor of 0.5 since it matches the results in Table 2 and Figure 5 of the asPLS paper more closely than the factor of 2 and fits noisy data much better.
References
Zhang, F., et al. Baseline correction for infrared spectra using adaptive smoothness parameter penalized least squares method. Spectroscopy Letters, 2020, 53(3), 222-233.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
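The effect of the asymmetric coefficient discussed in the Notes can be explored with a small sketch. It assumes a logistic weighting scaled by the sample standard deviation of the negative residuals, which is one reading of the asPLS scheme and not a copy of the library's code; it needs at least two negative residuals, and the name aspls_weights is hypothetical.

```python
import numpy as np


def aspls_weights(y, baseline, k=0.5):
    """Sketch of logistic asPLS-style weights with asymmetric coefficient k."""
    residual = np.asarray(y, dtype=float) - np.asarray(baseline, dtype=float)
    # Scale: sample standard deviation of the negative residuals (assumed).
    sigma = np.std(residual[residual < 0], ddof=1)
    # Logistic weighting: large positive residuals (peaks) get weights near 0.
    return 1.0 / (1.0 + np.exp(k * (residual - sigma) / sigma))


y_demo = [-1.0, -2.0, -3.0, 10.0]
flat = [0.0, 0.0, 0.0, 0.0]
w_half = aspls_weights(y_demo, flat, k=0.5)
w_two = aspls_weights(y_demo, flat, k=2)
```

Comparing w_half and w_two shows that the larger coefficient makes the transition between baseline and peak weights much sharper.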
- pybaselines.spline.pspline_derpsalsa(data, lam=100.0, p=0.01, k=None, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, smooth_half_window=None, num_smooths=16, x_data=None, **pad_kwargs)
A penalized spline version of the derpsalsa algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e2.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- kfloat, optional
A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- smooth_half_windowint, optional
The half-window to use for smoothing the data before computing the first and second derivatives. Default is None, which will use len(data) / 200.
- num_smoothsint, optional
The number of times to smooth the data before computing the first and second derivatives. Default is 16.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from smoothing.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Korepanov, V. Asymmetric least-squares baseline algorithm with peak screening for automatic processing of the Raman spectra. Journal of Raman Spectroscopy, 2020, 51(10), 2061-2065.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_drpls(data, lam=1000.0, eta=0.5, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the drPLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- etafloat, optional
A term for controlling the value of lam; should be between 0 and 1. Low values will produce smoother baselines, while higher values will more aggressively fit peaks. Default is 0.5.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 1. Default is 2 (second order differential matrix). Typical values are 2 or 3.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if eta is not between 0 and 1 or if diff_order is less than 2.
References
Xu, D., et al. Baseline correction method based on doubly reweighted penalized least squares. Applied Optics, 2019, 58, 3913-3920.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_iarpls(data, lam=1000.0, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the IarPLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Ye, J., et al. Baseline correction method based on improved asymmetrically reweighted penalized least squares for Raman spectrum. Applied Optics, 2020, 59, 10933-10943.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_iasls(data, x_data=None, lam=10.0, p=0.01, lam_1=0.0001, num_knots=100, spline_degree=3, max_iter=50, tol=0.001, weights=None, diff_order=2)
A penalized spline version of the IAsLS algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e1.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.
- lam_1float, optional
The smoothing parameter for the first derivative of the residual. Default is 1e-4.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 1. Default is 2 (second order differential matrix). Typical values are 2 or 3.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1 or if diff_order is less than 2.
References
He, S., et al. Baseline correction for Raman spectra using an improved asymmetric least squares method. Analytical Methods, 2014, 6(12), 4402-4407.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_mpls(data, x_data=None, half_window=None, lam=1000.0, p=0.0, num_knots=100, spline_degree=3, diff_order=2, tol=0.001, max_iter=50, weights=None, **window_kwargs)
A penalized spline version of the morphological penalized least squares (MPLS) algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- half_windowint, optional
The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Anchor points identified by the procedure in [1] are given a weight of 1 - p, and all other points have a weight of p. Default is 0.0.
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the weights will be calculated following the procedure in [1].
- **window_kwargs
Values for setting the half window used for the morphology operations. Items include:
- 'increment': int
The step size for iterating half windows. Default is 1.
- 'max_hits': int
The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.
- 'window_tol': float
The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.
- 'max_half_window': int
The maximum allowable window size. If None (default), will be set to (len(data) - 1) / 2.
- 'min_half_window': int
The minimum half-window size. If None (default), will be set to 1.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'half_window': int
The half window used for the morphological calculations.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
- pybaselines.spline.pspline_psalsa(data, lam=1000.0, p=0.5, k=None, num_knots=100, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None, x_data=None)
A penalized spline version of the psalsa algorithm.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lamfloat, optional
The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.
- pfloat, optional
The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 0.5.
- kfloat, optional
A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().
- num_knotsint, optional
The number of knots for the spline. Default is 100.
- spline_degreeint, optional
The degree of the spline. Default is 3, which is a cubic spline.
- diff_orderint, optional
The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iterint, optional
The max number of fit iterations. Default is 50.
- tolfloat, optional
The exit criteria. Default is 1e-3.
- weightsarray-like, shape (N,), optional
The weighting array. If None (default), then the initial weights will be an array with size equal to N and all values set to 1.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
References
Oller-Moreno, S., et al. Adaptive Asymmetric Least Squares baseline estimation for analytical instruments. 2014 IEEE 11th International Multi-Conference on Systems, Signals, and Devices, 2014, 1-5.
Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
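How k interacts with p can be sketched as below. The sketch assumes the exponential decay is applied where the data lies above the baseline and that all other points receive 1 - p; it is an illustration and not the library's code, and the name psalsa_weights is hypothetical.

```python
import numpy as np


def psalsa_weights(y, baseline, p=0.5, k=None):
    """Sketch of psalsa-style weights with exponential decay above the baseline."""
    y = np.asarray(y, dtype=float)
    residual = y - np.asarray(baseline, dtype=float)
    if k is None:
        # Documented default: one-tenth of the standard deviation of the data.
        k = np.std(y) / 10
    # Assumption: the decay applies where the data lies above the baseline.
    # np.maximum keeps the exponent non-positive so the unused branch cannot overflow.
    decayed = p * np.exp(-np.maximum(residual, 0) / k)
    return np.where(residual > 0, decayed, 1 - p)


y_peak = [0.0, 0.0, 10.0, 0.0]
level = [1.0, 1.0, 1.0, 1.0]
w_default_k = psalsa_weights(y_peak, level, p=0.1)
w_large_k = psalsa_weights(y_peak, level, p=0.1, k=1e9)
```

With a very large k the decay term stays close to 1, so the weights reduce to the plain p and 1 - p split, consistent with the remark that a large k gives results similar to asls().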