pybaselines.Baseline2D.psalsa
- Baseline2D.psalsa(data, lam=100000.0, p=0.5, k=None, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)
Peaked Signal's Asymmetric Least Squares Algorithm (psalsa).
Similar to the asymmetric least squares (AsLS) algorithm, but applies an exponential decay weighting to values greater than the baseline to allow using a higher p value to better fit noisy data.
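The weighting described above can be sketched with NumPy. This is a minimal illustration, not the library's internal code: it assumes the weight form from the psalsa reference (p * exp(-(y - z) / k) where the data y lies above the baseline z, and 1 - p otherwise), and the helper name `psalsa_weights` is hypothetical.

```python
import numpy as np


def psalsa_weights(y, baseline, p=0.5, k=1.0):
    """Hypothetical helper: asymmetric weights with exponential decay above the baseline."""
    y = np.asarray(y, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    residual = y - baseline
    # Clip negative residuals to 0 so exp() cannot overflow in the branch np.where discards.
    decay = np.exp(-np.clip(residual, 0, None) / k)
    # Points above the baseline get p scaled by the decay; all others get 1 - p.
    return np.where(residual > 0, p * decay, 1 - p)


y = np.array([[1.0, 1.2], [5.0, 0.8]])
baseline = np.ones_like(y)
weights = psalsa_weights(y, baseline, p=0.5, k=0.5)
```

In this sketch, the tall point (residual 4) receives a far smaller weight than the slightly elevated point (residual 0.2), so peaks are ignored while small positive noise still contributes. As k grows, the decay factor approaches 1 and the weights reduce to the plain p and 1 - p split.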
- Parameters:
- data
array_like, shape (M, N) The y-values of the measured data. Must not contain missing data (NaN) or Inf.
- lam
float or sequence[float, float], optional The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.
- p
float, optional The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 0.5.
- k
float, optional A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().
- diff_order
int or sequence[int, int], optional The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter
int, optional The maximum number of fit iterations. Default is 50.
- tol
float, optional The exit criterion; the fit is considered converged once the calculated tolerance value is no greater than tol. Default is 1e-3.
- weights
array_like, shape (M, N), optional The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.
- num_eigens
int or sequence[int, int] or None The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Must be greater than diff_order. Default is (10, 10).
- return_dof
bool, optional If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.
- Returns:
- baseline
numpy.ndarray, shape (M, N) The calculated baseline.
- params
dict A dictionary with the following items:
- 'weights': numpy.ndarray, shape (M, N)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])
Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.
- Raises:
ValueError Raised if p is not between 0 and 1. Also raised if k is not greater than 0.
Notes
The exit criterion for the original algorithm was to check whether the signs of the residuals stay unchanged between two iterations. This implementation instead compares the l2 norms of the weight arrays between iterations, to be more comparable to other Whittaker-smoothing-based algorithms.
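A weight-based exit check of this kind can be sketched as a relative l2-norm change between successive weight arrays. The exact expression used internally is an assumption here, and `relative_weight_change` is a hypothetical helper name, not a pybaselines function.

```python
import numpy as np


def relative_weight_change(old_weights, new_weights):
    """Hypothetical helper: relative l2-norm change between two weight arrays."""
    old = np.asarray(old_weights, dtype=float).ravel()
    new = np.asarray(new_weights, dtype=float).ravel()
    # Guard against division by zero when the old weights are all zero.
    denominator = max(np.linalg.norm(old), np.finfo(float).eps)
    return np.linalg.norm(new - old) / denominator


old = np.ones((3, 4))
new = old.copy()
new[0, 0] = 0.5  # a single weight changed between iterations
change = relative_weight_change(old, new)
converged = change < 1e-3  # compare against the tol parameter
```

Under this assumption, iteration would continue until the change drops below tol or max_iter is reached.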
References
Oller-Moreno, S., et al. Adaptive Asymmetric Least Squares baseline estimation for analytical instruments. 2014 IEEE 11th International Multi-Conference on Systems, Signals, and Devices, 2014, 1-5.
Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.