pybaselines.Baseline2D.brpls
- Baseline2D.brpls(data, lam=1000.0, diff_order=2, max_iter=50, tol=0.001, max_iter_2=50, tol_2=0.001, weights=None, num_eigens=(10, 10), return_dof=False)
Bayesian Reweighted Penalized Least Squares (BrPLS) baseline.
- Parameters:
- data : array_like, shape (M, N)
The y-values of the measured data. Must not contain missing data (NaN) or Inf.
- lam
float or sequence[float, float], optional The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.
- diff_order
int or sequence[int, int], optional The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
- max_iter
int, optional The max number of fit iterations. Default is 50.
- tol
float, optional The exit criterion for the fit iterations. Default is 1e-3.
- max_iter_2
int, optional The maximum number of iterations for updating the proportion of data occupied by peaks. Default is 50.
- tol_2
float, optional The exit criterion for the change in the calculated proportion of data occupied by peaks between iterations. Default is 1e-3.
- weights : array_like, shape (M, N), optional
The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.
- num_eigens
int or sequence[int, int] or None The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Must be greater than diff_order. Default is (10, 10).
- return_dof : bool, optional
If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.
- Returns:
- baseline
numpy.ndarray, shape (M, N) The calculated baseline.
- params
dict A dictionary with the following items:
- 'weights': numpy.ndarray, shape (M, N)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray, shape (J, K)
An array containing the calculated tolerance values for each iteration of both threshold values and fit values. Index 0 contains the tolerance values for the difference in the peak proportion, and indices >= 1 contain the tolerance values for each fit. All values that were not used in fitting have values of 0. Shape J is 2 plus the number of iterations for the threshold to converge (related to max_iter_2, tol_2), and shape K is the maximum of the number of iterations for the threshold and the maximum number of iterations for all of the fits of the various threshold values (related to max_iter and tol).
- 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])
Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.
References
Wang, Q., et al. Spectral baseline estimation using penalized least squares with weights derived from the Bayesian method. Nuclear Science and Techniques, 2022, 140, 250-257.
Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.