pybaselines.misc.beads
- pybaselines.misc.beads(data, freq_cutoff=0.005, lam_0=1.0, lam_1=1.0, lam_2=1.0, asymmetry=6.0, filter_type=1, cost_function=2, max_iter=50, tol=0.01, eps_0=1e-06, eps_1=1e-06, fit_parabola=True, smooth_half_window=None, x_data=None)
Baseline estimation and denoising with sparsity (BEADS).
Decomposes the input data into baseline and pure, noise-free signal by modeling the baseline as a low pass filter and by considering the signal and its derivatives as sparse [4].
- Parameters:
- data
array_like, shape (N,) The y-values of the measured data, with N data points.
- freq_cutoff
float, optional The cutoff frequency of the high pass filter, normalized such that 0 < freq_cutoff < 0.5. Default is 0.005.
- lam_0
float, optional The regularization parameter for the signal values. Default is 1.0. Higher values give a higher penalty.
- lam_1
float, optional The regularization parameter for the first derivative of the signal. Default is 1.0. Higher values give a higher penalty.
- lam_2
float, optional The regularization parameter for the second derivative of the signal. Default is 1.0. Higher values give a higher penalty.
- asymmetry
float, optional A number greater than 0 that determines the weighting of negative values compared to positive values in the cost function. Default is 6.0, which gives negative values six times more impact on the cost function than positive values. Set to 1 for a symmetric cost function, or to a value less than 1 to weight positive values more.
- filter_type
int, optional An integer describing the high pass filter type. The order of the high pass filter is 2 * filter_type. Default is 1 (second order filter).
- cost_function
{2, 1, "l1_v1", "l1_v2"}, optional An integer or string indicating which approximation of the l1 (absolute value) penalty to use. 1 or "l1_v1" will use \(l(x) = \sqrt{x^2 + \text{eps_1}}\) and 2 (default) or "l1_v2" will use \(l(x) = |x| - \text{eps_1}\log{(|x| + \text{eps_1})}\).
- max_iter
int, optional The maximum number of iterations. Default is 50.
- tol
float, optional The exit criterion. Default is 1e-2.
- eps_0
float, optional The cutoff threshold between absolute loss and quadratic loss. Values in the signal with absolute value less than eps_0 will have quadratic loss. Default is 1e-6.
- eps_1
float, optional A small, positive value used to prevent issues when the first or second order derivatives are close to zero. Default is 1e-6.
- fit_parabola
bool, optional If True (default), will fit a parabola to the data and subtract it before performing the beads fit, as suggested in [5]. This ensures the endpoints of the fit data are close to 0, which is required by beads. If the data is already close to 0 on both endpoints, set fit_parabola to False.
- smooth_half_window
int, optional The half-window to use for smoothing the derivatives of the data with a moving average and full window size of 2 * smooth_half_window + 1. Smoothing can improve the convergence of the calculation, and make the calculation less sensitive to small changes in lam_1 and lam_2, as noted in the pybeads package [6]. Default is None, which will not perform any smoothing.
- x_data
array_like, optional The x-values. Not used by this function, but input is allowed for consistency with other functions.
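The penalty-related parameters above (asymmetry, eps_0, cost_function, and eps_1) can be illustrated with a short NumPy sketch. The two l1 approximations follow the formulas given for cost_function. The asymmetric penalty is one smooth piecewise form that matches the descriptions of asymmetry and eps_0 (linear outside ±eps_0, quadratic inside); it is an assumption for illustration, not code taken from pybaselines, and all function names here are hypothetical.

```python
import numpy as np


def l1_v1(x, eps_1=1e-6):
    """Smooth approximation of |x|: sqrt(x^2 + eps_1)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x**2 + eps_1)


def l1_v2(x, eps_1=1e-6):
    """Smooth approximation of |x|: |x| - eps_1 * log(|x| + eps_1)."""
    abs_x = np.abs(np.asarray(x, dtype=float))
    return abs_x - eps_1 * np.log(abs_x + eps_1)


def asymmetric_penalty(x, asymmetry=6.0, eps_0=1e-6):
    """Assumed form: x above eps_0, -asymmetry * x below -eps_0, quadratic between."""
    x = np.asarray(x, dtype=float)
    r = asymmetry
    # Quadratic piece chosen so the penalty is continuous at x = +eps_0 and x = -eps_0
    quadratic = (1 + r) / (4 * eps_0) * x**2 + (1 - r) / 2 * x + eps_0 * (1 + r) / 4
    return np.where(x > eps_0, x, np.where(x < -eps_0, -r * x, quadratic))


values = np.array([-2.0, 0.0, 2.0])
print(l1_v1(values))
print(l1_v2(values))
print(asymmetric_penalty(values))  # negative inputs cost `asymmetry` times more
```

With asymmetry=1 the last function becomes symmetric, which corresponds to the symmetric cost function mentioned for that parameter.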
- Returns:
- baseline
numpy.ndarray, shape (N,) The calculated baseline.
- params
dict A dictionary with the following items:
- 'signal': numpy.ndarray, shape (N,)
The pure signal portion of the input data without noise or the baseline.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if asymmetry is less than 0.
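The convergence rule described for 'tol_history' under Returns can be checked with a small helper. This is a minimal sketch: the helper name is hypothetical, the example dictionary is made up to stand in for a returned params, and treating an empty history as "not converged" is an assumption.

```python
def beads_converged(params, tol=1e-2):
    """Return True if the last recorded tolerance is not greater than tol."""
    history = list(params['tol_history'])
    if not history:
        # Assumption: with no completed iterations there is nothing to judge
        return False
    return bool(history[-1] <= tol)


# Made-up values standing in for the params dictionary returned by beads
example_params = {'tol_history': [0.8, 0.1, 0.004]}
iterations_completed = len(example_params['tol_history'])
print(iterations_completed, beads_converged(example_params, tol=1e-2))
```

If the helper returns False, raising max_iter or loosening tol are the two inputs from the signature that directly affect when iteration stops.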
Notes
The default lam_0, lam_1, and lam_2 values are good starting points for a dataset with 1000 points. Typically, smaller values are needed for larger datasets and larger values for smaller datasets.
When finding the best parameters for fitting, it is usually best to find the optimal freq_cutoff for the noise in the data before adjusting any other parameters since it has the largest effect [5].
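One way to follow the advice above is to scan a few freq_cutoff values while holding every other parameter at its default. The sketch below assumes pybaselines and NumPy are installed and uses the signature documented on this page; the helper name, the synthetic data, and the candidate cutoffs are illustrative assumptions only.

```python
import numpy as np


def scan_freq_cutoff(y, cutoffs, **beads_kwargs):
    """Run beads once per cutoff, keeping all other parameters fixed."""
    # Imported here so the data setup below still runs without pybaselines
    from pybaselines.misc import beads

    results = {}
    for cutoff in cutoffs:
        baseline, params = beads(y, freq_cutoff=cutoff, **beads_kwargs)
        results[cutoff] = (baseline, params['signal'], params['tol_history'])
    return results


# Synthetic data (illustrative): one Gaussian peak, a slow drift, and noise.
# 1000 points are used because the default lam values target that size.
rng = np.random.default_rng(0)
x = np.linspace(0, 1000, 1000)
peak = 10 * np.exp(-0.5 * ((x - 500) / 10) ** 2)
drift = 2 * np.sin(2 * np.pi * x / 2000)
y = peak + drift + rng.normal(0, 0.1, x.size)

cutoffs = [0.002, 0.005, 0.01]  # each satisfies 0 < freq_cutoff < 0.5

try:
    results = scan_freq_cutoff(y, cutoffs)
except ImportError:
    results = {}  # pybaselines is not available in this environment

for cutoff, (baseline, signal, tol_history) in results.items():
    print(cutoff, len(tol_history), float(tol_history[-1]))
```

Comparing the returned baselines against the raw data for each cutoff, and only then adjusting lam_0, lam_1, and lam_2, keeps the search to one parameter at a time.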
References
[4] Ning, X., et al. Chromatogram baseline estimation and denoising using sparsity (BEADS). Chemometrics and Intelligent Laboratory Systems, 2014, 139, 156-167.