pybaselines.misc
Module Contents
Functions
beads: Baseline estimation and denoising with sparsity (BEADS).
interp_pts: Creates a baseline by interpolating through input points.
pybaselines.misc.beads(data, freq_cutoff=0.005, lam_0=1.0, lam_1=1.0, lam_2=1.0, asymmetry=6.0, filter_type=1, cost_function=2, max_iter=50, tol=0.01, eps_0=1e-06, eps_1=1e-06, fit_parabola=True, smooth_half_window=None, x_data=None)[source]
Baseline estimation and denoising with sparsity (BEADS).
Decomposes the input data into baseline and pure, noise-free signal by modeling the baseline as a low pass filter and by considering the signal and its derivatives as sparse [4].
 Parameters:
data : array-like, shape (N,)
The y-values of the measured data, with N data points.
freq_cutoff : float, optional
The cutoff frequency of the high pass filter, normalized such that 0 < freq_cutoff < 0.5. Default is 0.005.
lam_0 : float, optional
The regularization parameter for the signal values. Default is 1.0. Higher values give a higher penalty.
lam_1 : float, optional
The regularization parameter for the first derivative of the signal. Default is 1.0. Higher values give a higher penalty.
lam_2 : float, optional
The regularization parameter for the second derivative of the signal. Default is 1.0. Higher values give a higher penalty.
asymmetry : float, optional
A number greater than 0 that determines the weighting of negative values compared to positive values in the cost function. Default is 6.0, which gives negative values six times more impact on the cost function than positive values. Set to 1 for a symmetric cost function, or a value less than 1 to weigh positive values more.
filter_type : int, optional
An integer describing the high pass filter type. The order of the high pass filter is 2 * filter_type. Default is 1 (second order filter).
cost_function : {2, 1, "l1_v1", "l1_v2"}, optional
An integer or string indicating which approximation of the l1 (absolute value) penalty to use. 1 or "l1_v1" will use \(l(x) = \sqrt{x^2 + \text{eps_1}}\) and 2 (default) or "l1_v2" will use \(l(x) = x - \text{eps_1} \log{(x + \text{eps_1})}\).
max_iter : int, optional
The maximum number of iterations. Default is 50.
tol : float, optional
The exit criteria. Default is 1e-2.
eps_0 : float, optional
The cutoff threshold between absolute loss and quadratic loss. Values in the signal with absolute value less than eps_0 will have quadratic loss. Default is 1e-6.
eps_1 : float, optional
A small, positive value used to prevent issues when the first or second order derivatives are close to zero. Default is 1e-6.
fit_parabola : bool, optional
If True (default), will fit a parabola to the data and subtract it before performing the beads fit as suggested in [5]. This ensures the endpoints of the fit data are close to 0, which is required by beads. If the data is already close to 0 on both endpoints, set fit_parabola to False.
smooth_half_window : int, optional
The half-window to use for smoothing the derivatives of the data with a moving average and full window size of 2 * smooth_half_window + 1. Smoothing can improve the convergence of the calculation, and make the calculation less sensitive to small changes in lam_1 and lam_2, as noted in the pybeads package [6]. Default is None, which will not perform any smoothing.
x_data : array-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
 Returns:
baseline : numpy.ndarray, shape (N,)
The calculated baseline.
params : dict
A dictionary with the following items:
 'signal': numpy.ndarray, shape (N,)
The pure signal portion of the input data without noise or the baseline.
 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
 Raises:
 ValueError
Raised if asymmetry is less than 0.
Notes
The default lam_0, lam_1, and lam_2 values are good starting points for a dataset with 1000 points. Typically, smaller values are needed for larger datasets and larger values for smaller datasets.
When finding the best parameters for fitting, it is usually best to find the optimal freq_cutoff for the noise in the data before adjusting any other parameters since it has the largest effect [5].
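The two l1 approximations selected by cost_function can be sketched directly with numpy (a standalone illustration; the function names here are ours, not part of pybaselines):

```python
import numpy as np

def l1_v1(x, eps_1=1e-6):
    # "l1_v1": sqrt(x^2 + eps_1), a smooth approximation of |x|
    return np.sqrt(x**2 + eps_1)

def l1_v2(x, eps_1=1e-6):
    # "l1_v2" (the default): x - eps_1 * log(x + eps_1), for x >= 0
    return x - eps_1 * np.log(x + eps_1)

# For magnitudes much larger than eps_1, both track |x| closely,
# while remaining differentiable near zero (unlike |x| itself).
x = np.linspace(0.1, 1.0, 10)
print(np.max(np.abs(l1_v1(x) - x)))  # small, on the order of eps_1
print(np.max(np.abs(l1_v2(x) - x)))  # small, on the order of eps_1
```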
References
[4] Ning, X., et al. Chromatogram baseline estimation and denoising using sparsity (BEADS). Chemometrics and Intelligent Laboratory Systems, 2014, 139, 156-167.
 pybaselines.misc.interp_pts(x_data, baseline_points=(), interp_method='linear', data=None)[source]
Creates a baseline by interpolating through input points.
 Parameters:
x_data : array-like, shape (N,)
The x-values of the measured data.
baseline_points : array-like, shape (n, 2)
An array of ((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)) values for each point representing the baseline.
interp_method : str, optional
The method to use for interpolation. See scipy.interpolate.interp1d for all options. Default is 'linear', which connects each point with a line segment.
data : array-like, optional
The y-values. Not used by this function, but input is allowed for consistency with other functions.
 Returns:
baseline : numpy.ndarray, shape (N,)
The baseline array constructed from interpolating between each input baseline point.
 dict
An empty dictionary, just to match the output of all other algorithms.
Notes
This method is only suggested for use within user interfaces.
Regions of the baseline where x_data is less than the minimum x-value or greater than the maximum x-value in baseline_points will be assigned values of 0.