pybaselines.misc

Module Contents

Functions

beads

Baseline estimation and denoising with sparsity (BEADS).

interp_pts

Creates a baseline by interpolating through input points.

pybaselines.misc.beads(data, freq_cutoff=0.005, lam_0=1.0, lam_1=1.0, lam_2=1.0, asymmetry=6.0, filter_type=1, cost_function=2, max_iter=50, tol=0.01, eps_0=1e-06, eps_1=1e-06, fit_parabola=True, smooth_half_window=None, x_data=None)

Baseline estimation and denoising with sparsity (BEADS).

Decomposes the input data into a baseline and a pure, noise-free signal by modeling the baseline as a low-pass signal and by considering the signal and its derivatives as sparse [4].

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

freq_cutoff : float, optional

The cutoff frequency of the high-pass filter, normalized such that 0 < freq_cutoff < 0.5. Default is 0.005.

lam_0 : float, optional

The regularization parameter for the signal values. Default is 1.0. Higher values give a higher penalty.

lam_1 : float, optional

The regularization parameter for the first derivative of the signal. Default is 1.0. Higher values give a higher penalty.

lam_2 : float, optional

The regularization parameter for the second derivative of the signal. Default is 1.0. Higher values give a higher penalty.

asymmetry : float, optional

A number greater than 0 that determines the weighting of negative values compared to positive values in the cost function. Default is 6.0, which gives negative values six times more impact on the cost function than positive values. Set to 1 for a symmetric cost function, or to a value less than 1 to weight positive values more.

filter_type : int, optional

An integer describing the high-pass filter type. The order of the high-pass filter is 2 * filter_type. Default is 1 (a second-order filter).

cost_function : {2, 1, "l1_v1", "l1_v2"}, optional

An integer or string indicating which approximation of the l1 (absolute value) penalty to use. 1 or "l1_v1" will use \(l(x) = \sqrt{x^2 + \text{eps_1}}\), and 2 (default) or "l1_v2" will use \(l(x) = |x| - \text{eps_1}\log{(|x| + \text{eps_1})}\). A short sketch of both approximations follows this parameter list.

max_iter : int, optional

The maximum number of iterations. Default is 50.

tol : float, optional

The exit criterion. Default is 1e-2.

eps_0 : float, optional

The cutoff threshold between absolute loss and quadratic loss. Values in the signal with absolute value less than eps_0 will have quadratic loss. Default is 1e-6.

eps_1 : float, optional

A small, positive value used to prevent issues when the first or second order derivatives are close to zero. Default is 1e-6.

fit_parabola : bool, optional

If True (default), will fit a parabola to the data and subtract it before performing the beads fit as suggested in [5]. This ensures the endpoints of the fit data are close to 0, which is required by beads. If the data is already close to 0 on both endpoints, set fit_parabola to False.

smooth_half_window : int, optional

The half-window to use for smoothing the derivatives of the data with a moving average and full window size of 2 * smooth_half_window + 1. Smoothing can improve the convergence of the calculation, and make the calculation less sensitive to small changes in lam_1 and lam_2, as noted in the pybeads package [6]. Default is None, which will not perform any smoothing.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.
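
The two penalty approximations can be compared with a short standalone sketch (the function names l1_v1 and l1_v2 mirror the string options above and are not part of pybaselines; eps_1 mirrors the eps_1 parameter):

    import numpy as np

    def l1_v1(x, eps_1=1e-6):
        # smooth approximation of |x|: sqrt(x**2 + eps_1)
        return np.sqrt(x**2 + eps_1)

    def l1_v2(x, eps_1=1e-6):
        # alternative approximation: |x| - eps_1 * log(|x| + eps_1)
        return np.abs(x) - eps_1 * np.log(np.abs(x) + eps_1)

    x = np.linspace(-1, 1, 5)
    print(l1_v1(x))  # both stay close to |x| away from zero
    print(l1_v2(x))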

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'signal': numpy.ndarray, shape (N,)

    The pure signal portion of the input data without noise or the baseline.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if asymmetry is less than 0.

Notes

The default lam_0, lam_1, and lam_2 values are good starting points for a dataset with 1000 points. Typically, smaller values are needed for larger datasets and larger values for smaller datasets.

When finding the best parameters for fitting, it is usually best to find the optimal freq_cutoff for the noise in the data before adjusting any other parameters since it has the largest effect [5].
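
As a minimal usage sketch (the synthetic data and parameter values are illustrative starting points only):

    import numpy as np
    from pybaselines import misc

    # synthetic data: two peaks on a sloped baseline plus noise
    x = np.linspace(0, 1000, 1000)
    signal = (100 * np.exp(-(x - 300)**2 / 200)
              + 80 * np.exp(-(x - 700)**2 / 400))
    y = signal + (10 + 0.01 * x) + np.random.default_rng(0).normal(0, 1, x.size)

    baseline, params = misc.beads(y, freq_cutoff=0.005, lam_0=1.0,
                                  lam_1=1.0, lam_2=1.0)
    # params['signal'] holds the denoised, baseline-free signal; if
    # params['tol_history'][-1] is greater than tol, the fit did not converge
    print(len(params['tol_history']), 'iterations')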

References

[4]

Ning, X., et al. Chromatogram baseline estimation and denoising using sparsity (BEADS). Chemometrics and Intelligent Laboratory Systems, 2014, 139, 156-167.

[5]

Navarro-Huerta, J.A., et al. Assisted baseline subtraction in complex chromatograms using the BEADS algorithm. Journal of Chromatography A, 2017, 1507, 1-10.
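
[6]

https://github.com/skotaro/pybeads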

pybaselines.misc.interp_pts(x_data, baseline_points=(), interp_method='linear', data=None)

Creates a baseline by interpolating through input points.

Parameters:
x_data : array-like, shape (N,)

The x-values of the measured data.

baseline_points : array-like, shape (n, 2)

An array of (x, y) pairs, ((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)), giving the points that define the baseline.

interp_method : str, optional

The method to use for interpolation. See scipy.interpolate.interp1d for all options. Default is 'linear', which connects each point with a line segment. A conceptual sketch follows this parameter list.

data : array-like, optional

The y-values. Not used by this function, but input is allowed for consistency with other functions.
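
The interpolation step is conceptually equivalent to the following sketch (make_baseline is a hypothetical helper for illustration, not the pybaselines implementation):

    import numpy as np
    from scipy.interpolate import interp1d

    def make_baseline(x_data, baseline_points, interp_method='linear'):
        points = np.asarray(baseline_points, dtype=float)
        # assign 0 outside the range of the input points, matching the
        # behavior described in the Notes below
        interpolator = interp1d(points[:, 0], points[:, 1], kind=interp_method,
                                bounds_error=False, fill_value=0)
        return interpolator(np.asarray(x_data))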

Returns:
baseline : numpy.ndarray, shape (N,)

The baseline array constructed from interpolating between each input baseline point.

dict

An empty dictionary, just to match the output of all other algorithms.

Notes

This method is only suggested for use within user interfaces.

Regions of the baseline where x_data is less than the minimum x-value or greater than the maximum x-value in baseline_points will be assigned values of 0.
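
A short usage sketch (the anchor points are arbitrary, e.g. selected by clicking within a user interface):

    import numpy as np
    from pybaselines import misc

    x = np.linspace(0, 10, 101)
    points = ((1, 2.0), (4, 1.5), (7, 1.8), (9, 2.2))
    baseline, _ = misc.interp_pts(x, baseline_points=points)
    print(baseline[x < 1])  # all 0, since these x-values precede the first point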