pybaselines.smooth
Module Contents
Functions
ipsa | Iterative Polynomial Smoothing Algorithm (IPSA).
noise_median | The noise-median method for baseline identification.
ria | Range Independent Algorithm (RIA).
snip | Statistics-sensitive Non-linear Iterative Peak-clipping (SNIP).
swima | Small-window moving average (SWiMA) baseline.
- pybaselines.smooth.ipsa(data, half_window=None, max_iter=500, tol=None, roi=None, original_criteria=False, x_data=None, **pad_kwargs)
Iterative Polynomial Smoothing Algorithm (IPSA).
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- half_windowint
The half-window to use for the smoothing each iteration. Should be approximately equal to the full-width-at-half-maximum of the peaks or features in the data. Default is None, which will use 4 times the output of optimize_window(), which is not always a good value, but at least scales with the number of data points and gives a starting point for tuning the parameter.
- max_iterint, optional
The maximum number of iterations. Default is 500.
- tolfloat, optional
The exit criteria. Default is None, which uses 1e-3 if original_criteria is False, and 1 / (max(data) - min(data)) if original_criteria is True.
- roislice or array-like, shape (N,), optional
The region of interest, such that np.asarray(data)[roi] gives the values for calculating the tolerance if original_criteria is True. Not used if original_criteria is False. Default is None, which uses all values in data.
- original_criteriabool, optional
Whether to use the original exit criteria from the reference, which is difficult to use since it requires knowledge of how high the peaks should be after baseline correction. If False (default), then compares norm(old, new) / norm(old), where old is the previous iteration's baseline, and new is the current iteration's baseline.
- x_dataarray-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
References
Wang, T., et al. Background Subtraction of Raman Spectra Based on Iterative Polynomial Smoothing. Applied Spectroscopy. 71(6) (2017) 1169-1179.
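The default exit criterion and the meaning of 'tol_history' can be illustrated with a small NumPy sketch. It is not the library's implementation: a moving average stands in for the polynomial smoothing step, the names relative_change and iterative_smooth are hypothetical, and norm(old, new) is assumed to mean the norm of the difference between the two baselines.

```python
import numpy as np


def relative_change(old, new):
    """Relative change between successive baselines (assumed form of the criterion)."""
    return np.linalg.norm(old - new) / np.linalg.norm(old)


def iterative_smooth(data, half_window, max_iter=500, tol=1e-3):
    """Toy stand-in: smooth repeatedly, keeping the baseline at or below the data."""
    data = np.asarray(data, dtype=float)
    window = 2 * half_window + 1
    kernel = np.ones(window) / window
    baseline = data.copy()
    tol_history = []
    for _ in range(max_iter):
        # pad with edge values so the convolution output keeps shape (N,)
        padded = np.pad(baseline, half_window, mode='edge')
        smoothed = np.convolve(padded, kernel, mode='valid')
        new_baseline = np.minimum(baseline, smoothed)
        tol_history.append(relative_change(baseline, new_baseline))
        baseline = new_baseline
        if tol_history[-1] <= tol:
            break
    return baseline, {'tol_history': np.array(tol_history)}


# synthetic data: sloped baseline plus one Gaussian peak
x = np.linspace(0, 100, 500)
y = 5 + 0.02 * x + 10 * np.exp(-0.5 * ((x - 50) / 2) ** 2)
baseline, params = iterative_smooth(y, half_window=15)

# a last value above tol means the loop stopped at max_iter instead
converged = params['tol_history'][-1] <= 1e-3
```

The same check on the last entry of 'tol_history' applies to the dictionary returned by ipsa, using whatever tol value was passed in.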
- pybaselines.smooth.noise_median(data, half_window=None, smooth_half_window=None, sigma=None, x_data=None, **pad_kwargs)
The noise-median method for baseline identification.
Assumes the baseline can be considered as the median value within a moving window, and the resulting baseline is then smoothed with a Gaussian kernel.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- half_windowint, optional
The index-based size to use for the median window. The total window size will range from [-half_window, ..., half_window] with size 2 * half_window + 1. Default is None, which will use twice the output from optimize_window(), which is an okay starting value.
- smooth_half_windowint, optional
The half window to use for smoothing. Default is None, which will use the same value as half_window.
- sigmafloat, optional
The standard deviation of the smoothing Gaussian kernel. Default is None, which will use (2 * smooth_half_window + 1) / 6.
- x_dataarray-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated and smoothed baseline.
- dict
An empty dictionary, just to match the output of all other algorithms.
References
Friedrichs, M., A model-free algorithm for the removal of baseline artifacts. J. Biomolecular NMR, 1995, 5, 147-153.
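The two steps described above (a moving median, then Gaussian smoothing) can be sketched with plain NumPy. This is a minimal illustration, not the library's code: the name noise_median_sketch is hypothetical, and padding with edge values is an assumption standing in for pad_edges().

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def noise_median_sketch(data, half_window, smooth_half_window=None, sigma=None):
    """Moving median followed by Gaussian smoothing (illustrative only)."""
    y = np.asarray(data, dtype=float)
    if smooth_half_window is None:
        smooth_half_window = half_window
    if sigma is None:
        sigma = (2 * smooth_half_window + 1) / 6

    # moving median over 2 * half_window + 1 points; edge padding is an assumption
    padded = np.pad(y, half_window, mode='edge')
    windows = sliding_window_view(padded, 2 * half_window + 1)
    median = np.median(windows, axis=-1)

    # normalized Gaussian kernel spanning 2 * smooth_half_window + 1 points
    offsets = np.arange(-smooth_half_window, smooth_half_window + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(median, smooth_half_window, mode='edge')
    return np.convolve(padded, kernel, mode='valid')


# flat baseline of 2.0 with a 5-point-wide spike; the 21-point median ignores it
y = np.full(200, 2.0)
y[95:100] += 10.0
baseline = noise_median_sketch(y, half_window=10)
```

Because the spike covers fewer than half of the points in any window, the median stays on the flat level, which is the assumption the method relies on.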
- pybaselines.smooth.ria(data, x_data=None, half_window=None, max_iter=500, tol=0.01, side='both', width_scale=0.1, height_scale=1.0, sigma_scale=1.0 / 12.0, **pad_kwargs)
Range Independent Algorithm (RIA).
Adds additional data to the left and/or right of the input data, and then iteratively smooths until the area of the additional data is removed.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- x_dataarray-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
- half_windowint, optional
The half-window to use for the smoothing each iteration. Should be approximately equal to the full-width-at-half-maximum of the peaks or features in the data. Default is None, which will use the output of optimize_window(), which is not always a good value, but at least scales with the number of data points and gives a starting point for tuning the parameter.
- max_iterint, optional
The maximum number of iterations. Default is 500.
- tolfloat, optional
The exit criteria. Default is 1e-2.
- side{'both', 'left', 'right'}, optional
The side of the measured data to extend. Default is 'both'.
- width_scalefloat, optional
The number of data points added to each side is width_scale * N. Default is 0.1.
- height_scalefloat, optional
The height of the added Gaussian peak(s) is calculated as height_scale * max(data). Default is 1.
- sigma_scalefloat, optional
The sigma value for the added Gaussian peak(s) is calculated as sigma_scale * width_scale * N. Default is 1/12, which will make the Gaussian span +- 6 sigma, making its total width about half of the added length.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data when adding the extended left and/or right sections.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- paramsdict
A dictionary with the following items:
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge (if the array length is equal to max_iter) or the areas of the smoothed extended regions exceeded their initial areas (if the array length is < max_iter).
- Raises:
- ValueError
Raised if side is not 'left', 'right', or 'both'.
References
Krishna, H., et al. Range-independent background subtraction algorithm for recovery of Raman spectra of biological tissue. J Raman Spectroscopy. 43(12) (2012) 1884-1894.
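As a worked example of the scaling parameters and of reading 'tol_history', the sketch below computes the size of the added region and classifies how a run ended. Both helper names are hypothetical, and truncating width_scale * N to an integer is an assumption about rounding; the formulas and the three outcomes follow the parameter and return descriptions above.

```python
import numpy as np


def ria_extension_sizes(data, width_scale=0.1, height_scale=1.0,
                        sigma_scale=1.0 / 12.0):
    """Sizes of the added region, following the parameter descriptions."""
    y = np.asarray(data, dtype=float)
    n = y.size
    return {
        'added_points': int(width_scale * n),  # truncation is an assumption
        'height': height_scale * y.max(),
        'sigma': sigma_scale * width_scale * n,
    }


def diagnose_ria(tol_history, tol=0.01, max_iter=500):
    """Classify the outcome from 'tol_history', per the Returns description."""
    if tol_history[-1] <= tol:
        return 'converged'
    if len(tol_history) == max_iter:
        return 'max_iter reached without converging'
    return 'smoothed extended regions exceeded their initial areas'


# 1000 points with a maximum of 5.0: 100 added points per side, sigma = 100 / 12
sizes = ria_extension_sizes(np.linspace(0, 5, 1000))

# hand-written tolerance histories, not output from ria
outcome_ok = diagnose_ria([0.5, 0.005])
outcome_area = diagnose_ria([0.5, 0.4])
outcome_max = diagnose_ria([0.5] * 500)
```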
- pybaselines.smooth.snip(data, max_half_window=None, decreasing=False, smooth_half_window=None, filter_order=2, x_data=None, **pad_kwargs)
Statistics-sensitive Non-linear Iterative Peak-clipping (SNIP).
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- max_half_windowint or Sequence(int, int), optional
The maximum number of iterations. Should be set such that max_half_window is approximately (w-1)/2, where w is the index-based width of a feature or peak. max_half_window can also be a sequence of two integers for asymmetric peaks, with the first item corresponding to the max_half_window of the peak's left edge, and the second item for the peak's right edge [3]. Default is None, which will use the output from optimize_window(), which is an okay starting value.
- decreasingbool, optional
If False (default), will iterate through window sizes from 1 to max_half_window. If True, will reverse the order and iterate from max_half_window to 1, which gives a smoother baseline according to [3] and [4].
- smooth_half_windowint, optional
The half window to use for smoothing the data. If smooth_half_window is greater than 0, will perform a moving average smooth on the data for each window, which gives better results for noisy data [3]. Default is None, which will not perform any smoothing.
- filter_order{2, 4, 6, 8}, optional
If the measured data has a more complicated baseline consisting of other elements such as Compton edges, then a higher filter_order should be selected [3]. Default is 2, which works well for approximating a linear baseline.
- x_dataarray-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- dict
An empty dictionary, just to match the output of all other algorithms.
- Raises:
- ValueError
Raised if filter_order is not 2, 4, 6, or 8.
- Warns:
- UserWarning
Raised if max_half_window is greater than (len(data) - 1) // 2.
Notes
Algorithm initially developed by [1], and this specific version of the algorithm is adapted from [2], [3], and [4].
If data covers several orders of magnitude, better results can be obtained by first transforming the data using the log-log-square root transform before using SNIP [2]:
transformed_data = np.log(np.log(np.sqrt(data + 1) + 1) + 1)
and the baseline can then be reverted back to the original scale using the inverse (snip returns the baseline and a dictionary, so the baseline is taken from the first item):
baseline = -1 + (np.exp(np.exp(snip(transformed_data)[0]) - 1) - 1)**2
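A quick round-trip check shows that the two expressions are inverses of each other. The function names lls_transform and inverse_lls_transform are hypothetical, and the values in counts are made up for illustration; no baseline fitting is done here.

```python
import numpy as np


def lls_transform(data):
    """Forward transform: log(log(sqrt(data + 1) + 1) + 1)."""
    return np.log(np.log(np.sqrt(data + 1) + 1) + 1)


def inverse_lls_transform(transformed):
    """Inverse transform, undoing each step of lls_transform in reverse order."""
    return -1 + (np.exp(np.exp(transformed) - 1) - 1) ** 2


# values spanning several orders of magnitude
counts = np.array([0.0, 1.0, 10.0, 1e3, 1e6])
transformed = lls_transform(counts)
roundtrip = inverse_lls_transform(transformed)
```

The transform compresses the range considerably, which is why it helps when the data spans several orders of magnitude.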
References
[1] Ryan, C.G., et al. SNIP, A Statistics-Sensitive Background Treatment For The Quantitative Analysis Of Pixe Spectra In Geoscience Applications. Nuclear Instruments and Methods in Physics Research B, 1988, 34, 396-402.
[2] Morháč, M., et al. Background elimination methods for multidimensional coincidence γ-ray spectra. Nuclear Instruments and Methods in Physics Research A, 1997, 401, 113-132.
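The peak-clipping idea behind SNIP can be sketched for the simplest case. At each half-window w, a point is replaced by the mean of its two neighbors w indices away whenever that mean is lower. This is a minimal sketch only: the name snip_sketch is hypothetical, it leaves the outermost points untouched rather than padding the edges, and it covers neither smoothing nor higher filter orders.

```python
import numpy as np


def snip_sketch(data, max_half_window, decreasing=False):
    """Basic clipping loop, iterating half-windows 1..max_half_window."""
    baseline = np.asarray(data, dtype=float).copy()
    n = baseline.size
    windows = range(1, max_half_window + 1)
    if decreasing:
        windows = reversed(windows)
    for w in windows:
        if 2 * w >= n:
            continue  # window too large for the data
        # mean of the points w indices to the left and right of each interior point
        neighbors_mean = 0.5 * (baseline[:-2 * w] + baseline[2 * w:])
        baseline[w:n - w] = np.minimum(baseline[w:n - w], neighbors_mean)
    return baseline


# linear background plus a narrow Gaussian peak centered at index 25
index = np.arange(50)
linear = 2 + 0.5 * index
peak = 10 * np.exp(-0.5 * ((index - 25) / 2) ** 2)
baseline = snip_sketch(linear + peak, max_half_window=10)
unchanged = snip_sketch(linear, max_half_window=10)
```

A purely linear signal passes through unchanged, because the mean of two symmetric neighbors on a line equals the center value.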
- pybaselines.smooth.swima(data, min_half_window=3, max_half_window=None, smooth_half_window=None, x_data=None, **pad_kwargs)
Small-window moving average (SWiMA) baseline.
Computes an iterative moving average to smooth peaks and obtain the baseline.
- Parameters:
- dataarray-like, shape (N,)
The y-values of the measured data, with N data points.
- min_half_windowint, optional
The minimum half window value that must be reached before the exit criteria is considered. Can be increased to reduce the calculation time. Default is 3.
- max_half_windowint, optional
The maximum number of iterations. Default is None, which will use (N - 1) / 2. Typically does not need to be specified.
- smooth_half_windowint, optional
The half window to use for smoothing the input data with a moving average. Default is None, which will use N / 50. Use a value of 0 or less to not smooth the data. See Notes below for more details.
- x_dataarray-like, optional
The x-values. Not used by this function, but input is allowed for consistency with other functions.
- **pad_kwargs
Additional keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.
- Returns:
- baselinenumpy.ndarray, shape (N,)
The calculated baseline.
- dict
A dictionary with the following items:
- 'half_window': list(int)
A list of the half windows at which the exit criteria was reached. Has a length of 1 if the main exit criteria was initially reached, otherwise has a length of 2.
- 'converged': list(bool or None)
A list of the convergence status. Has a length of 1 if the main exit criteria was initially reached, otherwise has a length of 2. Each convergence status is True if the main exit criteria was reached, False if the second exit criteria was reached, and None if max_half_window is reached before either exit criteria.
Notes
This algorithm requires the input data to be fairly smooth (noise-free), so it is recommended to either smooth the data beforehand, or specify a smooth_half_window value. Non-smooth data can cause the exit criteria to be reached prematurely (can be avoided by setting a larger min_half_window), while over-smoothed data can cause the exit criteria to be reached later than optimal.
The half-window at which convergence occurs is roughly close to the index-based full-width-at-half-maximum of a peak or feature, but can vary. Therefore, it is better to set a min_half_window that is smaller than expected to not miss the exit criteria.
If the main exit criteria is not reached on the initial fit, a Gaussian baseline (which is well handled by this algorithm) is added to the data, and it is re-fit.
References
Schulze, H., et al. A Small-Window Moving Average-Based Fully Automated Baseline Estimation Method for Raman Spectra. Applied Spectroscopy, 2012, 66(7), 757-764.
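The returned dictionary can be read with a small helper like the one below. The name describe_swima_fits is hypothetical, and example_params is a hand-written dictionary in the documented shape, not actual output from swima.

```python
def describe_swima_fits(params):
    """Summarize each fit from the 'half_window' and 'converged' lists."""
    labels = {
        True: 'main exit criterion reached',
        False: 'second exit criterion reached',
        None: 'max_half_window reached before either exit criterion',
    }
    fits = zip(params['half_window'], params['converged'])
    return [
        f'fit {number}: half_window={half_window}, {labels[status]}'
        for number, (half_window, status) in enumerate(fits, start=1)
    ]


# two entries: the initial fit missed the main criterion, so the data was re-fit
example_params = {'half_window': [12, 18], 'converged': [False, True]}
summary = describe_swima_fits(example_params)
```

A list of length 2 indicates that the re-fit with the added Gaussian baseline took place, as described in the Notes.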