pybaselines.morphological

Module Contents

Functions

amormol

Iteratively averaging morphological and mollified (aMorMol) baseline.

imor

An Improved Morphological based (IMor) baseline algorithm.

jbcd

Joint Baseline Correction and Denoising (jbcd) Algorithm.

mor

A Morphological based (Mor) baseline algorithm.

mormol

Iterative morphological and mollified (MorMol) baseline.

mpls

The Morphological penalized least squares (MPLS) baseline algorithm.

mpspline

Morphology-based penalized spline baseline.

mwmv

Moving window minimum value (MWMV) baseline.

rolling_ball

The rolling ball baseline algorithm.

tophat

Estimates the baseline using a top-hat transformation (morphological opening).

pybaselines.morphological.amormol(data, half_window=None, tol=0.001, max_iter=200, pad_kwargs=None, x_data=None, **window_kwargs)

Iteratively averaging morphological and mollified (aMorMol) baseline.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

tol : float, optional

The exit criteria. Default is 1e-3.

max_iter : int, optional

The maximum number of iterations. Default is 200.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

References

Chen, H., et al. An Adaptive and Fully Automated Baseline Correction Method for Raman Spectroscopy Based on Morphological Operations and Mollifications. Applied Spectroscopy, 2019, 73(3), 284-293.

pybaselines.morphological.imor(data, half_window=None, tol=0.001, max_iter=200, x_data=None, **window_kwargs)

An Improved Morphological based (IMor) baseline algorithm.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

tol : float, optional

The exit criteria. Default is 1e-3.

max_iter : int, optional

The maximum number of iterations. Default is 200.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
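The convergence rule stated for 'tol_history' can be checked with a small helper. This is a minimal sketch: the function name `converged` and the two example dictionaries are hypothetical stand-ins for a real params dictionary, and their values are made up for illustration.

```python
import numpy as np


def converged(params, tol=1e-3):
    """Return True if the last recorded tolerance value is at or below tol."""
    history = np.atleast_1d(params["tol_history"])
    return bool(history.size > 0 and history[-1] <= tol)


# Hypothetical params dictionaries shaped like the one described above.
finished = {"half_window": 12, "tol_history": np.array([0.5, 0.02, 0.0004])}
stalled = {"half_window": 12, "tol_history": np.array([0.5, 0.3, 0.2])}

print(converged(finished), converged(stalled))
```

When the check fails, the length of 'tol_history' shows how many iterations were used, which helps decide whether to raise max_iter or loosen tol.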

References

Dai, L., et al. An Automated Baseline Correction Method Based on Iterative Morphological Operations. Applied Spectroscopy, 2018, 72(5), 731-739.

pybaselines.morphological.jbcd(data, half_window=None, alpha=0.1, beta=10.0, gamma=1.0, beta_mult=1.1, gamma_mult=0.909, diff_order=1, max_iter=20, tol=0.01, tol_2=0.001, robust_opening=True, x_data=None, **window_kwargs)

Joint Baseline Correction and Denoising (jbcd) Algorithm.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

alpha : float, optional

The regularization parameter that controls how close the baseline must fit the calculated morphological opening. Larger values make the fit more constrained to the opening and can make the baseline less smooth. Default is 0.1.

beta : float, optional

The regularization parameter that controls how smooth the baseline is. Larger values produce smoother baselines. Default is 1e1.

gamma : float, optional

The regularization parameter that controls how smooth the signal is. Larger values produce smoother signals. Default is 1.

beta_mult : float, optional

The value that beta is multiplied by each iteration. Default is 1.1.

gamma_mult : float, optional

The value that gamma is multiplied by each iteration. Default is 0.909.

diff_order : int, optional

The order of the differential matrix. Must be greater than 0. Default is 1 (first order differential matrix). Typical values are 2 or 1.

max_iter : int, optional

The maximum number of iterations. Default is 20.

tol : float, optional

The exit criteria for the change in the calculated signal. Default is 1e-2.

tol_2 : float, optional

The exit criteria for the change in the calculated baseline. Default is 1e-3.

robust_opening : bool, optional

If True (default), the opening used to represent the initial baseline is the element-wise minimum between the morphological opening and the average of the morphological erosion and dilation of the opening, similar to mor(). If False, the opening is just the morphological opening, as used in the reference. The robust opening typically represents the baseline better.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

  • 'tol_history': numpy.ndarray, shape (K, 2)

    An array containing the calculated tolerance values for each iteration. Index 0 holds the tolerance values for the relative change in the signal, and index 1 holds the tolerance values for the relative change in the baseline. The length of the array is the number of iterations completed, K. If the last values in the array are greater than the input tol or tol_2 values, then the function did not converge.

  • 'signal': numpy.ndarray, shape (N,)

    The pure signal portion of the input data without noise or the baseline.
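Because beta and gamma are multiplied by beta_mult and gamma_mult each iteration, their values follow a geometric progression. The sketch below works through that arithmetic for the default settings; the helper name `parameter_schedule` is hypothetical, and it assumes the multiplication is applied exactly once per iteration for all max_iter iterations, which may differ from when the function actually updates or stops.

```python
import numpy as np


def parameter_schedule(initial, multiplier, num_iterations):
    """Values of a parameter at the start and after each of num_iterations updates."""
    return initial * multiplier ** np.arange(num_iterations + 1)


# Defaults from the signature: beta=10.0, beta_mult=1.1, gamma=1.0,
# gamma_mult=0.909, max_iter=20.
betas = parameter_schedule(10.0, 1.1, 20)
gammas = parameter_schedule(1.0, 0.909, 20)

print(f"beta:  {betas[0]:.3g} -> {betas[-1]:.3g}")
print(f"gamma: {gammas[0]:.3g} -> {gammas[-1]:.3g}")
```

Under these assumptions beta grows by a factor of about 6.7 over 20 updates while gamma shrinks to about 0.15 of its starting value, so the baseline smoothness penalty strengthens as the signal smoothness penalty relaxes. The default gamma_mult of 0.909 is approximately 1 / 1.1.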

References

Liu, H., et al. Joint Baseline-Correction and Denoising for Raman Spectra. Applied Spectroscopy, 2015, 69(9), 1013-1022.

pybaselines.morphological.mor(data, half_window=None, x_data=None, **window_kwargs)

A Morphological based (Mor) baseline algorithm.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

References

Perez-Pueyo, R., et al. Morphology-Based Automated Baseline Removal for Raman Spectra of Artistic Pigments. Applied Spectroscopy, 2010, 64, 595-600.

pybaselines.morphological.mormol(data, half_window=None, tol=0.001, max_iter=250, smooth_half_window=None, pad_kwargs=None, x_data=None, **window_kwargs)

Iterative morphological and mollified (MorMol) baseline.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

tol : float, optional

The exit criteria. Default is 1e-3.

max_iter : int, optional

The maximum number of iterations. Default is 250.

smooth_half_window : int, optional

The half-window to use for smoothing the data before performing the morphological operation. Default is None, which will use a value of 1, which gives no smoothing.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from convolution.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

References

Koch, M., et al. Iterative morphological and mollifier-based baseline correction for Raman spectra. Journal of Raman Spectroscopy, 2017, 48(2), 336-342.

pybaselines.morphological.mpls(data, half_window=None, lam=1000000.0, p=0.0, diff_order=2, tol=0.001, max_iter=50, weights=None, x_data=None, **window_kwargs)

The Morphological penalized least squares (MPLS) baseline algorithm.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

lam : float, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.

p : float, optional

The penalizing weighting factor. Must be between 0 and 1. Anchor points identified by the procedure in [1] are given a weight of 1 - p, and all other points have a weight of p. Default is 0.0.

diff_order : int, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iter : int, optional

The max number of fit iterations. Default is 50.

tol : float, optional

The exit criteria. Default is 1e-3.

weights : array-like, shape (N,), optional

The weighting array. If None (default), then the weights will be calculated following the procedure in [1].

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'half_window': int

    The half window used for the morphological calculations.

Raises:
ValueError

Raised if p is not between 0 and 1.
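The roles of lam, diff_order, p, and weights can be illustrated with the standard weighted penalized least squares system, which minimizes sum(w * (y - z)**2) + lam * sum((D z)**2) for a finite-difference matrix D. The sketch below is not the library's implementation: it uses a dense NumPy solve, the names `anchor_weights` and `weighted_pls` are hypothetical, and the anchor mask is chosen by hand rather than by the procedure in [1].

```python
import numpy as np


def anchor_weights(anchor_mask, p=0.0):
    """Assign 1 - p to anchor points and p to all other points."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    return np.where(anchor_mask, 1 - p, p)


def weighted_pls(y, weights, lam=1e6, diff_order=2):
    """Solve (W + lam * D.T @ D) z = W y for the smooth curve z."""
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Finite-difference matrix of the requested order, shape (N - diff_order, N).
    diff_matrix = np.diff(np.eye(len(y)), n=diff_order, axis=0)
    lhs = np.diag(weights) + lam * diff_matrix.T @ diff_matrix
    return np.linalg.solve(lhs, weights * y)


x = np.arange(100, dtype=float)
line = 0.05 * x + 1.0
y = line.copy()
y[40:60] += 5.0  # a flat-topped "peak" sitting on a linear baseline

# Hand-picked anchors: every point outside the peak region.
anchors = np.ones(100, dtype=bool)
anchors[35:65] = False

weights = anchor_weights(anchors, p=0.0)
baseline = weighted_pls(y, weights, lam=1e6, diff_order=2)
```

With p = 0 the peak region has zero weight, so the fit is determined by the anchors alone and the penalty bridges the gap. A second-order difference penalty leaves straight lines unpenalized, which is why the linear baseline is recovered here.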

References

[1] Li, Zhong, et al. Morphological weighted penalized least squares for background correction. Analyst, 2013, 138, 4483-4492.

pybaselines.morphological.mpspline(data, half_window=None, lam=10000.0, lam_smooth=0.01, p=0.0, num_knots=100, spline_degree=3, diff_order=2, weights=None, pad_kwargs=None, x_data=None, **window_kwargs)

Morphology-based penalized spline baseline.

Identifies baseline points using morphological operations, and then uses weighted least-squares to fit a penalized spline to the baseline.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

lam : float, optional

The smoothing parameter for the penalized spline when fitting the baseline. Larger values will create smoother baselines. Default is 1e4. Larger values are needed for larger num_knots.

lam_smooth : float, optional

The smoothing parameter for the penalized spline when smoothing the input data. Default is 1e-2. Larger values are needed for noisy data or for larger num_knots.

p : float, optional

The penalizing weighting factor. Must be between 0 and 1. Anchor points identified by the procedure in the reference are given a weight of 1 - p, and all other points have a weight of p. Default is 0.0.

num_knots : int, optional

The number of knots for the spline. Default is 100.

spline_degree : int, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_order : int, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 3.

weights : array-like, shape (N,), optional

The weighting array. If None (default), then the weights will be calculated following the procedure in the reference.

x_data : array-like, shape (N,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'half_window': int

    The half window used for the morphological calculations.

Raises:
ValueError

Raised if half_window is < 1, if lam or lam_smooth is <= 0, or if p is not between 0 and 1.

Notes

The optimal opening is calculated as the element-wise minimum of the opening and the average of the erosion and dilation of the opening. The reference used the erosion and dilation of the smoothed data, rather than the opening, which tends to overestimate the baseline.

Rather than setting knots at the intersection points of the optimal opening and the smoothed data as described in the reference, weights are assigned to 1 - p at the intersection points and p elsewhere. This simplifies the penalized spline calculation by allowing the use of equally spaced knots, but should otherwise give similar results as the reference algorithm.
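The two notes above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses NumPy only with edge padding, treats the input as already smoothed, detects intersections with np.isclose, and the names `_moving`, `optimal_opening`, and `intersection_weights` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _moving(values, half_window, reducer):
    """Apply reducer (np.min or np.max) over a centered moving window."""
    padded = np.pad(values, half_window, mode="edge")
    windows = sliding_window_view(padded, 2 * half_window + 1)
    return reducer(windows, axis=-1)


def optimal_opening(y, half_window):
    """Element-wise minimum of the opening and the mean of its erosion and dilation."""
    opening = _moving(_moving(y, half_window, np.min), half_window, np.max)
    average = 0.5 * (_moving(opening, half_window, np.min)
                     + _moving(opening, half_window, np.max))
    return np.minimum(opening, average)


def intersection_weights(y_smooth, rough_baseline, p=0.0):
    """Weight 1 - p where the data touches the rough baseline, p elsewhere."""
    return np.where(np.isclose(y_smooth, rough_baseline), 1 - p, p)


x = np.arange(200, dtype=float)
y = 0.01 * x + 5 * np.exp(-0.5 * ((x - 100) / 4) ** 2)

rough = optimal_opening(y, half_window=20)
weights = intersection_weights(y, rough, p=0.0)
```

The resulting weights would then be passed to a weighted penalized spline fit on equally spaced knots, which is the simplification the second note describes.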

References

Gonzalez-Vidal, J., et al. Automatic morphology-based cubic p-spline fitting methodology for smoothing and baseline-removal of Raman spectra. Journal of Raman Spectroscopy, 2017, 48(6), 878-883.

pybaselines.morphological.mwmv(data, half_window=None, smooth_half_window=None, pad_kwargs=None, x_data=None, **window_kwargs)

Moving window minimum value (MWMV) baseline.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

smooth_half_window : int, optional

The half-window to use for smoothing the data after performing the morphological operation. Default is None, which will use the same value as used for the morphological operation.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from the moving average.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

Notes

Performs poorly when the baseline changes rapidly.

References

Yaroshchyk, P., et al. Automatic correction of continuum background in Laser-induced Breakdown Spectroscopy using a model-free algorithm. Spectrochimica Acta Part B, 2014, 99, 138-149.

pybaselines.morphological.rolling_ball(data, half_window=None, smooth_half_window=None, pad_kwargs=None, x_data=None, **window_kwargs)

The rolling ball baseline algorithm.

Applies a minimum and then maximum moving window, and subsequently smooths the result, giving a baseline that resembles rolling a ball across the data.
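The three steps named above (moving minimum, moving maximum, smoothing) can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the name `rolling_ball_sketch`, the edge padding, and the use of a plain moving average for the smoothing step are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_ball_sketch(y, half_window, smooth_half_window=None):
    """Moving minimum, then moving maximum, then moving-average smoothing."""
    y = np.asarray(y, dtype=float)
    if smooth_half_window is None:
        # Mirror the documented default: reuse the morphological half-window.
        smooth_half_window = half_window

    def moving(values, hw, reducer):
        # Edge padding keeps the output the same length as the input.
        padded = np.pad(values, hw, mode="edge")
        return reducer(sliding_window_view(padded, 2 * hw + 1), axis=-1)

    minimum = moving(y, half_window, np.min)
    opening = moving(minimum, half_window, np.max)
    return moving(opening, smooth_half_window, np.mean)


x = np.arange(200, dtype=float)
y = 0.01 * x + 5 * np.exp(-0.5 * ((x - 100) / 4) ** 2)
baseline = rolling_ball_sketch(y, half_window=20)
corrected = y - baseline
```

The window, 2 * half_window + 1 points, needs to be wider than the broadest peak; otherwise part of the peak survives the minimum and maximum steps and ends up in the baseline.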

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphology functions. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

smooth_half_window : int, optional

The half-window to use for smoothing the data after performing the morphological operation. Default is None, which will use the same value as used for the morphological operation.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from the moving average.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

References

Kneen, M.A., et al. Algorithm for fitting XRF, SEM and PIXE X-ray spectra backgrounds. Nuclear Instruments and Methods in Physics Research B, 1996, 109, 209-213.

Liland, K., et al. Optimal Choice of Baseline Correction for Multivariate Calibration of Spectra. Applied Spectroscopy, 2010, 64(9), 1007-1016.

pybaselines.morphological.tophat(data, half_window=None, x_data=None, **window_kwargs)

Estimates the baseline using a top-hat transformation (morphological opening).

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window used for the morphological opening. If a value is input, then that value will be used. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

x_data : array-like, optional

The x-values. Not used by this function, but input is allowed for consistency with other functions.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': int

    The half window used for the morphological calculations.

Notes

The actual top-hat transformation is defined as data - opening(data), where opening is the morphological opening operation. This function, however, returns opening(data), since that is technically the baseline defined by the operation.
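The distinction between the top-hat transformation and the returned baseline can be made concrete with a short sketch. It is written with NumPy only and is not the library's implementation; the name `opening` and the edge padding are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def opening(y, half_window):
    """Morphological opening: a moving minimum followed by a moving maximum."""
    def moving(values, reducer):
        padded = np.pad(values, half_window, mode="edge")
        windows = sliding_window_view(padded, 2 * half_window + 1)
        return reducer(windows, axis=-1)

    return moving(moving(y, np.min), np.max)


x = np.arange(200, dtype=float)
y = 0.01 * x + 5 * np.exp(-0.5 * ((x - 100) / 4) ** 2)

baseline = opening(y, half_window=20)  # the quantity this function returns
top_hat = y - baseline                 # the top-hat transformation itself
```

Since the opening never exceeds the data, the top-hat result is non-negative everywhere, and subtracting the returned baseline from the data reproduces it.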

References

Perez-Pueyo, R., et al. Morphology-Based Automated Baseline Removal for Raman Spectra of Artistic Pigments. Applied Spectroscopy, 2010, 64, 595-600.