pybaselines.two_d.api

Module Contents

Classes

Baseline2D

A class for all 2D baseline correction algorithms.

class pybaselines.two_d.api.Baseline2D(x_data=None, z_data=None, check_finite=True, assume_sorted=False, output_dtype=None)

A class for all 2D baseline correction algorithms.

Contains all available 2D baseline correction algorithms in pybaselines as methods to allow a single interface for easier usage.

Parameters:
x_data: array-like, shape (M,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 during the first function call with length equal to the number of rows of the input data (M).

z_data: array-like, shape (N,), optional

The z-values of the measured data. Default is None, which will create an array from -1 to 1 during the first function call with length equal to the number of columns of the input data (N).

check_finite: bool, optional

If True (default), will raise an error if any values in the input data are not finite. Setting to False will skip the check. Note that errors may occur if check_finite is False and the input data contains non-finite values.

output_dtype: type or numpy.dtype, optional

The dtype to cast the output array to. Default is None, which uses the dtype of the input data.

Attributes:
poly_order: Sequence[int, int]

The last polynomial orders used for a polynomial algorithm. Initially -1, denoting that no polynomial fitting has been performed.

pspline: pybaselines.two_d._spline_utils.PSpline2D or None

The PSpline object for setting up and solving penalized spline algorithms. Is None if no penalized spline setup has been performed.

vandermonde: numpy.ndarray or None

The Vandermonde matrix for solving polynomial equations. Is None if no polynomial setup has been performed.

whittaker_system: pybaselines.two_d._banded_utils.PenalizedSystem2D or None

The PenalizedSystem object for setting up and solving Whittaker-smoothing-based algorithms. Is None if no Whittaker setup has been performed.

x: numpy.ndarray or None

The x-values for the object. If initialized with None, then x is initialized during the first function call to have a length equal to the input data's shape[-2] and has min and max values of -1 and 1, respectively.

x_domain: numpy.ndarray

The minimum and maximum values of x. If x_data is None during initialization, then set to numpy.array([-1, 1]).

z: numpy.ndarray or None

The z-values for the object. If initialized with None, then z is initialized during the first function call to have a length equal to the input data's shape[-1] and has min and max values of -1 and 1, respectively.

z_domain: numpy.ndarray

The minimum and maximum values of z. If z_data is None during initialization, then set to numpy.array([-1, 1]).
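The default x and z values described above can be reproduced with a short NumPy sketch. The helper name default_axes is hypothetical and is not part of pybaselines; it only restates the documented rule that x spans -1 to 1 over data.shape[-2] points and z spans -1 to 1 over data.shape[-1] points.

```python
import numpy as np


def default_axes(data):
    """Hypothetical helper mirroring the documented defaults for x and z.

    x gets data.shape[-2] points and z gets data.shape[-1] points, both
    spanning -1 to 1, so x_domain and z_domain are both [-1, 1].
    """
    data = np.asarray(data)
    x = np.linspace(-1, 1, data.shape[-2])
    z = np.linspace(-1, 1, data.shape[-1])
    x_domain = np.array([x.min(), x.max()])
    z_domain = np.array([z.min(), z.max()])
    return x, z, x_domain, z_domain


data = np.zeros((50, 80))  # M = 50 rows, N = 80 columns
x, z, x_domain, z_domain = default_axes(data)
```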

adaptive_minmax(data, poly_order=None, method='modpoly', weights=None, constrained_fraction=0.01, constrained_weight=100000.0, estimation_poly_order=2, method_kwargs=None)

Fits polynomials of different orders and uses the maximum values as the baseline.

Each polynomial order fit is done both unconstrained and constrained at the endpoints.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data.

poly_order: int or Sequence[int, int] or None, optional

The two polynomial orders to use for fitting. If a single integer is given, the input value and the input value plus one will be used. Default is None, which will do a preliminary fit using a polynomial of order estimation_poly_order and then select the appropriate polynomial orders according to [32].

method: {'modpoly', 'imodpoly'}, optional

The method to use for fitting each polynomial. Default is 'modpoly'.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), an array of ones with shape (M, N) is used.

constrained_fraction: float or Sequence[float, float], optional

The fraction of points at the left and right edges to use for the constrained fit. Default is 0.01. If constrained_fraction is a sequence, the first item is the fraction for the left edge and the second is the fraction for the right edge.

constrained_weight: float or Sequence[float, float], optional

The weighting to give to the endpoints. Higher values ensure that the endpoints are fit, but can cause large fluctuations in the other sections of the polynomial. Default is 1e5. If constrained_weight is a sequence, the first item is the weight for the left edge and the second is the weight for the right edge.

estimation_poly_order: int, optional

The polynomial order used for estimating the baseline-to-signal ratio to select the appropriate polynomial orders if poly_order is None. Default is 2.

method_kwargs: dict, optional

Additional keyword arguments to pass to modpoly() or imodpoly(). These include tol, max_iter, use_original, mask_initial_peaks, and num_std.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'constrained_weights': numpy.ndarray, shape (M, N)

    The weight array used for the endpoint-constrained fits.

  • 'poly_order': numpy.ndarray, shape (2,)

    An array of the two polynomial orders used for the fitting.

References

[32]

Cao, A., et al. A robust method for automated background subtraction of tissue fluorescence. Journal of Raman Spectroscopy, 2007, 38, 1199-1205.
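The central idea of adaptive_minmax, taking the elementwise maximum of polynomial fits of two different orders, can be sketched with NumPy's 2D Vandermonde helpers. This is a simplified illustration under stated assumptions rather than the pybaselines implementation: each fit is a plain least-squares fit instead of modpoly() or imodpoly(), the endpoint-constrained fits are omitted, and the helper names poly_fit_2d and max_of_poly_fits are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def poly_fit_2d(data, x, z, order):
    """Plain least-squares 2D polynomial fit (hypothetical helper).

    order is (order_x, order_z); returns the fitted surface with data's shape.
    """
    X, Z = np.meshgrid(x, z, indexing="ij")  # both have shape (M, N)
    vander = P.polyvander2d(X.ravel(), Z.ravel(), order)
    coef = np.linalg.lstsq(vander, data.ravel(), rcond=None)[0]
    return (vander @ coef).reshape(data.shape)


def max_of_poly_fits(data, x, z, poly_order=2):
    """Elementwise maximum of fits using poly_order and poly_order + 1."""
    fits = [
        poly_fit_2d(data, x, z, (order, order))
        for order in (poly_order, poly_order + 1)
    ]
    return np.maximum(fits[0], fits[1])


x = np.linspace(-1, 1, 40)
z = np.linspace(-1, 1, 30)
X, Z = np.meshgrid(x, z, indexing="ij")
surface = 1 + 0.5 * X - 0.2 * Z**2  # already a low-order polynomial
baseline = max_of_poly_fits(surface, x, z, poly_order=2)
```

For a surface that is already a polynomial of the chosen orders, both fits reproduce it, so their maximum is the surface itself.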

airpls(data, lam=1000000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Adaptive iteratively reweighted penalized least squares (airPLS) baseline.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e6.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second-order differential matrix). Typical values are 2 or 1.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

num_eigens: int or Sequence[int, int] or None, optional

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Default is (10, 10).

return_dof: bool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.

References

Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.
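The num_eigens parameter refers to solving the Whittaker system in a truncated eigenvector basis, following the Biessy reference. The sketch below shows a single weighted solve under stated assumptions: dense difference matrices, one lam shared by both axes, data flattened in C order, and the num_eigens eigenvectors with the smallest penalty eigenvalues kept per axis. The function names are hypothetical and this is not the pybaselines code.

```python
import numpy as np


def penalty_eigens(n, diff_order, num_eigens):
    """Smallest eigenpairs of D.T @ D for an n-point difference matrix D."""
    D = np.diff(np.eye(n), diff_order, axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(D.T @ D)  # ascending order
    return eigenvalues[:num_eigens], eigenvectors[:, :num_eigens]


def whittaker_eigen_2d(y, weights, lam=1e3, diff_order=2, num_eigens=(10, 10)):
    """One weighted 2D Whittaker solve restricted to a Kronecker eigenbasis.

    y and weights have shape (M, N) and are flattened in C order, so the
    basis for the full grid is kron(row_vectors, column_vectors).
    """
    M, N = y.shape
    vals_x, vecs_x = penalty_eigens(M, diff_order, num_eigens[0])
    vals_z, vecs_z = penalty_eigens(N, diff_order, num_eigens[1])
    basis = np.kron(vecs_x, vecs_z)  # shape (M * N, kx * kz)
    # In this basis the penalty is diagonal: lam * (vals_x[i] + vals_z[j]).
    penalty = lam * (np.kron(vals_x, np.ones(len(vals_z)))
                     + np.kron(np.ones(len(vals_x)), vals_z))
    w = weights.ravel()
    lhs = basis.T @ (w[:, None] * basis) + np.diag(penalty)
    rhs = basis.T @ (w * y.ravel())
    coef = np.linalg.solve(lhs, rhs)
    return (basis @ coef).reshape(M, N)


x = np.linspace(-1, 1, 25)
z = np.linspace(-1, 1, 20)
X, Z = np.meshgrid(x, z, indexing="ij")
plane = 2.0 + X - 3.0 * Z
baseline = whittaker_eigen_2d(plane, np.ones_like(plane))
```

Because the eigenvectors with zero penalty eigenvalue span the null space of the second-order difference operator, a plane is reproduced when at least two eigenvectors are kept per axis; the truncation only discards the most heavily penalized (most curved) components.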

arpls(data, lam=1000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Asymmetrically reweighted penalized least squares smoothing (arPLS).

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second-order differential matrix). Typical values are 2 or 1.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

num_eigens: int or Sequence[int, int] or None, optional

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Default is (10, 10).

return_dof: bool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.

References

Baek, S.J., et al. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 2015, 140, 250-257.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.
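The asymmetric reweighting step that distinguishes arPLS can be sketched as follows. It implements the logistic weighting from the Baek et al. reference, in which the mean and standard deviation of the negative residuals set the threshold; the function name and the use of ddof=1 for the standard deviation are assumptions, and pybaselines may differ in such details.

```python
import numpy as np


def arpls_weights(data, baseline):
    """arPLS-style weights (Baek et al., 2015) for 2D data (sketch).

    Needs at least two negative residuals to estimate their spread.
    """
    residual = data - baseline
    negative = residual[residual < 0]
    mean = negative.mean()
    std = negative.std(ddof=1)  # ddof=1 is an assumption of this sketch
    # Logistic weighting: residuals well above 2*std - mean get weights near 0.
    exponent = np.clip(2 * (residual - (2 * std - mean)) / std, -700, 700)
    return 1 / (1 + np.exp(exponent))


data = np.array([[-1.0, -2.0, 0.5], [10.0, -1.5, 0.0]])
weights = arpls_weights(data, np.zeros_like(data))
```

The large positive residual (a peak) receives a weight close to 0, while points below the baseline keep weights close to 1. The clipping only guards np.exp against overflow.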

asls(data, lam=1000000.0, p=0.01, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Fits the baseline using asymmetric least squares (AsLS) fitting.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e6.

p: float, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second-order differential matrix). Typical values are 2 or 1.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

num_eigens: int or Sequence[int, int] or None, optional

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Default is (10, 10).

return_dof: bool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.

Raises:
ValueError

Raised if p is not between 0 and 1.

References

Eilers, P. A Perfect Smoother. Analytical Chemistry, 2003, 75(14), 3631-3636.

Eilers, P., et al. Baseline correction with asymmetric least squares smoothing. Leiden University Medical Centre Report, 2005, 1(1).

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.
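The AsLS iteration can be sketched for small arrays with dense NumPy matrices. Assumptions: the penalty is lam * (kron(Dx.T @ Dx, I_N) + kron(I_M, Dz.T @ Dz)) for data flattened in C order, one lam is shared by both axes, the full linear system is solved (in the spirit of num_eigens=None), and the loop stops when the relative change in the weights drops below tol. The function name asls_2d is hypothetical; pybaselines uses its own banded and eigendecomposition solvers.

```python
import numpy as np


def difference_penalty(n, diff_order):
    """Dense D.T @ D for an n-point finite-difference matrix of diff_order."""
    D = np.diff(np.eye(n), diff_order, axis=0)
    return D.T @ D


def asls_2d(data, lam=1e3, p=0.01, diff_order=2, max_iter=50, tol=1e-3):
    """Dense 2D asymmetric least squares sketch (small arrays only).

    Solves (W + P) b = W y repeatedly, updating W from the residual signs.
    """
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    M, N = data.shape
    y = np.asarray(data, dtype=float).ravel()
    penalty = lam * (np.kron(difference_penalty(M, diff_order), np.eye(N))
                     + np.kron(np.eye(M), difference_penalty(N, diff_order)))
    weights = np.ones_like(y)
    for _ in range(max_iter):
        baseline = np.linalg.solve(np.diag(weights) + penalty, weights * y)
        # Points above the baseline get weight p; all others get 1 - p.
        new_weights = np.where(y > baseline, p, 1 - p)
        change = np.linalg.norm(new_weights - weights) / np.linalg.norm(weights)
        weights = new_weights
        if change < tol:
            break
    return baseline.reshape(M, N), weights.reshape(M, N)


grid = np.linspace(-1, 1, 15)
X, Z = np.meshgrid(grid, grid, indexing="ij")
plane = 1 + 0.5 * X + 0.25 * Z
spiked = plane.copy()
spiked[7, 7] += 5.0  # a single positive "peak"
baseline, weights = asls_2d(spiked)
```

A second-order difference penalty is zero for any plane, so the sketch leaves a planar background untouched, while the small weight p keeps the baseline from following the positive spike.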

aspls(data, lam=100000.0, diff_order=2, max_iter=100, tol=0.001, weights=None, alpha=None)

Adaptive smoothness penalized least squares smoothing (asPLS).

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second-order differential matrix). Typical values are 2 or 1.

max_iter: int, optional

The maximum number of fit iterations. Default is 100.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

alpha: array-like, shape (M, N), optional

An array of values that control the local value of lam to better fit peak and non-peak regions. If None (default), the initial values will be an array of ones with shape (M, N).

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'alpha': numpy.ndarray, shape (M, N)

    The array of alpha values used for fitting the data in the final iteration.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Notes

The weighting uses an asymmetric coefficient (k in the asPLS paper) of 0.5 instead of the 2 listed in the asPLS paper. pybaselines uses the factor of 0.5 since it matches the results in Table 2 and Figure 5 of the asPLS paper more closely than the factor of 2 and fits noisy data much better.

References

Zhang, F., et al. Baseline correction for infrared spectra using adaptive smoothness parameter penalized least squares method. Spectroscopy Letters, 2020, 53(3), 222-233.
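The weight and alpha updates for asPLS can be sketched from the Zhang et al. reference, using the asymmetric coefficient k = 0.5 described in the Notes. The function name, the use of the standard deviation of the negative residuals with ddof=1, and the exact update order are assumptions; pybaselines may differ in these details.

```python
import numpy as np


def aspls_update(data, baseline, k=0.5):
    """asPLS-style weight and alpha update for 2D data (sketch).

    k is the asymmetric coefficient; 0.5 follows the Notes, while the
    asPLS paper lists 2.
    """
    residual = data - baseline
    std = residual[residual < 0].std(ddof=1)  # spread of negative residuals (assumed ddof)
    exponent = np.clip(k * (residual - std) / std, -700, 700)
    weights = 1 / (1 + np.exp(exponent))
    # alpha is proportional to |residual|, so points with larger residuals
    # (e.g. peaks) receive a larger local value of lam.
    alpha = np.abs(residual) / np.abs(residual).max()
    return weights, alpha


data = np.array([[-1.0, -2.0, 0.5], [10.0, -1.5, 0.0]])
weights, alpha = aspls_update(data, np.zeros_like(data))
```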

collab_pls(data, average_dataset=True, method='asls', method_kwargs=None)

Collaborative Penalized Least Squares (collab-PLS).

Averages the data or the fit weights for an entire dataset to obtain better results. Uses any Whittaker-smoothing-based or weighted spline algorithm.

Parameters:
data: array-like, shape (L, M, N)

An array with shape (L, M, N) where L is the number of entries in the dataset and (M, N) is the shape of each data entry.

average_dataset: bool, optional

If True (default), will average the dataset before fitting to get the weighting. If False, will fit each individual entry in the dataset and then average the weights to get the weighting for the dataset.

method: str, optional

A string indicating the Whittaker-smoothing-based or weighted spline method to use for fitting the baseline. Default is 'asls'.

method_kwargs: dict, optional

A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.

Returns:
baselines: numpy.ndarray, shape (L, M, N)

An array of all of the baselines.

params: dict

A dictionary with the following items:

  • 'average_weights': numpy.ndarray, shape (M, N)

    The weight array used to fit all of the baselines.

  • 'average_alpha': numpy.ndarray, shape (M, N)

    Only returned if method is 'aspls'. The alpha array used to fit all of the baselines for aspls().

Additional items depend on the output of the selected method. Every other key will have a list of values, with each item corresponding to a fit.

Notes

If method is 'aspls', collab_pls will also calculate the alpha array for the entire dataset in the same manner as the weights.

References

Chen, L., et al. Collaborative Penalized Least Squares for Background Correction of Multiple Raman Spectra. Journal of Analytical Methods in Chemistry, 2018, 2018.
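The two averaging modes controlled by average_dataset can be sketched independently of any particular baseline method. Here fit_weights is a hypothetical callable standing in for the selected Whittaker or spline method, and below_mean is a toy weighting used only to make the example runnable; neither is part of pybaselines.

```python
import numpy as np


def collab_weights(dataset, fit_weights, average_dataset=True):
    """Shared weights for a dataset of shape (L, M, N) (sketch).

    fit_weights maps one (M, N) array to its fitted (M, N) weight array.
    """
    dataset = np.asarray(dataset, dtype=float)
    if average_dataset:
        # Fit once on the mean entry and reuse its weights for every entry.
        return fit_weights(dataset.mean(axis=0))
    # Fit every entry separately, then average the resulting weights.
    return np.mean([fit_weights(entry) for entry in dataset], axis=0)


def below_mean(entry):
    """Toy weighting: 1 where a point is below the entry's mean, else 0."""
    return (entry < entry.mean()).astype(float)


dataset = np.stack([
    np.arange(6.0).reshape(2, 3),
    np.arange(6.0)[::-1].reshape(2, 3),
])
shared = collab_weights(dataset, below_mean, average_dataset=True)
per_entry = collab_weights(dataset, below_mean, average_dataset=False)
```

The two modes can differ noticeably: in this toy dataset the averaged entry is flat, so its weights are all 0, whereas averaging the per-entry weights gives 0.5 everywhere.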

drpls(data, lam=100000.0, eta=0.5, max_iter=50, tol=0.001, weights=None, diff_order=2)

Doubly reweighted penalized least squares (drPLS) baseline.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.

eta: float, optional

A term for controlling the value of lam; should be between 0 and 1. Low values will produce smoother baselines, while higher values will more aggressively fit peaks. Default is 0.5.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 1. Default is 2 (second-order differential matrix). Typical values are 2 or 3.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if eta is not between 0 and 1 or if diff_order is less than 2.

References

Xu, D., et al. Baseline correction method based on doubly reweighted penalized least squares. Applied Optics, 2019, 58, 3913-3920.

iarpls(data, lam=100000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Improved asymmetrically reweighted penalized least squares smoothing (IarPLS).

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second-order differential matrix). Typical values are 2 or 1.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be an array of ones with shape (M, N).

num_eigens: int or Sequence[int, int] or None, optional

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Default is (10, 10).

return_dof: bool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.

References

Ye, J., et al. Baseline correction method based on improved asymmetrically reweighted penalized least squares for Raman spectrum. Applied Optics, 2020, 59, 10933-10943.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.
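The IarPLS weighting from the Ye et al. reference replaces the arPLS logistic function with a smoother, iteration-dependent curve. A minimal sketch, assuming the threshold 2 * std of the negative residuals and a standard deviation computed with ddof=1; the function name is hypothetical and pybaselines may differ in such details.

```python
import numpy as np


def iarpls_weights(data, baseline, iteration):
    """IarPLS-style weights (Ye et al., 2020) for 2D data (sketch).

    iteration starts at 1; the exp(iteration) factor sharpens the
    transition between high and low weights as the fit progresses.
    """
    residual = data - baseline
    std = residual[residual < 0].std(ddof=1)  # assumed ddof
    inner = (np.exp(iteration) / std) * (residual - 2 * std)
    # np.hypot(1, inner) computes sqrt(1 + inner**2) without overflow.
    return 0.5 * (1 - inner / np.hypot(1, inner))


data = np.array([[-1.0, -2.0, 0.5], [10.0, -1.5, 0.0]])
weights = iarpls_weights(data, np.zeros_like(data), iteration=1)
```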

iasls(data, lam=1000000.0, p=0.01, lam_1=0.0001, max_iter=50, tol=0.001, weights=None, diff_order=2)

Fits the baseline using the improved asymmetric least squares (IAsLS) algorithm.

The algorithm considers both the first and second derivatives of the residual.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e6.

p: float, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.

lam_1: float or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively, of the first derivative of the residual. Default is 1e-4.

max_iter: int, optional

The maximum number of fit iterations. Default is 50.

tol: float, optional

The exit criterion. Default is 1e-3.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), the initial weights will be set by fitting the data with a second-order polynomial.

diff_order: int or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 1. Default is 2 (second-order differential matrix). Typical values are 2 or 3.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if p is not between 0 and 1 or if diff_order is less than 2.

References

He, S., et al. Baseline correction for Raman spectra using an improved asymmetric least squares method. Analytical Methods, 2014, 6(12), 4402-4407.

imodpoly(data, poly_order=2, tol=0.001, max_iter=250, weights=None, use_original=False, mask_initial_peaks=True, return_coef=False, num_std=1.0, max_cross=None)

The improved modified polynomial (IModPoly) baseline algorithm.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data.

poly_order: int or Sequence[int, int], optional

The polynomial orders for x and z. If a single value is given, it will be used for both x and z. Default is 2.

tol: float, optional

The exit criterion. Default is 1e-3.

max_iter: int, optional

The maximum number of iterations. Default is 250.

weights: array-like, shape (M, N), optional

The weighting array. If None (default), an array of ones with shape (M, N) is used.

use_original: bool, optional

If False (default), will compare the baseline of each iteration with the y-values of that iteration [36] when choosing minimum values. If True, will compare the baseline with the original y-values given by data [37].

mask_initial_peaks: bool, optional

If True (default), will mask any data where the measured data exceed the initial baseline fit plus the standard deviation of the residual [38].

return_coef: bool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the x and z values and return them in the params dictionary. Default is False, since the conversion takes time.

num_std: float, optional

The number of standard deviations to include when thresholding. Default is 1. Must be greater than or equal to 0.

max_cross: int, optional

The maximum degree for the cross terms. For example, if max_cross is 1, then x z**2, x**2 z, and x**2 z**2 would all be set to 0. Default is None, which does not limit the cross terms.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'coef': numpy.ndarray, shape (poly_order[0] + 1, poly_order[1] + 1)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.polyval2d().

Raises:
ValueError

Raised if num_std is less than 0.

Notes

Algorithm originally developed in [38].

References

[36]

Gan, F., et al. Baseline correction by improved iterative polynomial fitting with automatic threshold. Chemometrics and Intelligent Laboratory Systems, 2006, 82, 59-65.

[37]

Lieber, C., et al. Automated method for subtraction of fluorescence from biological raman spectra. Applied Spectroscopy, 2003, 57(11), 1363-1367.

[38]

Zhao, J., et al. Automated Autofluorescence Background Subtraction Algorithm for Biomedical Raman Spectroscopy, Applied Spectroscopy, 2007, 61(11), 1225-1232.
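The core IModPoly loop (fit, clip the data to the baseline plus num_std standard deviations of the residual, refit until the deviation stops changing) can be sketched in 2D with NumPy. This is a simplified sketch, not the pybaselines implementation: it omits mask_initial_peaks, use_original, weights, and max_cross, expects poly_order as an (order_x, order_z) pair, and the function name imodpoly_2d is hypothetical. The returned coefficients follow the layout expected by numpy.polynomial.polynomial.polyval2d(), as described for 'coef' above.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def imodpoly_2d(data, x, z, poly_order=(2, 2), tol=1e-3, max_iter=250, num_std=1.0):
    """Simplified IModPoly loop for 2D data (sketch)."""
    if num_std < 0:
        raise ValueError("num_std must be greater than or equal to 0")
    X, Z = np.meshgrid(x, z, indexing="ij")
    vander = P.polyvander2d(X.ravel(), Z.ravel(), poly_order)
    y = np.asarray(data, dtype=float).ravel()
    coef = np.linalg.lstsq(vander, y, rcond=None)[0]
    baseline = vander @ coef
    deviation = np.std(y - baseline)
    for _ in range(max_iter):
        # Clip points that rise above the baseline by more than num_std deviations.
        y = np.minimum(y, baseline + num_std * deviation)
        coef = np.linalg.lstsq(vander, y, rcond=None)[0]
        baseline = vander @ coef
        new_deviation = np.std(y - baseline)
        # Stop once the relative change in the residual deviation is below tol.
        relative_change = abs(new_deviation - deviation) / max(deviation, np.finfo(float).eps)
        deviation = new_deviation
        if relative_change < tol:
            break
    # coef_2d[i, j] multiplies x**i * z**j, matching polyval2d's layout.
    coef_2d = coef.reshape(poly_order[0] + 1, poly_order[1] + 1)
    return baseline.reshape(np.shape(data)), coef_2d


x = np.linspace(-1, 1, 30)
z = np.linspace(-1, 1, 20)
X, Z = np.meshgrid(x, z, indexing="ij")
plane = 1 + X - 0.5 * Z
spiked = plane.copy()
spiked[15, 10] += 10.0
baseline, coef = imodpoly_2d(spiked, x, z, poly_order=(1, 1))
```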

imor(data, half_window=None, tol=0.001, max_iter=200, **window_kwargs)

An Improved Morphology-based (IMor) baseline algorithm.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data.

half_window: int or Sequence[int, int], optional

The half-window used for the rows and columns, respectively, for the morphology functions. If a single value is given, rows and columns will use the same value. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

tol: float, optional

The exit criterion. Default is 1e-3.

max_iter: int, optional

The maximum number of iterations. Default is 200.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'half_window': np.ndarray[int, int]

    The half windows used for the morphological calculations.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

References

Dai, L., et al. An Automated Baseline Correction Method Based on Iterative Morphological Operations. Applied Spectroscopy, 2018, 72(5), 731-739.

individual_axes(data, axes=(0, 1), method='asls', method_kwargs=None)

Applies a one-dimensional baseline correction method along each row and/or column.

Parameters:
data: array-like, shape (M, N)

The y-values of the measured data.

axes: (0, 1) or (1, 0) or 0 or 1, optional

The axes along which to apply baseline correction. The order dictates along which axis baseline correction is first applied. Default is (0, 1), which applies baseline correction along the rows first and then the columns.

method: str, optional

A string indicating the algorithm to use for fitting the baseline of each row and/or column; can be any one-dimensional algorithm in pybaselines. Default is 'asls'.

method_kwargs: Sequence[dict] or dict, optional

A sequence of dictionaries of keyword arguments to pass to the selected method function for each axis in axes. A single dictionary designates that the same keyword arguments will be used for each axis. Default is None, which will use an empty dictionary.

Returns:
baseline: numpy.ndarray, shape (M, N)

The calculated baseline.

params: dict

A dictionary with the following items:

  • 'params_rows': dict[str, list]

    Only if 0 is in axes. A dictionary of the parameters for each fit along the rows. The items within the dictionary will depend on the selected method.

  • 'params_columns': dict[str, list]

    Only if 1 is in axes. A dictionary of the parameters for each fit along the columns. The items within the dictionary will depend on the selected method.

  • 'baseline_rows': numpy.ndarray, shape (M, N)

    Only if 0 is in axes. The fit baseline along the rows.

  • 'baseline_columns': numpy.ndarray, shape (M, N)

    Only if 1 is in axes. The fit baseline along the columns.

Raises:
ValueError

Raised if method_kwargs is a sequence whose length is greater than the length of axes, or if the values in axes are duplicates.

Notes

If using array-like inputs within method_kwargs, they must correspond to their one-dimensional counterparts. For example, weights must be one-dimensional and have a length of M or N when used for fitting the rows or columns, respectively. Correctness of this is NOT verified within this method.
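To illustrate the row and column passes described above, here is a minimal numpy sketch. It assumes that each later pass is fit to the data minus the baselines already found and that the per-axis baselines are summed; the library's internals may differ. `poly_baseline_1d` and `individual_axes_sketch` are hypothetical names, with a plain polynomial fit standing in for a one-dimensional pybaselines method.

```python
import numpy as np


def poly_baseline_1d(y, poly_order=1):
    """Hypothetical stand-in for a 1D baseline method: a least-squares polynomial."""
    x = np.linspace(-1, 1, y.shape[0])
    coef = np.polynomial.polynomial.polyfit(x, y, poly_order)
    return np.polynomial.polynomial.polyval(x, coef)


def individual_axes_sketch(data, axes=(0, 1), method_1d=poly_baseline_1d):
    """Apply a 1D baseline function along each axis in turn.

    Assumption: each later pass is fit to the data minus the baselines of the
    earlier passes, and the per-axis baselines are summed.
    """
    data = np.asarray(data, dtype=float)
    total = np.zeros_like(data)
    per_axis = {}
    for axis in axes:
        # np.apply_along_axis calls method_1d on every 1D slice taken along `axis`.
        baseline = np.apply_along_axis(method_1d, axis, data - total)
        per_axis[axis] = baseline
        total = total + baseline
    return total, per_axis


# A surface that is linear along axis 0, with a different offset in each column.
rows = np.linspace(0.0, 1.0, 5)[:, None]
cols = 2.0 * np.linspace(0.0, 1.0, 4)[None, :]
surface = rows + cols
baseline, per_axis = individual_axes_sketch(surface, axes=(0, 1))
```

Because the first pass already captures the whole surface here, the second pass only sees a zero residual, which shows why the order given in axes can matter.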

irsqr(data, lam=1000.0, quantile=0.05, num_knots=25, spline_degree=3, diff_order=3, max_iter=100, tol=1e-06, weights=None, eps=None)

Iterative Reweighted Spline Quantile Regression (IRSQR).

Fits the baseline using quantile regression with penalized splines.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.

quantilefloat, optional

The quantile at which to fit the baseline. Default is 0.05.

num_knotsint or Sequence[int, int], optional

The number of knots for the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 25.

spline_degreeint or Sequence[int, int], optional

The degree of the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 3, which is a cubic spline.

diff_orderint or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 3 (third order differential matrix). Typical values are 2 or 3.

max_iterint, optional

The max number of fit iterations. Default is 100.

tolfloat, optional

The exit criterion. Default is 1e-6.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

epsfloat, optional

A small value added to the square of the residual to prevent dividing by 0. Default is None, which uses the square of the maximum-absolute-value of the fit each iteration multiplied by 1e-6.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if quantile is not between 0 and 1.

References

Han, Q., et al. Iterative Reweighted Quantile Regression Using Augmented Lagrangian Optimization for Baseline Correction. 2018 5th International Conference on Information Science and Control Engineering (ICISCE), 2018, 280-284.
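The reweighting at the heart of IRSQR is iteratively reweighted least squares for the quantile (check) loss. The sketch below shows that reweighting on a 1D signal, but it swaps the penalized splines for a plain polynomial basis and uses a fixed eps; it illustrates the technique and is not pybaselines' implementation. `quantile_poly_fit` is a hypothetical name.

```python
import numpy as np


def quantile_poly_fit(x, y, quantile=0.05, poly_order=1, max_iter=100, tol=1e-6, eps=1e-8):
    """Polynomial quantile regression by iteratively reweighted least squares."""
    basis = np.polynomial.polynomial.polyvander(x, poly_order)
    weights = np.ones_like(y)
    fit = np.zeros_like(y)
    for _ in range(max_iter):
        sqrt_w = np.sqrt(weights)
        coef = np.linalg.lstsq(basis * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
        new_fit = basis @ coef
        # Relative change of the fit between iterations is the exit criterion.
        change = np.linalg.norm(new_fit - fit) / max(np.linalg.norm(new_fit), 1e-12)
        fit = new_fit
        residual = y - fit
        # Check-loss weights: points above the fit get `quantile`, points below get
        # `1 - quantile`, each divided by |residual| (smoothed by eps to avoid 1 / 0).
        weights = np.where(residual > 0, quantile, 1 - quantile) / np.sqrt(residual**2 + eps)
        if change < tol:
            break
    return fit


x = np.linspace(-1.0, 1.0, 201)
line = 1.0 + 0.5 * x
peaks = 2.0 * np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2) + 1.5 * np.exp(-0.5 * ((x + 0.5) / 0.05) ** 2)
baseline = quantile_poly_fit(x, line + peaks, quantile=0.05)
```

With a low quantile, points above the fit (the peaks) receive small weights, so the fit settles onto the underlying line.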

mixture_model(data, lam=1000.0, p=0.01, num_knots=25, spline_degree=3, diff_order=3, max_iter=50, tol=0.001, weights=None, symmetric=False)

Considers the data as a mixture model composed of noise and peaks.

Weights are iteratively assigned by calculating the probability each value in the residual belongs to a normal distribution representing the noise.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.

pfloat, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Used to set the initial weights before performing expectation-maximization. Default is 1e-2.

num_knotsint or Sequence[int, int], optional

The number of knots for the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 25.

spline_degreeint or Sequence[int, int], optional

The degree of the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 3, which is a cubic spline.

diff_orderint or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 3 (third order differential matrix). Typical values are 2 or 3.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1, and then two iterations of reweighted least-squares are performed to provide starting weights for the expectation-maximization of the mixture model.

symmetricbool, optional

If False (default), the total mixture model will be composed of one normal distribution for the noise and one uniform distribution for positive non-noise residuals. If True, an additional uniform distribution will be added to the mixture model for negative non-noise residuals. Set symmetric to True only when the peaks are both positive and negative.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if p is not between 0 and 1.

References

de Rooi, J., et al. Mixture models for baseline estimation. Chemometrics and Intelligent Laboratory Systems, 2012, 117, 56-60.

Ghojogh, B., et al. Fitting A Mixture Distribution to Data: Tutorial. arXiv preprint arXiv:1901.06708, 2019.
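The weighting step described above can be sketched for a vector of residuals as follows. This assumes a zero-mean normal distribution for the noise and a single uniform distribution on [0, max(residual)] for positive peaks (the symmetric=False case), and it omits the spline refit between expectation-maximization steps. It illustrates the mixture-model idea and is not pybaselines' internal code; `noise_posterior` and `fit_mixture` are hypothetical names.

```python
import numpy as np


def noise_posterior(residual, sigma, noise_fraction, upper):
    """Probability that each residual belongs to the zero-mean normal (noise) component."""
    r = np.asarray(residual, dtype=float)
    gauss = noise_fraction * np.exp(-0.5 * (r / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # Uniform density on [0, upper] for the positive, non-noise residuals (peaks).
    uniform = np.where((r >= 0) & (r <= upper), (1 - noise_fraction) / upper, 0.0)
    total = gauss + uniform
    # Where both densities underflow to 0, fall back to treating the point as noise.
    return np.divide(gauss, total, out=np.ones_like(total), where=total > 0)


def fit_mixture(residual, n_iter=20):
    """Expectation-maximization for the noise scale and the noise mixing fraction."""
    r = np.asarray(residual, dtype=float)
    upper = r.max()
    sigma, noise_fraction = r.std(), 0.5
    for _ in range(n_iter):
        weights = noise_posterior(r, sigma, noise_fraction, upper)  # E-step
        sigma = np.sqrt(np.sum(weights * r**2) / np.sum(weights))  # M-step
        noise_fraction = weights.mean()
    return sigma, noise_fraction, upper


rng = np.random.default_rng(0)
residual = np.concatenate([rng.normal(0.0, 0.1, 200), rng.uniform(1.0, 5.0, 20)])
sigma, noise_fraction, upper = fit_mixture(residual)
weights = noise_posterior([0.0, 3.0], sigma, noise_fraction, upper)
```

The resulting probabilities act as fitting weights: residuals near zero are treated as noise (weight near 1) and large positive residuals as peaks (weight near 0).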

modpoly(data, poly_order=2, tol=0.001, max_iter=250, weights=None, use_original=False, mask_initial_peaks=False, return_coef=False, max_cross=None)

The modified polynomial (ModPoly) baseline algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data.

poly_orderint or Sequence[int, int], optional

The polynomial orders for x and z. If a single value is given, it is used for both x and z. Default is 2.

tolfloat, optional

The exit criterion. Default is 1e-3.

max_iterint, optional

The maximum number of iterations. Default is 250.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then will be an array with shape equal to (M, N) and all values set to 1.

use_originalbool, optional

If False (default), will compare the baseline of each iteration with the y-values of that iteration [33] when choosing minimum values. If True, will compare the baseline with the original y-values given by data [34].

mask_initial_peaksbool, optional

If True, will mask any data where the initial baseline fit plus the standard deviation of the residual is less than the measured data [35]. Default is False.

return_coefbool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the x and z values and return them in the params dictionary. Default is False, since the conversion takes time.

max_cross: int, optional

The maximum degree for the cross terms. For example, if max_cross is 1, then x z**2, x**2 z, and x**2 z**2 would all be set to 0. Default is None, which does not limit the cross terms.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'coef': numpy.ndarray, shape (poly_order[0] + 1, poly_order[1] + 1)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.polyval2d().

Notes

Algorithm originally developed in [34] and then slightly modified in [33].

References

[33]

Gan, F., et al. Baseline correction by improved iterative polynomial fitting with automatic threshold. Chemometrics and Intelligent Laboratory Systems, 2006, 82, 59-65.

[34]

Lieber, C., et al. Automated method for subtraction of fluorescence from biological Raman spectra. Applied Spectroscopy, 2003, 57(11), 1363-1367.

[35]

Zhao, J., et al. Automated Autofluorescence Background Subtraction Algorithm for Biomedical Raman Spectroscopy, Applied Spectroscopy, 2007, 61(11), 1225-1232.
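The ModPoly loop can be sketched on a 2D grid with numpy's 2D Vandermonde helpers. This sketch follows the use_original=False behavior (the working data of each iteration is clipped to the current fit) and uses the relative change of the fit as the exit criterion; it omits weights, mask_initial_peaks, and max_cross, and `modpoly_2d` is a hypothetical name rather than pybaselines' code.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def modpoly_2d(x, z, data, poly_order=(1, 1), tol=1e-3, max_iter=250):
    """Sketch of the ModPoly loop on a 2D grid (x along axis 0, z along axis 1)."""
    X, Z = np.meshgrid(x, z, indexing="ij")
    vander = P.polyvander2d(X.ravel(), Z.ravel(), poly_order)
    y = np.asarray(data, dtype=float).ravel()
    baseline = vander @ np.linalg.lstsq(vander, y, rcond=None)[0]
    for _ in range(max_iter):
        # Clip the working data to the current fit so peaks stop pulling it upward.
        y = np.minimum(y, baseline)
        new_baseline = vander @ np.linalg.lstsq(vander, y, rcond=None)[0]
        change = np.linalg.norm(new_baseline - baseline) / np.linalg.norm(baseline)
        baseline = new_baseline
        if change < tol:
            break
    return baseline.reshape(X.shape)


x = z = np.linspace(-1.0, 1.0, 41)
X, Z = np.meshgrid(x, z, indexing="ij")
plane = 2.0 + 0.5 * X - 0.3 * Z
peak = 5.0 * np.exp(-(X**2 + Z**2) / (2 * 0.1**2))
baseline = modpoly_2d(x, z, plane + peak, poly_order=(1, 1))
```

Setting use_original=True would correspond to clipping against the original data rather than the previous iteration's working data.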

mor(data, half_window=None, **window_kwargs)

A Morphological based (Mor) baseline algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data.

half_windowint or Sequence[int, int], optional

The half-window used for the rows and columns, respectively, for the morphology functions. If a single value is given, rows and columns will use the same value. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': np.ndarray[int, int]

    The half windows used for the morphological calculations.

References

Perez-Pueyo, R., et al. Morphology-Based Automated Baseline Removal for Raman Spectra of Artistic Pigments. Applied Spectroscopy, 2010, 64, 595-600.
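Morphological algorithms such as Mor are built on grey-scale opening: an erosion (moving minimum) followed by a dilation (moving maximum) over a (2 * half_window + 1)-sized window, which removes features narrower than the window. The sketch below shows only that opening step in numpy (it needs numpy 1.20+ for `sliding_window_view`, and edge padding is an assumption). How Mor combines openings into its final baseline, and the optimize_window() search, are not reproduced; `grey_opening_2d` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _window_reduce(data, half_window, reducer):
    """Apply `reducer` over a (2*h0 + 1, 2*h1 + 1) moving window with edge padding."""
    h0, h1 = half_window
    padded = np.pad(data, ((h0, h0), (h1, h1)), mode="edge")
    windows = sliding_window_view(padded, (2 * h0 + 1, 2 * h1 + 1))
    return reducer(windows, axis=(-2, -1))


def grey_opening_2d(data, half_window=(1, 1)):
    """Morphological opening: erosion (moving minimum) then dilation (moving maximum)."""
    eroded = _window_reduce(np.asarray(data, dtype=float), half_window, np.min)
    return _window_reduce(eroded, half_window, np.max)


data = np.ones((20, 30))
data[10, 15] += 4.0  # a single-pixel peak, narrower than the 3x3 window
opened = grey_opening_2d(data, half_window=(1, 1))
```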

noise_median(data, half_window=None, smooth_half_window=None, sigma=None, **pad_kwargs)

The noise-median method for baseline identification.

Assumes the baseline can be considered as the median value within a moving window, and the resulting baseline is then smoothed with a Gaussian kernel.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data.

half_windowint or Sequence[int, int], optional

The index-based size to use for the median window on the rows and columns, respectively. The total window size in each dimension will range from [-half_window, ..., half_window] with size 2 * half_window + 1. Default is None, which uses twice the output of optimize_window() as a reasonable starting value.

smooth_half_windowint, optional

The half window to use for smoothing. Default is None, which will use the average of the values in half_window.

sigmafloat, optional

The standard deviation of the smoothing Gaussian kernel. Default is None, which will use (2 * smooth_half_window + 1) / 6.

**pad_kwargs

Additional keyword arguments to pass to pad_edges2d() for padding the edges of the data to prevent edge effects from convolution.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated and smoothed baseline.

dict

An empty dictionary, just to match the output of all other algorithms.

References

Friedrichs, M., A model-free algorithm for the removal of baseline artifacts. J. Biomolecular NMR, 1995, 5, 147-153.
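The two stages described above, a moving median followed by Gaussian smoothing, can be sketched with numpy alone. The half windows are given explicitly instead of coming from optimize_window(), edge padding stands in for pad_edges2d(), and the sigma default mirrors the one documented above; `moving_median_2d` and `gaussian_smooth_2d` are hypothetical names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def moving_median_2d(data, half_window):
    """Median over a (2*h0 + 1, 2*h1 + 1) moving window, with edge padding."""
    h0, h1 = half_window
    padded = np.pad(np.asarray(data, dtype=float), ((h0, h0), (h1, h1)), mode="edge")
    windows = sliding_window_view(padded, (2 * h0 + 1, 2 * h1 + 1))
    return np.median(windows, axis=(-2, -1))


def gaussian_smooth_2d(data, smooth_half_window, sigma=None):
    """Separable, normalized Gaussian smoothing with edge padding."""
    h = smooth_half_window
    if sigma is None:
        sigma = (2 * h + 1) / 6  # the documented default for sigma
    offsets = np.arange(-h, h + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(data, h, mode="edge")
    # 'valid' convolution along each axis trims the h padded points from both ends.
    smoothed = np.apply_along_axis(np.convolve, 0, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 1, smoothed, kernel, mode="valid")


data = np.full((25, 25), 3.0)
data[12, 12] = 10.0  # an isolated one-pixel spike
half_window = (2, 2)
smooth_half_window = sum(half_window) // 2  # the average of the values in half_window
baseline = gaussian_smooth_2d(moving_median_2d(data, half_window), smooth_half_window)
```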

penalized_poly(data, poly_order=2, tol=0.001, max_iter=250, weights=None, cost_function='asymmetric_truncated_quadratic', threshold=None, alpha_factor=0.99, return_coef=False, max_cross=None)

Fits a polynomial baseline using a non-quadratic cost function.

The non-quadratic cost functions penalize large residuals less severely than the quadratic loss of ordinary least squares, giving a more robust fit than normal least-squares.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data.

poly_orderint or Sequence[int, int], optional

The polynomial orders for x and z. If a single value is given, it is used for both x and z. Default is 2.

tolfloat, optional

The exit criterion. Default is 1e-3.

max_iterint, optional

The maximum number of iterations. Default is 250.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then will be an array with shape equal to (M, N) and all values set to 1.

cost_functionstr, optional

The non-quadratic cost function to minimize. Must indicate the symmetry of the method with a prefix: 'a' or 'asymmetric' for asymmetric loss, and 's' or 'symmetric' for symmetric loss. Default is 'asymmetric_truncated_quadratic'. Available methods, and their associated references, are:

  • 'asymmetric_truncated_quadratic' [39]

  • 'symmetric_truncated_quadratic' [39]

  • 'asymmetric_huber' [39]

  • 'symmetric_huber' [39]

  • 'asymmetric_indec' [40]

  • 'symmetric_indec' [40]

thresholdfloat, optional

The threshold value for the loss method, where the function goes from quadratic loss (such as used for least squares) to non-quadratic. For symmetric loss methods, residual values with absolute value less than threshold will have quadratic loss. For asymmetric loss methods, residual values less than the threshold will have quadratic loss. Default is None, which sets threshold to one-tenth of the standard deviation of the input data.

alpha_factorfloat, optional

A value between 0 and 1 that controls the value of the penalty. Default is 0.99. This value typically does not need to be changed.

return_coefbool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the x and z values and return them in the params dictionary. Default is False, since the conversion takes time.

max_cross: int, optional

The maximum degree for the cross terms. For example, if max_cross is 1, then x z**2, x**2 z, and x**2 z**2 would all be set to 0. Default is None, which does not limit the cross terms.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'coef': numpy.ndarray, shape (poly_order[0] + 1, poly_order[1] + 1)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.polyval2d().

Raises:
ValueError

Raised if alpha_factor is not between 0 and 1.

Notes

In baseline literature, this procedure is sometimes called "backcor".

References

[39]

Mazet, V., et al. Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometrics and Intelligent Laboratory Systems, 2005, 76(2), 121-133.

[40]

Liu, J., et al. Goldindec: A Novel Algorithm for Raman Spectrum Baseline Correction. Applied Spectroscopy, 2015, 69(7), 834-842.
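To make the "less than quadratic" behavior concrete, here are the truncated quadratic and Huber costs written out for a residual r and threshold T, following the usual forms from [39] and the threshold rules described above (asymmetric variants are quadratic only for r < T; symmetric ones for |r| < T). pybaselines' internal scaling may differ, the indec costs are omitted, and the function names are hypothetical.

```python
import numpy as np


def truncated_quadratic(residual, threshold, symmetric=False):
    """r**2 inside the quadratic region, a constant threshold**2 outside it."""
    r = np.asarray(residual, dtype=float)
    in_quadratic = np.abs(r) < threshold if symmetric else r < threshold
    return np.where(in_quadratic, r**2, threshold**2)


def huber(residual, threshold, symmetric=False):
    """r**2 inside the quadratic region, growing only linearly outside it."""
    r = np.asarray(residual, dtype=float)
    in_quadratic = np.abs(r) < threshold if symmetric else r < threshold
    linear = 2 * threshold * np.abs(r) - threshold**2  # continuous at |r| == threshold
    return np.where(in_quadratic, r**2, linear)


residual = np.array([-3.0, 0.5, 3.0])
least_squares = residual**2
atq = truncated_quadratic(residual, threshold=1.0)
stq = truncated_quadratic(residual, threshold=1.0, symmetric=True)
ah = huber(residual, threshold=1.0)
sh = huber(residual, threshold=1.0, symmetric=True)
```

For the large positive residual of 3 (a peak), least squares charges 9 while the truncated quadratic and Huber costs charge 1 and 5, which is why peaks pull these fits upward less.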

poly(data, poly_order=2, weights=None, return_coef=False, max_cross=None)

Computes a polynomial that fits the baseline of the data.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data.

poly_orderint or Sequence[int, int], optional

The polynomial orders for x and z. If a single value is given, it is used for both x and z. Default is 2.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then will be an array with shape equal to (M, N) and all values set to 1.

return_coefbool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the x and z values and return them in the params dictionary. Default is False, since the conversion takes time.

max_cross: int, optional

The maximum degree for the cross terms. For example, if max_cross is 1, then x z**2, x**2 z, and x**2 z**2 would all be set to 0. Default is None, which does not limit the cross terms.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'coef': numpy.ndarray, shape (poly_order[0] + 1, poly_order[1] + 1)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.polyval2d().

Notes

To only fit regions without peaks, supply a weight array with zero values at the indices where peaks are located.
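The note above can be sketched as a weighted 2D polynomial fit whose coefficient array has the documented (poly_order[0] + 1, poly_order[1] + 1) layout and can be evaluated with numpy.polynomial.polynomial.polyval2d(). The handling of max_cross follows the example in the parameter description (an x**i z**j cross term is dropped when either exponent exceeds max_cross); this is an assumed reading, and `poly_baseline_2d` is a hypothetical name, not pybaselines' code.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def poly_baseline_2d(x, z, data, poly_order=(2, 2), weights=None, max_cross=None):
    """Weighted least-squares 2D polynomial; poly_order must be a pair of ints here."""
    X, Z = np.meshgrid(x, z, indexing="ij")
    vander = P.polyvander2d(X.ravel(), Z.ravel(), poly_order)
    # Column k of polyvander2d multiplies x**i * z**j, with k = i * (poly_order[1] + 1) + j.
    i, j = np.divmod(np.arange(vander.shape[1]), poly_order[1] + 1)
    keep = np.ones(vander.shape[1], dtype=bool)
    if max_cross is not None:
        # Assumed reading of max_cross: drop cross terms with either exponent above it.
        keep = ~((i > 0) & (j > 0) & (np.maximum(i, j) > max_cross))
    y = np.asarray(data, dtype=float).ravel()
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float).ravel()
    sqrt_w = np.sqrt(w)
    coef = np.zeros(vander.shape[1])
    coef[keep] = np.linalg.lstsq(vander[:, keep] * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
    coef = coef.reshape(poly_order[0] + 1, poly_order[1] + 1)
    return P.polyval2d(X, Z, coef), coef


x = z = np.linspace(-1.0, 1.0, 31)
X, Z = np.meshgrid(x, z, indexing="ij")
surface = 1.0 + 0.5 * X + 0.2 * Z**2 + 0.1 * X * Z
peak = 4.0 * np.exp(-((X - 0.2) ** 2 + Z**2) / (2 * 0.05**2))
weights = (peak < 1e-8).astype(float)  # zero weight wherever the peak is present
baseline, coef = poly_baseline_2d(
    x, z, surface + peak, poly_order=(1, 2), weights=weights, max_cross=1
)
```

Because the peak region carries zero weight, the fit recovers the underlying surface, and the x z**2 coefficient stays at 0 because of max_cross=1.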

psalsa(data, lam=100000.0, p=0.5, k=None, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Peaked Signal's Asymmetric Least Squares Algorithm (psalsa).

Similar to the asymmetric least squares (AsLS) algorithm, but applies an exponential decay weighting to values greater than the baseline to allow using a higher p value to better fit noisy data.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.

pfloat, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 0.5.

kfloat, optional

A factor that controls the exponential decay of the weights for baseline values greater than the data. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce similar results to asls().

diff_orderint or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

num_eigensint or Sequence[int, int] or None, optional

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Default is (10, 10).

return_dofbool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.

Raises:
ValueError

Raised if p is not between 0 and 1.

Notes

The exit criterion of the original algorithm was to check whether the signs of the residuals stop changing between two iterations. Here, a comparison of the l2 norms of the weight arrays between iterations is used instead, to be more consistent with other Whittaker-smoothing-based algorithms.

References

Oller-Moreno, S., et al. Adaptive Asymmetric Least Squares baseline estimation for analytical instruments. 2014 IEEE 11th International Multi-Conference on Systems, Signals, and Devices, 2014, 1-5.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.
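The exponential-decay weighting described above can be sketched as a standalone function: residuals above the baseline get p * exp(-residual / k), and the rest get 1 - p, with k defaulting to one-tenth of the standard deviation of the data as documented. This follows the weighting of Oller-Moreno et al. as summarized here; the Whittaker solve between reweightings is omitted, and `psalsa_weights` is a hypothetical name.

```python
import numpy as np


def psalsa_weights(data, baseline, p=0.5, k=None):
    """Weights for one psalsa iteration given the current baseline."""
    data = np.asarray(data, dtype=float)
    residual = data - np.asarray(baseline, dtype=float)
    if k is None:
        k = np.std(data) / 10  # the documented default for k
    # Clamp at 0 so the exponential never overflows for points below the baseline.
    decay = np.exp(-np.maximum(residual, 0.0) / k)
    # Above the baseline: p * exp(-residual / k); at or below it: 1 - p.
    return np.where(residual > 0, p * decay, 1 - p)


data = np.array([0.0, 0.1, 5.0, -1.0])
weights = psalsa_weights(data, np.zeros_like(data), p=0.5, k=0.5)
```

A point only slightly above the baseline keeps most of its weight, while a point far above it (a peak) is weighted almost to zero, which is what lets psalsa use a larger p on noisy data.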

pspline_airpls(data, lam=1000.0, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None)

A penalized spline version of the airPLS algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.

num_knotsint, optional

The number of knots for the spline. Default is 25.

spline_degreeint, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_orderint, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

References

Zhang, Z.M., et al. Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst, 2010, 135(5), 1138-1146.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.

pspline_arpls(data, lam=1000.0, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None)

A penalized spline version of the arPLS algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.

num_knotsint, optional

The number of knots for the spline. Default is 25.

spline_degreeint, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_orderint, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

See also

Baseline2D.arpls

References

Baek, S.J., et al. Baseline correction using asymmetrically reweighted penalized least squares smoothing. Analyst, 2015, 140, 250-257.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
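The arPLS reweighting applied between penalized spline solves can be sketched as a generalized logistic of the residual, using the mean and standard deviation of the negative residuals, following the form in Baek et al. The spline solve itself is omitted, at least two negative residuals are assumed, and `arpls_weights` is a hypothetical name; pybaselines' exact code may differ.

```python
import numpy as np


def arpls_weights(data, baseline):
    """Logistic arPLS weights from the statistics of the negative residuals."""
    residual = np.asarray(data, dtype=float) - np.asarray(baseline, dtype=float)
    negative = residual[residual < 0]
    mean, std = negative.mean(), negative.std(ddof=1)
    # Weights are ~1 well below 2 * std - mean and ~0 well above it.
    # Clipping keeps np.exp from overflowing for very large residuals.
    exponent = np.clip(2.0 / std * (residual - (2.0 * std - mean)), -700.0, 700.0)
    return 1.0 / (1.0 + np.exp(exponent))


residual = np.array([-1.0, -0.5, -0.2, 0.0, 10.0])
weights = arpls_weights(residual, np.zeros_like(residual))
```

Unlike a hard p / 1 - p split, the weights decrease smoothly as the residual grows, so points slightly above the baseline (likely noise) are not discarded outright.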

pspline_asls(data, lam=1000.0, p=0.01, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None)

A penalized spline version of the asymmetric least squares (AsLS) algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat or Sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.

pfloat, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.

num_knotsint or Sequence[int, int], optional

The number of knots for the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 25.

spline_degreeint or Sequence[int, int], optional

The degree of the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 3, which is a cubic spline.

diff_orderint or Sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 1 or 2.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if p is not between 0 and 1.

See also

Baseline2D.asls

References

Eilers, P. A Perfect Smoother. Analytical Chemistry, 2003, 75(14), 3631-3636.

Eilers, P., et al. Baseline correction with asymmetric least squares smoothing. Leiden University Medical Centre Report, 2005, 1(1).

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
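The asymmetric p / 1 - p reweighting loop can be sketched in 1D with a dense Whittaker smoother standing in for the penalized spline. This swaps out the B-spline basis, so it is closer to plain AsLS (Eilers 2005) than to pspline_asls, and it is only meant to show the reweighting; `asls_1d` is a hypothetical name.

```python
import numpy as np


def asls_1d(y, lam=1e5, p=0.01, diff_order=2, max_iter=50, tol=1e-3):
    """Asymmetric least squares with a dense Whittaker smoother (small n only)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    diff = np.diff(np.eye(n), diff_order, axis=0)  # (n - diff_order, n) difference matrix
    penalty = lam * diff.T @ diff
    weights = np.ones(n)
    for _ in range(max_iter):
        baseline = np.linalg.solve(np.diag(weights) + penalty, weights * y)
        # Points above the baseline get p, points at or below it get 1 - p.
        new_weights = np.where(y > baseline, p, 1 - p)
        # Relative change in the weights is the exit criterion.
        change = np.linalg.norm(new_weights - weights) / np.linalg.norm(weights)
        weights = new_weights
        if change < tol:
            break
    return baseline


x = np.arange(200)
line = 1.0 + 0.01 * x
peak = 5.0 * np.exp(-0.5 * ((x - 100) / 5.0) ** 2)
baseline = asls_1d(line + peak, lam=1e5, p=0.01)
```

The dense solve costs O(n^3) and is only practical for short signals; penalized splines reduce the system to num_knots + spline_degree - 1 coefficients per axis, which is what makes the 2D version tractable.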

pspline_iarpls(data, lam=1000.0, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None)

A penalized spline version of the IarPLS algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.

num_knotsint, optional

The number of knots for the spline. Default is 25.

spline_degreeint, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_orderint, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

References

Ye, J., et al. Baseline correction method based on improved asymmetrically reweighted penalized least squares for Raman spectrum. Applied Optics, 2020, 59, 10933-10943.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.

pspline_iasls(data, lam=1000.0, p=0.01, lam_1=0.0001, num_knots=25, spline_degree=3, max_iter=50, tol=0.001, weights=None, diff_order=2)

A penalized spline version of the IAsLS algorithm.

Parameters:
dataarray-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lamfloat, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.

pfloat, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 1e-2.

lam_1float, optional

The smoothing parameter for the first derivative of the residual. Default is 1e-4.

num_knotsint, optional

The number of knots for the spline. Default is 25.

spline_degreeint, optional

The degree of the spline. Default is 3, which is a cubic spline.

max_iterint, optional

The max number of fit iterations. Default is 50.

tolfloat, optional

The exit criterion. Default is 1e-3.

weightsarray-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

diff_orderint, optional

The order of the differential matrix. Must be greater than 1. Default is 2 (second order differential matrix). Typical values are 2 or 3.

Returns:
baselinenumpy.ndarray, shape (M, N)

The calculated baseline.

paramsdict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if p is not between 0 and 1 or if diff_order is less than 2.

See also

Baseline2D.iasls

References

He, S., et al. Baseline correction for Raman spectra using an improved asymmetric least squares method, Analytical Methods, 2014, 6(12), 4402-4407.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.

pspline_psalsa(data, lam=1000.0, p=0.5, k=None, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, weights=None)

A penalized spline version of the psalsa algorithm.

Parameters:
data : array-like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam : float, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e3.

p : float, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Default is 0.5.

k : float, optional

A factor that controls the exponential decay of the weights for data values greater than the baseline. Should be approximately the height at which a value could be considered a peak. Default is None, which sets k to one-tenth of the standard deviation of the input data. A large k value will produce results similar to asls().

num_knots : int, optional

The number of knots for the spline. Default is 25.

spline_degree : int, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_order : int, optional

The order of the differential matrix. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iter : int, optional

The maximum number of fit iterations. Default is 50.

tol : float, optional

The exit criterion. Default is 1e-3.

weights : array-like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.
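A minimal sketch of psalsa-style weights, assuming the form w = p * exp(-r / k) for r > 0 and w = 1 - p otherwise, with r = data - baseline. This follows one reading of the p and k descriptions and of Oller-Moreno et al.; `psalsa_weights` is a hypothetical helper, not part of pybaselines.

```python
import numpy as np


def psalsa_weights(data, baseline, p=0.5, k=None):
    """Exponentially decaying weights above the baseline, 1 - p elsewhere (illustrative)."""
    data = np.asarray(data, dtype=float)
    residual = data - np.asarray(baseline, dtype=float)
    if k is None:
        k = np.std(data) / 10  # described default: one-tenth of the standard deviation
    # Clip so exp() only ever sees non-negative residuals and cannot overflow.
    decay = np.exp(-np.maximum(residual, 0) / k)
    return np.where(residual > 0, p * decay, 1 - p)


data = np.array([[0.0, 1.0], [5.0, -1.0]])
baseline = np.zeros((2, 2))
weights = psalsa_weights(data, baseline, p=0.5, k=1.0)
```

Taller points above the baseline receive smaller weights, and as k grows the weights above the baseline approach p, consistent with the note that a large k behaves like asls().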

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

Raises:
ValueError

Raised if p is not between 0 and 1.

References

Oller-Moreno, S., et al. Adaptive Asymmetric Least Squares baseline estimation for analytical instruments. 2014 IEEE 11th International Multi-Conference on Systems, Signals, and Devices, 2014, 1-5.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.

quant_reg(data, poly_order=2, quantile=0.05, tol=1e-06, max_iter=250, weights=None, eps=None, return_coef=False, max_cross=None)

Approximates the baseline of the data using quantile regression.

Parameters:
data : array-like, shape (M, N)

The y-values of the measured data.

poly_order : int or Sequence[int, int], optional

The polynomial orders for x and z, respectively. If a single value is given, it is used for both x and z. Default is 2.

quantile : float, optional

The quantile at which to fit the baseline. Default is 0.05.

tol : float, optional

The exit criterion. Default is 1e-6. For extreme quantiles (quantile < 0.01 or quantile > 0.99), a lower value may be needed to get a good fit.

max_iter : int, optional

The maximum number of iterations. Default is 250. For extreme quantiles (quantile < 0.01 or quantile > 0.99), a higher value may be needed to ensure convergence.

weights : array-like, shape (M, N), optional

The weighting array. If None (default), then it will be an array with shape equal to (M, N) and all values set to 1.

eps : float, optional

A small value added to the square of the residual to prevent dividing by 0. Default is None, which uses 1e-6 times the square of the maximum absolute value of the fit, recomputed each iteration.

return_coef : bool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the x and z values and return them in the params dictionary. Default is False, since the conversion takes time.

max_cross : int, optional

The maximum degree for the cross terms. For example, if max_cross is 1, then x z**2, x**2 z, and x**2 z**2 would all be set to 0. Default is None, which does not limit the cross terms.
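One reading of the max_cross example, expressed as a boolean mask over the coefficient grid of shape (poly_order[0] + 1, poly_order[1] + 1): the coefficient for x**i * z**j with i > 0 and j > 0 is dropped when either power exceeds max_cross. The helper `cross_term_mask` is hypothetical and only illustrates which terms are affected; pybaselines may enforce the restriction differently.

```python
import numpy as np


def cross_term_mask(poly_order, max_cross=None):
    """True where the x**i * z**j coefficient is kept (illustrative reading)."""
    if np.isscalar(poly_order):
        poly_order = (poly_order, poly_order)
    x_powers = np.arange(poly_order[0] + 1)[:, None]
    z_powers = np.arange(poly_order[1] + 1)[None, :]
    mask = np.ones((poly_order[0] + 1, poly_order[1] + 1), dtype=bool)
    if max_cross is not None:
        is_cross = (x_powers > 0) & (z_powers > 0)
        too_high = (x_powers > max_cross) | (z_powers > max_cross)
        mask[is_cross & too_high] = False  # drop high-degree cross terms
    return mask


mask = cross_term_mask((2, 2), max_cross=1)
```

For poly_order=(2, 2) and max_cross=1, the mask keeps the x z term and drops x z**2, x**2 z, and x**2 z**2, matching the example in the parameter description.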

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'coef': numpy.ndarray, shape (poly_order[0] + 1, poly_order[1] + 1)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.polyval2d().
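The sketch below evaluates a 'coef' array on the data grid with numpy.polynomial.polynomial.polyval2d(). The coefficient values are made up. Because x has length data.shape[-2] (see the class attributes), the grid is built with indexing='ij' so that rows follow x and columns follow z; treating the second axis of 'coef' as the z powers is an assumption based on the 'coef' shape given above.

```python
import numpy as np
from numpy.polynomial.polynomial import polyval2d

# Hypothetical coefficients for poly_order = (2, 1): coef[i, j] multiplies x**i * z**j.
coef = np.array([[1.0, 0.5],
                 [-0.2, 0.1],
                 [0.3, 0.0]])
x = np.linspace(-1, 1, 5)  # length M, one value per row
z = np.linspace(-1, 1, 4)  # length N, one value per column
X, Z = np.meshgrid(x, z, indexing='ij')
baseline = polyval2d(X, Z, coef)  # shape (M, N)
```

numpy.polynomial.polynomial.polygrid2d(x, z, coef) produces the same grid without building the mesh explicitly.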

Raises:
ValueError

Raised if quantile is not between 0 and 1.

Notes

Application of quantile regression for baseline fitting is described in [41].

Performs quantile regression using iteratively reweighted least squares (IRLS) as described in [42].
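A compact sketch of quantile regression by IRLS for a 2D polynomial surface. Each iteration gives points above the fit the weight quantile / sqrt(r**2 + eps) and points on or below it (1 - quantile) / sqrt(r**2 + eps), so the weighted squared residual approximates the quantile check loss; eps follows the default described for the eps parameter. This is not the pybaselines implementation: the function name `quantile_poly_2d`, the ordinary least-squares starting fit, the relative-change stopping rule, and the small floor on eps are assumptions, and max_cross and the weights argument are omitted.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander2d


def quantile_poly_2d(x, z, data, poly_order=2, quantile=0.05, tol=1e-6, max_iter=250):
    """Fit a 2D polynomial baseline at a given quantile with IRLS (illustrative)."""
    if not 0 < quantile < 1:
        raise ValueError('quantile must be between 0 and 1')
    orders = (poly_order, poly_order) if np.isscalar(poly_order) else tuple(poly_order)
    X, Z = np.meshgrid(x, z, indexing='ij')  # rows follow x, columns follow z
    # One column per x**i * z**j term, one row per data point.
    vander = polyvander2d(X.ravel(), Z.ravel(), orders)
    y = np.asarray(data, dtype=float).ravel()
    # Start from an ordinary least-squares fit (all weights equal to 1).
    baseline = vander @ np.linalg.lstsq(vander, y, rcond=None)[0]
    tol_history = []
    for _ in range(max_iter):
        residual = y - baseline
        # eps as described for the eps parameter, with a tiny floor so it is never 0.
        eps = max(1e-6 * np.abs(baseline).max() ** 2, np.finfo(float).eps)
        weights = np.where(residual > 0, quantile, 1 - quantile) / np.sqrt(residual**2 + eps)
        # Weighted least squares: scale rows by sqrt(weights) and solve.
        sqrt_w = np.sqrt(weights)
        coef = np.linalg.lstsq(vander * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
        new_baseline = vander @ coef
        change = np.linalg.norm(new_baseline - baseline) / max(np.linalg.norm(baseline), 1e-12)
        tol_history.append(change)
        baseline = new_baseline
        if change < tol:
            break
    return baseline.reshape(X.shape), np.array(tol_history)


x = np.linspace(-1, 1, 40)
z = np.linspace(-1, 1, 30)
X, Z = np.meshgrid(x, z, indexing='ij')
surface = 1 + 0.5 * X - 0.3 * Z + 0.2 * X * Z
data = surface.copy()
data[::7, ::5] += 5.0  # sparse positive "peaks"
baseline, tol_history = quantile_poly_2d(x, z, data, poly_order=2, quantile=0.05)
```

Because the peaks sit above the surface and the quantile is low, they receive small weights and the fit settles onto the underlying surface.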

References

[41]

Komsta, Ł. Comparison of Several Methods of Chromatographic Baseline Removal with a New Approach Based on Quantile Regression. Chromatographia, 2011, 73, 721-731.

[42]

Schnabel, S., et al. Simultaneous estimation of quantile curves using quantile sheets. AStA Advances in Statistical Analysis, 2013, 97, 77-87.

rolling_ball(data, half_window=None, smooth_half_window=None, pad_kwargs=None, **window_kwargs)

The rolling ball baseline algorithm.

Applies a minimum and then maximum moving window, and subsequently smooths the result, giving a baseline that resembles rolling a ball across the data.
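The three steps can be sketched with NumPy alone. This sketch assumes a flat rectangular window, NumPy's 'edge' padding in place of pad_edges(), and a plain moving average for the smoothing step; `rolling_ball_sketch` and its helpers are hypothetical names, not pybaselines functions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _as_pair(value):
    """Expand a single half-window into (rows, columns)."""
    return (value, value) if np.isscalar(value) else tuple(value)


def _moving_reduce(data, half_window, reducer):
    """Apply reducer over a (2*hr + 1, 2*hc + 1) window centred on every point."""
    hr, hc = _as_pair(half_window)
    # Edge padding keeps the output the same shape as the input (assumed padding mode).
    padded = np.pad(data, ((hr, hr), (hc, hc)), mode='edge')
    windows = sliding_window_view(padded, (2 * hr + 1, 2 * hc + 1))
    return reducer(windows, axis=(-2, -1))


def rolling_ball_sketch(data, half_window, smooth_half_window=None):
    """Moving minimum, then moving maximum, then moving-average smoothing."""
    data = np.asarray(data, dtype=float)
    if smooth_half_window is None:
        smooth_half_window = half_window  # same default as described for the method
    eroded = _moving_reduce(data, half_window, np.min)    # moving minimum
    opened = _moving_reduce(eroded, half_window, np.max)  # moving maximum (opening)
    return _moving_reduce(opened, smooth_half_window, np.mean)  # smoothing


# A single-point spike is narrower than the window, so it is removed.
spiky = np.zeros((30, 40))
spiky[15, 20] = 10.0
baseline = rolling_ball_sketch(spiky, half_window=(5, 5))
```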

Parameters:
data : array-like, shape (M, N)

The y-values of the measured data.

half_window : int or Sequence[int, int], optional

The half-window used for the rows and columns, respectively, for the morphology functions. If a single value is given, rows and columns will use the same value. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

smooth_half_window : int, optional

The half-window to use for smoothing the data after performing the morphological operation. Default is None, which will use the same value as used for the morphological operation.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from the moving average.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.
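The half-window search that these items control might look like the following loop, which widens the half window by increment, compares consecutive openings, and stops after max_hits consecutive matches. Several details are assumptions rather than documented behavior: the same half window is used for rows and columns, "equivalent" means a relative L2 difference below window_tol, the returned value is the first half window of the matching run, and the default max_half_window uses the shorter axis because len(data) only counts the rows of 2D data. `optimize_half_window` is a hypothetical name, not pybaselines' optimize_window().

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _opening(data, half_window):
    """Flat square opening (moving min, then moving max) with edge padding."""
    size = 2 * half_window + 1
    eroded = sliding_window_view(np.pad(data, half_window, mode='edge'), (size, size)).min(axis=(-2, -1))
    return sliding_window_view(np.pad(eroded, half_window, mode='edge'), (size, size)).max(axis=(-2, -1))


def optimize_half_window(data, increment=1, max_hits=1, window_tol=1e-6,
                         max_half_window=None, min_half_window=None):
    data = np.asarray(data, dtype=float)
    if min_half_window is None:
        min_half_window = 1
    if max_half_window is None:
        max_half_window = (min(data.shape) - 1) // 2  # assumption for 2D data
    previous = _opening(data, min_half_window)
    hits = 0
    candidate = min_half_window
    for half_window in range(min_half_window + increment, max_half_window + 1, increment):
        current = _opening(data, half_window)
        # Relative L2 difference between consecutive openings (assumed equivalence test).
        change = np.linalg.norm(current - previous) / max(np.linalg.norm(previous), 1e-12)
        if change < window_tol:
            if hits == 0:
                candidate = half_window - increment  # first window of the matching run
            hits += 1
            if hits >= max_hits:
                return candidate
        else:
            hits = 0  # the run of matching openings is broken
        previous = current
    return max_half_window


data = np.zeros((40, 40))
data[18:23, 18:23] = 1.0  # a 5 x 5 plateau on a flat background
```

With max_hits=1 this search stops at half window 1, while the plateau is still intact in the opening; with max_hits=2 it continues until the plateau has been removed and returns 3, which illustrates why the choice of max_hits matters.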

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': np.ndarray[int, int]

    The half windows used for the morphological calculations.

References

Kneen, M.A., et al. Algorithm for fitting XRF, SEM and PIXE X-ray spectra backgrounds. Nuclear Instruments and Methods in Physics Research B, 1996, 109, 209-213.

Liland, K., et al. Optimal Choice of Baseline Correction for Multivariate Calibration of Spectra. Applied Spectroscopy, 2010, 64(9), 1007-1016.

tophat(data, half_window=None, **window_kwargs)

Estimates the baseline using a top-hat transformation (morphological opening).

Parameters:
data : array-like, shape (M, N)

The y-values of the measured data.

half_window : int or Sequence[int, int], optional

The half-window used for the rows and columns, respectively, for the morphology functions. If a single value is given, rows and columns will use the same value. Default is None, which will optimize the half-window size using optimize_window() and window_kwargs.

**window_kwargs

Values for setting the half window used for the morphology operations. Items include:

  • 'increment': int

    The step size for iterating half windows. Default is 1.

  • 'max_hits': int

    The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 1.

  • 'window_tol': float

    The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

  • 'max_half_window': int

    The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

  • 'min_half_window': int

    The minimum half-window size. If None (default), will be set to 1.

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

dict

A dictionary with the following items:

  • 'half_window': np.ndarray[int, int]

    The half windows used for the morphological calculations.

Notes

The actual top-hat transformation is defined as data - opening(data), where opening is the morphological opening operation. This function, however, returns opening(data), since that is technically the baseline defined by the operation.
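The relationship in the note can be checked with a small sketch. `grey_opening_2d` is a hypothetical helper that builds the opening from a moving minimum followed by a moving maximum over a flat window with edge padding (an assumption). tophat is described as returning the opening itself, and subtracting that from the data gives the classical top-hat transform.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def grey_opening_2d(data, half_window):
    """Flat-window morphological opening: moving minimum followed by moving maximum."""
    hr, hc = half_window
    shape = (2 * hr + 1, 2 * hc + 1)
    pad = ((hr, hr), (hc, hc))
    eroded = sliding_window_view(np.pad(data, pad, mode='edge'), shape).min(axis=(-2, -1))
    return sliding_window_view(np.pad(eroded, pad, mode='edge'), shape).max(axis=(-2, -1))


data = np.ones((20, 20))
data[10, 10] = 6.0                         # narrow peak on a flat background
baseline = grey_opening_2d(data, (3, 3))   # what tophat is described as returning
top_hat = data - baseline                  # the classical top-hat transform
```

The opening never exceeds the data, so the top-hat residual is non-negative and contains only the features narrower than the window.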

References

Perez-Pueyo, R., et al. Morphology-Based Automated Baseline Removal for Raman Spectra of Artistic Pigments. Applied Spectroscopy, 2010, 64, 595-600.