pybaselines.optimizers

Module Contents

Functions

adaptive_minmax

Fits polynomials of different orders and uses the maximum values as the baseline.

collab_pls

Collaborative Penalized Least Squares (collab-PLS).

custom_bc

Customized baseline correction for fine-tuned stiffness of the baseline at specific regions.

optimize_extended_range

Extends data and finds the best parameter value for the given baseline method.

pybaselines.optimizers.adaptive_minmax(data, x_data=None, poly_order=None, method='modpoly', weights=None, constrained_fraction=0.01, constrained_weight=100000.0, estimation_poly_order=2, method_kwargs=None)[source]

Fits polynomials of different orders and uses the maximum values as the baseline.

Each polynomial order fit is done both unconstrained and constrained at the endpoints.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

x_data : array-like, shape (N,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.

poly_order : int or Sequence(int, int) or None, optional

The two polynomial orders to use for fitting. If a single integer is given, then will use the input value and one plus the input value. Default is None, which will do a preliminary fit using a polynomial of order estimation_poly_order and then select the appropriate polynomial orders according to [3].

method : {'modpoly', 'imodpoly'}, optional

The method to use for fitting each polynomial. Default is 'modpoly'.

weights : array-like, shape (N,), optional

The weighting array. If None (default), then will be an array with size equal to N and all values set to 1.

constrained_fraction : float or Sequence(float, float), optional

The fraction of points at the left and right edges to use for the constrained fit. Default is 0.01. If constrained_fraction is a sequence, the first item is the fraction for the left edge and the second is the fraction for the right edge.

constrained_weight : float or Sequence(float, float), optional

The weighting to give to the endpoints. Higher values ensure that the end points are fit, but can cause large fluctuations in the other sections of the polynomial. Default is 1e5. If constrained_weight is a sequence, the first item is the weight for the left edge and the second is the weight for the right edge.

estimation_poly_order : int, optional

The polynomial order used for estimating the baseline-to-signal ratio to select the appropriate polynomial orders if poly_order is None. Default is 2.

method_kwargs : dict, optional

Additional keyword arguments to pass to modpoly() or imodpoly(). These include tol, max_iter, use_original, mask_initial_peaks, and num_std.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'constrained_weights': numpy.ndarray, shape (N,)

    The weight array used for the endpoint-constrained fits.

  • 'poly_order': numpy.ndarray, shape (2,)

    An array of the two polynomial orders used for the fitting.

References

[3]

Cao, A., et al. A robust method for automated background subtraction of tissue fluorescence. Journal of Raman Spectroscopy, 2007, 38, 1199-1205.
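
Examples

A minimal usage sketch, added here for illustration rather than taken from the original documentation. It uses the functional signature shown above; the synthetic peak, quadratic background, and noise level are arbitrary assumptions.

    import numpy as np
    from pybaselines.optimizers import adaptive_minmax

    # synthetic data: one Gaussian peak on a quadratic background
    x = np.linspace(0, 1000, 1000)
    peak = 50 * np.exp(-((x - 600) ** 2) / 2000)
    background = 1e-4 * (x - 300) ** 2 + 20
    y = peak + background + np.random.default_rng(0).normal(0, 0.5, x.size)

    # poly_order=None triggers the preliminary estimation fit described above
    baseline, params = adaptive_minmax(y, x_data=x, method='modpoly')
    corrected = y - baseline
    print(params['poly_order'])  # the two polynomial orders that were used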

pybaselines.optimizers.collab_pls(data, average_dataset=True, method='asls', method_kwargs=None, x_data=None)[source]

Collaborative Penalized Least Squares (collab-PLS).

Averages the data or the fit weights for an entire dataset to obtain better results. Uses any Whittaker-smoothing-based or weighted spline algorithm.

Parameters:
data : array-like, shape (M, N)

An array with shape (M, N) where M is the number of entries in the dataset and N is the number of data points in each entry.

average_dataset : bool, optional

If True (default) will average the dataset before fitting to get the weighting. If False, will fit each individual entry in the dataset and then average the weights to get the weighting for the dataset.

method : str, optional

A string indicating the Whittaker-smoothing-based or weighted spline method to use for fitting the baseline. Default is 'asls'.

method_kwargs : dict, optional

A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.

x_data : array-like, shape (N,), optional

The x values for the data. Not used by most Whittaker-smoothing algorithms.

Returns:
baselines : numpy.ndarray, shape (M, N)

An array of all of the baselines.

params : dict

A dictionary with the following items:

  • 'average_weights': numpy.ndarray, shape (N,)

    The weight array used to fit all of the baselines.

  • 'average_alpha': numpy.ndarray, shape (N,)

    Only returned if method is 'aspls' or 'pspline_aspls'. The alpha array used to fit all of the baselines for the aspls() or pspline_aspls() methods.

Additional items depend on the output of the selected method. Every other key will have a list of values, with each item corresponding to a fit.

Notes

If method is 'aspls' or 'pspline_aspls', collab_pls will also calculate the alpha array for the entire dataset in the same manner as the weights.

References

Chen, L., et al. Collaborative Penalized Least Squares for Background Correction of Multiple Raman Spectra. Journal of Analytical Methods in Chemistry, 2018, 2018.
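
Examples

A minimal sketch, added for illustration and not part of the original documentation: a small synthetic dataset of related signals is fit collaboratively, with method_kwargs forwarded to the underlying asls() call. The lam and p values are arbitrary assumptions.

    import numpy as np
    from pybaselines.optimizers import collab_pls

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1000, 500)
    # five related measurements sharing the same linear baseline shape
    dataset = np.empty((5, x.size))
    for i in range(5):
        peak = (40 + 5 * i) * np.exp(-((x - 450) ** 2) / 1000)
        dataset[i] = peak + 0.05 * x + 10 + rng.normal(0, 0.3, x.size)

    baselines, params = collab_pls(
        dataset, average_dataset=True, method='asls',
        method_kwargs={'lam': 1e6, 'p': 0.01}
    )
    corrected = dataset - baselines  # baselines has shape (M, N)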

pybaselines.optimizers.custom_bc(data, x_data=None, method='asls', regions=((None, None),), sampling=1, lam=None, diff_order=2, method_kwargs=None)[source]

Customized baseline correction for fine-tuned stiffness of the baseline at specific regions.

Divides the data into regions with a variable number of data points and then uses other baseline algorithms to fit the truncated data. Using fewer points in a region effectively makes the fitted baseline stiffer there.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

method : str, optional

A string indicating the algorithm to use for fitting the baseline; can be any non-optimizer algorithm in pybaselines. Default is 'asls'.

regions : array-like, shape (M, 2), optional

The two-dimensional array containing the start and stop indices for each region of interest. Each region is defined as data[start:stop]. Default is ((None, None),), which will use all points.

sampling : int or array-like, optional

The sampling step size for each region defined in regions. If sampling is an integer, then all regions will use the same index step size; if sampling is an array-like, its length must be equal to M, the first dimension in regions. Default is 1, which will use all points.

lam : float or None, optional

The value for smoothing the calculated interpolated baseline using Whittaker smoothing, in order to reduce the kinks between regions. Default is None, which will not smooth the baseline; a value of 0 will also not perform smoothing.

diff_order : int, optional

The difference order used for Whittaker smoothing of the calculated baseline. Default is 2.

method_kwargs : dict, optional

A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'x_fit': numpy.ndarray, shape (P,)

    The truncated x-values used for fitting the baseline.

  • 'y_fit': numpy.ndarray, shape (P,)

    The truncated y-values used for fitting the baseline.

Additional items depend on the output of the selected method.

Raises:
ValueError

Raised if regions is not two-dimensional, if sampling is not the same length as regions.shape[0], if any value in sampling or regions is less than 1, if segments in regions overlap, or if any value in regions is greater than the length of the input data.

Notes

Uses Whittaker smoothing to smooth the transitions between regions rather than LOESS as used in [4].

Uses binning rather than direct truncation of the regions in order to get better results for noisy data.

References

[4]

Liland, K., et al. Customized baseline correction. Chemometrics and Intelligent Laboratory Systems, 2011, 109(1), 51-56.
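
Examples

A minimal sketch, added for illustration and not from the original documentation, showing how regions and sampling interact: all points are kept in the peak region, while the flat region is sampled at every 20th point to stiffen the baseline there. The index split at 400 and the lam values are arbitrary assumptions.

    import numpy as np
    from pybaselines.optimizers import custom_bc

    rng = np.random.default_rng(2)
    x = np.linspace(0, 1000, 1000)
    y = (60 * np.exp(-((x - 300) ** 2) / 500)  # peak near index 300
         + 0.02 * x + 5 + rng.normal(0, 0.2, x.size))

    baseline, params = custom_bc(
        y, x, method='asls',
        regions=((0, 400), (400, None)),  # data[0:400] and data[400:]
        sampling=(1, 20),                 # per-region index step sizes
        lam=1e4,              # Whittaker smoothing of the joined baseline
        method_kwargs={'lam': 1e5}
    )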

pybaselines.optimizers.optimize_extended_range(data, x_data=None, method='asls', side='both', width_scale=0.1, height_scale=1.0, sigma_scale=1.0 / 12.0, min_value=2, max_value=8, step=1, pad_kwargs=None, method_kwargs=None)[source]

Extends data and finds the best parameter value for the given baseline method.

Adds additional data to the left and/or right of the input data, and then iterates through parameter values to find the best fit. Useful for calculating the optimum lam or poly_order value required to optimize other algorithms.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

x_data : array-like, shape (N,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.

method : str, optional

A string indicating the Whittaker-smoothing-based, polynomial, or spline method to use for fitting the baseline. Default is 'asls'.

side : {'both', 'left', 'right'}, optional

The side of the measured data to extend. Default is 'both'.

width_scale : float, optional

The number of data points added to each side is width_scale * N. Default is 0.1.

height_scale : float, optional

The height of the added Gaussian peak(s) is calculated as height_scale * max(data). Default is 1.

sigma_scale : float, optional

The sigma value for the added Gaussian peak(s) is calculated as sigma_scale * width_scale * N. Default is 1/12, which will make the Gaussian span ±6 sigma, making its total width about half of the added length.

min_value : int or float, optional

The minimum value for the lam or poly_order value to use with the indicated method. If using a polynomial method, min_value must be an integer. If using a Whittaker-smoothing-based method, min_value should be the exponent to raise to the power of 10 (e.g., a min_value of 2 designates a lam value of 10**2). Default is 2.

max_value : int or float, optional

The maximum value for the lam or poly_order value to use with the indicated method. If using a polynomial method, max_value must be an integer. If using a Whittaker-smoothing-based method, max_value should be the exponent to raise to the power of 10 (e.g., a max_value of 3 designates a lam value of 10**3). Default is 8.

step : int or float, optional

The step size for iterating the parameter value from min_value to max_value. If using a polynomial method, step must be an integer. Default is 1.

pad_kwargs : dict, optional

A dictionary of options to pass to pad_edges() for padding the edges of the data when adding the extended left and/or right sections. Default is None, which will use an empty dictionary.

method_kwargs : dict, optional

A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.

Returns:
baseline : numpy.ndarray, shape (N,)

The baseline calculated with the optimum parameter.

method_params : dict

A dictionary with the following items:

  • 'optimal_parameter': int or float

The lam or poly_order value that produced the lowest root-mean-square error.

  • 'min_rmse': float

The minimum root-mean-square error obtained when using the optimal parameter.

Additional items depend on the output of the selected method.

Raises:
ValueError

Raised if side is not 'left', 'right', or 'both'.

TypeError

Raised if using a polynomial method and min_value, max_value, or step is not an integer.

ValueError

Raised if using a Whittaker-smoothing-based method and min_value, max_value, or step is greater than 100.

Notes

Based on the extended range penalized least squares (erPLS) method from [1]. The method proposed in [1] optimized lambda only for the aspls method and extended only the right side of the spectrum. It was modified to allow extending either side, following [2], and to optimize either lambda or the polynomial degree for all of the affected algorithms in pybaselines.

References

[1]

Zhang, F., et al. An Automatic Baseline Correction Method Based on the Penalized Least Squares Method. Sensors, 2020, 20(7), 2015.

[2]

Krishna, H., et al. Range-independent background subtraction algorithm for recovery of Raman spectra of biological tissue. Journal of Raman Spectroscopy. 2012, 43(12), 1884-1894.
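
Examples

A minimal sketch, added for illustration and not from the original documentation. For the Whittaker-based 'asls' method, min_value, max_value, and step act on the exponent, so this call tries lam values of 10**3 through 10**8. The synthetic data and chosen range are arbitrary assumptions.

    import numpy as np
    from pybaselines.optimizers import optimize_extended_range

    rng = np.random.default_rng(3)
    x = np.linspace(0, 1000, 1000)
    y = (45 * np.exp(-((x - 500) ** 2) / 800)  # Gaussian peak on a slope
         + 0.03 * x + 8 + rng.normal(0, 0.3, x.size))

    baseline, params = optimize_extended_range(
        y, x, method='asls', side='both',
        min_value=3, max_value=8, step=1
    )
    print(params['optimal_parameter'])  # the lam value with the lowest RMSE
    print(params['min_rmse'])           # the RMSE at that lam value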