pybaselines.Baseline.optimize_extended_range

Baseline.optimize_extended_range(data, method='asls', side='both', width_scale=0.1, height_scale=1.0, sigma_scale=0.08333333333333333, min_value=2, max_value=8, step=1, pad_kwargs=None, method_kwargs=None)

Extends data and finds the best parameter value for the given baseline method.

Adds data containing Gaussian peak(s) to the left and/or right of the input data, then iterates through parameter values and selects the one whose fit has the lowest root-mean-squared error. Useful for finding the optimal lam or poly_order value to use with other algorithms.
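The extend-fit-select loop can be sketched in plain NumPy under several assumptions. This is not the pybaselines implementation. The sketch assumes constant edge padding (pybaselines uses pad_edges(), configurable through pad_kwargs), uses a simple iterative clipped polynomial fit as a stand-in for the selected baseline method, searches poly_order only, and, following the erPLS idea cited in the Notes, assumes the error is measured against the known baseline in the added sections. The names clipped_poly_baseline and extended_range_sketch are hypothetical.

```python
import numpy as np


def clipped_poly_baseline(y, poly_order, max_iter=100, tol=1e-3):
    """Iterative polynomial fit that clips data above the fit (modpoly-like stand-in)."""
    x = np.linspace(-1, 1, y.size)
    y_work = np.array(y, dtype=float)
    fit = np.zeros_like(y_work)
    for _ in range(max_iter):
        coef = np.polynomial.polynomial.polyfit(x, y_work, poly_order)
        new_fit = np.polynomial.polynomial.polyval(x, coef)
        # Points above the current fit are treated as peaks and clipped down to it
        y_work = np.minimum(y_work, new_fit)
        converged = np.linalg.norm(new_fit - fit) <= tol * np.linalg.norm(new_fit)
        fit = new_fit
        if converged:
            break
    return fit


def extended_range_sketch(data, side='both', width_scale=0.1, height_scale=1.0,
                          sigma_scale=1 / 12, orders=range(1, 6)):
    """Pick the poly_order whose baseline best recovers the known baseline in added sections."""
    if side not in ('left', 'right', 'both'):
        raise ValueError("side must be 'left', 'right', or 'both'")
    data = np.asarray(data, dtype=float)
    orders = list(orders)
    n = data.size
    n_add = max(int(width_scale * n), 1)

    # Assumption: added sections repeat the edge values, so their true baseline is known
    n_left = n_add if side in ('left', 'both') else 0
    n_right = n_add if side in ('right', 'both') else 0
    padded = np.concatenate([np.full(n_left, data[0]), data, np.full(n_right, data[-1])])

    # One Gaussian peak centred in each added section, sized with the documented formulas
    height = height_scale * data.max()
    sigma = sigma_scale * width_scale * n
    idx = np.arange(n_add)
    peak = height * np.exp(-0.5 * ((idx - (n_add - 1) / 2) / sigma) ** 2)

    extended = padded.copy()
    added = np.zeros(padded.size, dtype=bool)  # marks the added sections
    if n_left:
        extended[:n_left] += peak
        added[:n_left] = True
    if n_right:
        extended[-n_right:] += peak
        added[-n_right:] = True

    fits, rmse = [], []
    for order in orders:
        fit = clipped_poly_baseline(extended, order)
        fits.append(fit)
        # Error is measured only where the true baseline is known: the added sections
        rmse.append(np.sqrt(np.mean((fit[added] - padded[added]) ** 2)))
    best = int(np.argmin(rmse))
    # Simplification: trim the best extended fit back to the original N points
    baseline = fits[best][n_left:n_left + n]
    return baseline, {'optimal_parameter': orders[best], 'rmse': np.array(rmse)}
```

The peak in the added section hides the known baseline the same way real peaks hide it in the measured data, so the parameter that best recovers the added section is taken as a proxy for the best parameter overall.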

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

method : str, optional

A string indicating the Whittaker-smoothing-based, polynomial, or spline method to use for fitting the baseline. Default is 'asls'.

side : {'both', 'left', 'right'}, optional

The side of the measured data to extend. Default is 'both'.

width_scale : float, optional

The number of data points added to each side is width_scale * N. Default is 0.1.

height_scale : float, optional

The height of the added Gaussian peak(s) is calculated as height_scale * max(data). Default is 1.

sigma_scale : float, optional

The sigma value for the added Gaussian peak(s) is calculated as sigma_scale * width_scale * N. Default is 1/12, so that ±6 sigma spans the full added length and the bulk of the peak (±3 sigma) covers about half of it.

min_value : int or float, optional

The minimum lam or poly_order value to use with the selected method. If using a polynomial method, min_value must be an integer. If using a Whittaker-smoothing-based method, min_value is the base-10 exponent of lam (e.g., a min_value of 2 designates a lam value of 10**2). Default is 2.

max_value : int or float, optional

The maximum lam or poly_order value to use with the selected method. If using a polynomial method, max_value must be an integer. If using a Whittaker-smoothing-based method, max_value is the base-10 exponent of lam (e.g., a max_value of 3 designates a lam value of 10**3). Default is 8.

step : int or float, optional

The step size for iterating the parameter value from min_value to max_value. If using a polynomial method, step must be an integer. If using a Whittaker-smoothing-based method, step is the increment of the base-10 exponent of lam (e.g., a step of 1 multiplies lam by 10 between successive fits). Default is 1.

pad_kwargs : dict, optional

A dictionary of options to pass to pad_edges() for padding the edges of the data when adding the extended left and/or right sections. Default is None, which will use an empty dictionary.

method_kwargs : dict, optional

A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.
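The sizing formulas for the added section and the parameter search described above can be sketched as two small helpers. Both are hypothetical, not part of pybaselines; the sketch assumes the number of added points is truncated to an integer, that max_value is included in the search, and that lam values are spaced evenly in the exponent.

```python
def added_region_params(n, data_max, width_scale=0.1, height_scale=1.0, sigma_scale=1 / 12):
    """Return (points added per extended side, Gaussian height, Gaussian sigma)."""
    n_added = int(width_scale * n)         # truncation to an integer is an assumption
    height = height_scale * data_max       # height_scale * max(data)
    sigma = sigma_scale * width_scale * n  # sigma_scale * width_scale * N
    return n_added, height, sigma


def parameter_grid(min_value, max_value, step, polynomial):
    """Values tried from min_value to max_value (max_value assumed inclusive)."""
    if polynomial:
        # poly_order values are used directly and must be integers
        return list(range(min_value, max_value + 1, step))
    # Whittaker-smoothing-based methods: the values are base-10 exponents of lam
    n_steps = int((max_value - min_value) / step + 1e-9)
    return [10.0 ** (min_value + i * step) for i in range(n_steps + 1)]


n_added, height, sigma = added_region_params(n=1000, data_max=3.5)
# With sigma_scale = 1/12, 12 * sigma (that is, ±6 sigma) equals the added length
print(n_added, height, sigma, 12 * sigma)
print(parameter_grid(2, 8, 1, polynomial=False))  # the default lam search
print(parameter_grid(1, 5, 2, polynomial=True))
```

Working in exponents keeps the lam search evenly spaced on a log scale, which is why min_value, max_value, and step are exponents rather than raw lam values for the Whittaker-smoothing-based methods.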

Returns:
baseline : numpy.ndarray, shape (N,)

The baseline calculated with the optimum parameter.

method_params : dict

A dictionary with the following items:

  • 'optimal_parameter': int or float

    The lam or poly_order value that produced the lowest root-mean-squared error.

  • 'min_rmse': float

    Deprecated since version 1.2.0: The 'min_rmse' key will be removed from the method_params dictionary in pybaselines version 1.4.0 in favor of the new 'rmse' key, which contains all root-mean-squared error values.

  • 'rmse': numpy.ndarray

    An array of the root-mean-squared error calculated for each fit.

  • 'method_params': dict

    A dictionary containing the output parameters for the optimal fit. Items will depend on the selected method.
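Because 'min_rmse' is deprecated, code that read it can take the minimum of 'rmse' instead. The sketch below reads the documented keys of method_params with a hypothetical helper, best_fit_summary; it assumes the 'rmse' entries are in the same order as the tested parameter values.

```python
def best_fit_summary(method_params):
    """Summarize the documented keys of the returned method_params dictionary."""
    rmse = [float(value) for value in method_params['rmse']]
    # Assumption: rmse[i] belongs to the i-th tested parameter value
    best_index = min(range(len(rmse)), key=rmse.__getitem__)
    return {
        'optimal_parameter': method_params['optimal_parameter'],
        'best_index': best_index,                      # position of the best fit
        'min_rmse': rmse[best_index],                  # replaces the deprecated 'min_rmse' key
        'fit_params': method_params['method_params'],  # method-specific outputs of the best fit
    }
```

For the second value returned by Baseline.optimize_extended_range, best_fit_summary(...)['min_rmse'] gives the same value that the deprecated 'min_rmse' key holds.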

Raises:
ValueError

Raised if side is not 'left', 'right', or 'both'.

TypeError

Raised if using a polynomial method and min_value, max_value, or step is not an integer.

ValueError

Raised if using a Whittaker-smoothing-based method and min_value, max_value, or step is greater than 100.
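The documented error conditions can be restated as a small validation sketch. validate_inputs and its polynomial flag are hypothetical; the library's actual checks and messages may differ.

```python
def validate_inputs(side, min_value, max_value, step, polynomial):
    """Mirror the documented error conditions (illustrative, not the library's code)."""
    if side not in ('left', 'right', 'both'):
        raise ValueError("side must be 'left', 'right', or 'both'")
    values = (min_value, max_value, step)
    if polynomial:
        # poly_order values are used directly, so they must be integers
        if not all(isinstance(value, int) for value in values):
            raise TypeError('min_value, max_value, and step must be integers for polynomial methods')
    elif any(value > 100 for value in values):
        # Whittaker values are base-10 exponents, so 100 already means lam = 1e100
        raise ValueError('min_value, max_value, and step are exponents and must not exceed 100')
```

The > 100 check catches the common mistake of passing a raw lam value such as 1e6 instead of its exponent 6.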

Notes

Based on the extended range penalized least squares (erPLS) method from [1]. The method proposed in [1] optimizes lambda only for the aspls method and extends only the right side of the spectrum. It is modified here to allow extending either side, following [2], and to optimize either lambda or the polynomial degree for all applicable algorithms in pybaselines.

References

[1]

Zhang, F., et al. An Automatic Baseline Correction Method Based on the Penalized Least Squares Method. Sensors, 2020, 20(7), 2015.

[2]

Krishna, H., et al. Range-independent background subtraction algorithm for recovery of Raman spectra of biological tissue. Journal of Raman Spectroscopy, 2012, 43(12), 1884-1894.