pybaselines.optimizers
Module Contents
Functions
adaptive_minmax: Fits polynomials of different orders and uses the maximum values as the baseline.
collab_pls: Collaborative Penalized Least Squares (collabPLS).
custom_bc: Customized baseline correction for fine-tuned stiffness of the baseline at specific regions.
optimize_extended_range: Extends data and finds the best parameter value for the given baseline method.
 pybaselines.optimizers.adaptive_minmax(data, x_data=None, poly_order=None, method='modpoly', weights=None, constrained_fraction=0.01, constrained_weight=100000.0, estimation_poly_order=2, method_kwargs=None)[source]
Fits polynomials of different orders and uses the maximum values as the baseline.
Each polynomial order fit is done both unconstrained and constrained at the endpoints.
 Parameters:
 data : array-like, shape (N,)
The y-values of the measured data, with N data points.
 x_data : array-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
 poly_order : int or Sequence(int, int) or None, optional
The two polynomial orders to use for fitting. If a single integer is given, then will use the input value and one plus the input value. Default is None, which will do a preliminary fit using a polynomial of order estimation_poly_order and then select the appropriate polynomial orders according to [3].
 method : {'modpoly', 'imodpoly'}, optional
The method to use for fitting each polynomial. Default is 'modpoly'.
 weights : array-like, shape (N,), optional
The weighting array. If None (default), then will be an array with size equal to N and all values set to 1.
 constrained_fraction : float or Sequence(float, float), optional
The fraction of points at the left and right edges to use for the constrained fit. Default is 0.01. If constrained_fraction is a sequence, the first item is the fraction for the left edge and the second is the fraction for the right edge.
 constrained_weight : float or Sequence(float, float), optional
The weighting to give to the endpoints. Higher values ensure that the end points are fit, but can cause large fluctuations in the other sections of the polynomial. Default is 1e5. If constrained_weight is a sequence, the first item is the weight for the left edge and the second is the weight for the right edge.
 estimation_poly_order : int, optional
The polynomial order used for estimating the baseline-to-signal ratio to select the appropriate polynomial orders if poly_order is None. Default is 2.
 method_kwargs : dict, optional
Additional keyword arguments to pass to modpoly() or imodpoly(). These include tol, max_iter, use_original, mask_initial_peaks, and num_std.
 Returns:
 baseline : numpy.ndarray, shape (N,)
The calculated baseline.
 params : dict
A dictionary with the following items:
 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
 'constrained_weights': numpy.ndarray, shape (N,)
The weight array used for the endpoint-constrained fits.
 'poly_order': numpy.ndarray, shape (2,)
An array of the two polynomial orders used for the fitting.
References
[3] Cao, A., et al. A robust method for automated background subtraction of tissue fluorescence. Journal of Raman Spectroscopy, 2007, 38, 1199-1205.
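As a rough illustration of the max-of-fits idea underlying adaptive_minmax, the numpy-only sketch below fits polynomials of two orders and takes their pointwise maximum. This is not the pybaselines implementation, which additionally runs endpoint-constrained fits and can select the polynomial orders automatically.

```python
import numpy as np

def max_of_poly_fits(x, y, orders=(2, 3)):
    # Illustrative only: fit polynomials of two orders and take the
    # pointwise maximum of the fits as a rough baseline estimate.
    fits = [np.polynomial.polynomial.polyval(
                x, np.polynomial.polynomial.polyfit(x, y, order))
            for order in orders]
    return np.maximum(fits[0], fits[1])

x = np.linspace(-1, 1, 200)
signal = np.exp(-((x - 0.2) / 0.05) ** 2)       # a single sharp peak
true_baseline = 0.5 + 0.3 * x + 0.2 * x ** 2    # curved background
baseline = max_of_poly_fits(x, true_baseline + signal)
```

Taking the maximum biases the estimate upward, which counteracts the tendency of a single unconstrained polynomial fit to dip below the true background between peaks.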
 pybaselines.optimizers.collab_pls(data, average_dataset=True, method='asls', method_kwargs=None, x_data=None)[source]
Collaborative Penalized Least Squares (collabPLS).
Averages the data or the fit weights for an entire dataset to get more optimal results. Uses any Whittaker-smoothing-based or weighted spline algorithm.
 Parameters:
 data : array-like, shape (M, N)
An array with shape (M, N) where M is the number of entries in the dataset and N is the number of data points in each entry.
 average_dataset : bool, optional
If True (default) will average the dataset before fitting to get the weighting. If False, will fit each individual entry in the dataset and then average the weights to get the weighting for the dataset.
 method : str, optional
A string indicating the Whittaker-smoothing-based or weighted spline method to use for fitting the baseline. Default is 'asls'.
 method_kwargs : dict, optional
A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.
 x_data : array-like, shape (N,), optional
The x-values for the data. Not used by most Whittaker-smoothing algorithms.
 Returns:
 baselines : np.ndarray, shape (M, N)
An array of all of the baselines.
 params : dict
A dictionary with the following items:
 'average_weights': numpy.ndarray, shape (N,)
The weight array used to fit all of the baselines.
 'average_alpha': numpy.ndarray, shape (N,)
Only returned if method is 'aspls' or 'pspline_aspls'. The alpha array used to fit all of the baselines for the aspls() or pspline_aspls() methods.
Additional items depend on the output of the selected method. Every other key will have a list of values, with each item corresponding to a fit.
Notes
If method is 'aspls' or 'pspline_aspls', collab_pls will also calculate the alpha array for the entire dataset in the same manner as the weights.
References
Chen, L., et al. Collaborative Penalized Least Squares for Background Correction of Multiple Raman Spectra. Journal of Analytical Methods in Chemistry, 2018, 2018.
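A minimal sketch of the collaborative idea, assuming a toy asymmetric-least-squares fitter as a stand-in for pybaselines' asls: the weights are derived once from the averaged dataset (the average_dataset=True path) and then reused, fixed, to fit every individual entry.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def asls(y, lam=1e5, p=0.01, max_iter=10):
    # Minimal asymmetric least squares sketch, NOT pybaselines' asls:
    # penalized least squares with asymmetric weights p / 1-p.
    n = len(y)
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    DtD = lam * (D.T @ D)
    w = np.ones(n)
    z = y
    for _ in range(max_iter):
        z = spsolve(sparse.csc_matrix(sparse.diags(w) + DtD), w * y)
        w = np.where(y > z, p, 1 - p)  # downweight points above the fit
    return z, w

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 300)
# Five entries sharing one baseline, with varying peak heights and noise
dataset = np.array([0.1 * x + h * np.exp(-((x - 5) / 0.2) ** 2)
                    + rng.normal(0, 0.01, x.size) for h in (1, 2, 3, 4, 5)])

# Collaborative step: derive the weights once from the averaged dataset,
# then reuse those fixed weights when fitting every individual entry.
_, shared_w = asls(dataset.mean(axis=0))
n = dataset.shape[1]
D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
A = sparse.csc_matrix(sparse.diags(shared_w) + 1e5 * (D.T @ D))
baselines = np.array([spsolve(A, shared_w * y) for y in dataset])
```

Sharing one weight array across the dataset keeps entries with weak peaks from being overfit, since the peak locations are established from the whole collection.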
 pybaselines.optimizers.custom_bc(data, x_data=None, method='asls', regions=((None, None),), sampling=1, lam=None, diff_order=2, method_kwargs=None)[source]
Customized baseline correction for fine-tuned stiffness of the baseline at specific regions.
Divides the data into regions with a variable number of data points and then uses other baseline algorithms to fit the truncated data. Regions with fewer points effectively make the fitted baseline stiffer in those regions.
 Parameters:
 data : array-like, shape (N,)
The y-values of the measured data, with N data points.
 x_data : array-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
 method : str, optional
A string indicating the algorithm to use for fitting the baseline; can be any nonoptimizer algorithm in pybaselines. Default is 'asls'.
 regions : array-like, shape (M, 2), optional
The two-dimensional array containing the start and stop indices for each region of interest. Each region is defined as data[start:stop]. Default is ((None, None),), which will use all points.
 sampling : int or array-like, optional
The sampling step size for each region defined in regions. If sampling is an integer, then all regions will use the same index step size; if sampling is an array-like, its length must be equal to M, the first dimension of regions. Default is 1, which will use all points.
 lam : float or None, optional
The value for smoothing the calculated interpolated baseline using Whittaker smoothing, in order to reduce the kinks between regions. Default is None, which will not smooth the baseline; a value of 0 will also not perform smoothing.
 diff_order : int, optional
The difference order used for Whittaker smoothing of the calculated baseline. Default is 2.
 method_kwargs : dict, optional
A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.
 Returns:
 baseline : numpy.ndarray, shape (N,)
The calculated baseline.
 params : dict
 A dictionary with the following items:
 'x_fit': numpy.ndarray, shape (P,)
The truncated x-values used for fitting the baseline.
 'y_fit': numpy.ndarray, shape (P,)
The truncated y-values used for fitting the baseline.
Additional items depend on the output of the selected method.
 Raises:
 ValueError
Raised if regions is not two dimensional, if sampling is not the same length as regions.shape[0], if any value in sampling or regions is less than 1, if segments in regions overlap, or if any value in regions is greater than the length of the input data.
Notes
Uses Whittaker smoothing to smooth the transitions between regions rather than LOESS as used in [4].
Uses binning rather than direct truncation of the regions in order to get better results for noisy data.
References
[4] Liland, K., et al. Customized baseline correction. Chemometrics and Intelligent Laboratory Systems, 2011, 109(1), 51-56.
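A rough numpy-only sketch of the custom_bc workflow described above: truncate the data region by region, fit the truncated data, then interpolate the fitted values back to the full grid. A plain linear fit stands in here for whichever non-optimizer pybaselines method would actually be used, and the index choices are arbitrary for illustration.

```python
import numpy as np

x = np.linspace(0, 100, 1000)
y = 0.01 * x + np.exp(-((x - 30) / 3) ** 2)  # linear baseline plus one peak

# Keep every point inside the region of interest (indices 200:400, which
# covers the peak), but only every 20th point elsewhere, so the fit is
# effectively stiffer outside that region.
idx = np.concatenate([np.arange(0, 200, 20),
                      np.arange(200, 400),
                      np.arange(400, 1000, 20)])

# Fit the truncated data, then interpolate back onto the full x grid.
coeffs = np.polynomial.polynomial.polyfit(x[idx], y[idx], 1)
z_fit = np.polynomial.polynomial.polyval(x[idx], coeffs)
baseline = np.interp(x, x[idx], z_fit)
```

In custom_bc itself the lam option can then Whittaker-smooth the interpolated result to soften kinks at the region boundaries.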
 pybaselines.optimizers.optimize_extended_range(data, x_data=None, method='asls', side='both', width_scale=0.1, height_scale=1.0, sigma_scale=1.0 / 12.0, min_value=2, max_value=8, step=1, pad_kwargs=None, method_kwargs=None)[source]
Extends data and finds the best parameter value for the given baseline method.
Adds additional data to the left and/or right of the input data, and then iterates through parameter values to find the best fit. Useful for calculating the optimum lam or poly_order value required to optimize other algorithms.
 Parameters:
 data : array-like, shape (N,)
The y-values of the measured data, with N data points.
 x_data : array-like, shape (N,), optional
The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.
 method : str, optional
A string indicating the Whittaker-smoothing-based, polynomial, or spline method to use for fitting the baseline. Default is 'asls'.
 side : {'both', 'left', 'right'}, optional
The side of the measured data to extend. Default is 'both'.
 width_scale : float, optional
The number of data points added to each side is width_scale * N. Default is 0.1.
 height_scale : float, optional
The height of the added Gaussian peak(s) is calculated as height_scale * max(data). Default is 1.
 sigma_scale : float, optional
The sigma value for the added Gaussian peak(s) is calculated as sigma_scale * width_scale * N. Default is 1/12, which will make the Gaussian span ±6 sigma, making its total width about half of the added length.
 min_value : int or float, optional
The minimum value for the lam or poly_order value to use with the indicated method. If using a polynomial method, min_value must be an integer. If using a Whittaker-smoothing-based method, min_value should be the exponent to raise to the power of 10 (e.g., a min_value of 2 designates a lam value of 10**2). Default is 2.
 max_value : int or float, optional
The maximum value for the lam or poly_order value to use with the indicated method. If using a polynomial method, max_value must be an integer. If using a Whittaker-smoothing-based method, max_value should be the exponent to raise to the power of 10 (e.g., a max_value of 3 designates a lam value of 10**3). Default is 8.
 step : int or float, optional
The step size for iterating the parameter value from min_value to max_value. If using a polynomial method, step must be an integer. Default is 1.
 pad_kwargs : dict, optional
A dictionary of options to pass to pad_edges() for padding the edges of the data when adding the extended left and/or right sections. Default is None, which will use an empty dictionary.
 method_kwargs : dict, optional
A dictionary of keyword arguments to pass to the selected method function. Default is None, which will use an empty dictionary.
 Returns:
 baseline : numpy.ndarray, shape (N,)
The baseline calculated with the optimum parameter.
 method_params : dict
A dictionary with the following items:
 'optimal_parameter': int or float
The lam or poly_order value that produced the lowest root-mean-square error.
 'min_rmse': float
The minimum root-mean-square error obtained when using the optimal parameter.
Additional items depend on the output of the selected method.
 Raises:
 ValueError
Raised if side is not 'left', 'right', or 'both'.
 TypeError
Raised if using a polynomial method and min_value, max_value, or step is not an integer.
 ValueError
Raised if using a Whittakersmoothingbased method and min_value, max_value, or step is greater than 100.
Notes
Based on the extended range penalized least squares (erPLS) method from [1]. The method proposed by [1] was for optimizing lambda only for the aspls method by extending only the right side of the spectrum. The method was modified by allowing extending either side following [2], and for optimizing lambda or the polynomial degree for all of the affected algorithms in pybaselines.
References
[1] Zhang, F., et al. An Automatic Baseline Correction Method Based on the Penalized Least Squares Method. Sensors, 2020, 20(7), 2015.
[2] Krishna, H., et al. Range-independent background subtraction algorithm for recovery of Raman spectra of biological tissue. Journal of Raman Spectroscopy. 2012, 43(12), 1884-1894.
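The extended-range search loop can be sketched in plain numpy. This is only a rough illustration of the idea, not pybaselines' implementation: the data are extended on one side with a synthetic Gaussian peak on a known baseline, a parameter grid is scanned (polynomial order here; optimize_extended_range can also scan lam), and the value minimizing the RMSE against the known baseline of the added section wins.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 500)
y = 0.2 * x + np.exp(-((x - 5) / 0.3) ** 2) + rng.normal(0, 0.01, x.size)

# Extend the right side with a flat baseline plus a synthetic Gaussian peak;
# the true baseline of the added section is therefore known exactly.
n_add = 50
x_add = np.linspace(10, 11, n_add)
known_baseline = np.full(n_add, y[-50:].mean())
x_full = np.concatenate([x, x_add])
y_full = np.concatenate(
    [y, known_baseline + np.exp(-((x_add - 10.5) / 0.1) ** 2)])

# Grid search: fit with each polynomial order and keep the one whose fit
# has the lowest RMSE against the known baseline of the added section.
best_rmse, best_order = np.inf, None
for order in range(1, 5):
    coeffs = np.polynomial.polynomial.polyfit(x_full, y_full, order)
    z = np.polynomial.polynomial.polyval(x_full, coeffs)
    rmse = np.sqrt(np.mean((z[-n_add:] - known_baseline) ** 2))
    if rmse < best_rmse:
        best_rmse, best_order = rmse, order
```

Because the added section's baseline is constructed rather than measured, the RMSE there is a direct, ground-truth measure of how well a given parameter value recovers the background.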