pybaselines.Baseline.penalized_poly

Baseline.penalized_poly(data, poly_order=2, tol=0.001, max_iter=250, weights=None, cost_function='asymmetric_truncated_quadratic', threshold=None, alpha_factor=0.99, return_coef=False)

Fits a polynomial baseline using a non-quadratic cost function.

The non-quadratic cost functions penalize large residuals less severely than a quadratic cost does, which makes the fit more robust than ordinary least squares.

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

poly_order : int, optional

The polynomial order for fitting the baseline. Default is 2.

tol : float, optional

The exit criterion. Default is 1e-3.

max_iter : int, optional

The maximum number of iterations. Default is 250.

weights : array_like, shape (N,), optional

The weighting array. If None (default), the weights will be an array of size N with all values set to 1.

cost_function : str, optional

The non-quadratic cost function to minimize. The name must indicate the symmetry of the method with a prefix: 'a' or 'asymmetric' for asymmetric loss, and 's' or 'symmetric' for symmetric loss. Default is 'asymmetric_truncated_quadratic'. The available methods, with their associated references, are:

  • 'asymmetric_truncated_quadratic'[1]

  • 'symmetric_truncated_quadratic'[1]

  • 'asymmetric_huber'[1]

  • 'symmetric_huber'[1]

  • 'asymmetric_indec'[2]

  • 'symmetric_indec'[2]

threshold : float, optional

The threshold value for the loss method, where the function goes from quadratic loss (such as used for least squares) to non-quadratic. For symmetric loss methods, residual values with absolute value less than threshold will have quadratic loss. For asymmetric loss methods, residual values less than the threshold will have quadratic loss. Default is None, which sets threshold to one-tenth of the standard deviation of the input data.

alpha_factor : float, optional

A value between 0 and 1 that controls the value of the penalty. Default is 0.99. This value typically does not need to be changed.

return_coef : bool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the input x_data and return them in the params dictionary. Default is False, since the conversion takes time.
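The threshold and cost_function parameters described above decide where the loss stops being quadratic. The following is a minimal sketch of that behavior using the textbook forms of the truncated quadratic and Huber losses; it is not the library's internal code, and any scaling applied inside pybaselines may differ. The function names (default_threshold, truncated_quadratic, huber) are hypothetical, and the use of the population standard deviation for the default threshold is an assumption.

```python
import statistics


def default_threshold(data):
    """One-tenth of the standard deviation of the data.

    Assumption: the population standard deviation is used.
    """
    return statistics.pstdev(data) / 10


def truncated_quadratic(residual, threshold, symmetric=False):
    """Quadratic below the threshold, constant at threshold**2 otherwise."""
    # Symmetric losses compare |residual| to the threshold;
    # asymmetric losses compare the signed residual.
    value = abs(residual) if symmetric else residual
    if value < threshold:
        return residual ** 2
    return threshold ** 2


def huber(residual, threshold, symmetric=False):
    """Quadratic below the threshold, linear otherwise."""
    value = abs(residual) if symmetric else residual
    if value < threshold:
        return residual ** 2
    return 2 * threshold * value - threshold ** 2


threshold = 1.0
for residual in (-5.0, 0.5, 5.0):
    print(
        residual,
        truncated_quadratic(residual, threshold),
        truncated_quadratic(residual, threshold, symmetric=True),
        huber(residual, threshold),
        huber(residual, threshold, symmetric=True),
    )
```

With an asymmetric loss, a large negative residual (data well below the fit) keeps its full quadratic cost, while a large positive residual (such as a peak above the baseline) is capped or grows only linearly. With a symmetric loss, large residuals of either sign are treated the same way.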

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'coef': numpy.ndarray, shape (poly_order + 1,)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.Polynomial.
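As a rough illustration of how the items in params can be read, the sketch below checks convergence from 'tol_history' and evaluates a polynomial from coefficients given in increasing order. The helper names (did_converge, evaluate_increasing) and the dictionary values are hypothetical stand-ins, not output from the library. The evaluation computes the same sum coef[0] + coef[1]*x + coef[2]*x**2 + ... that numpy.polynomial.polynomial.Polynomial(coef) represents.

```python
def did_converge(params, tol):
    """Return True when the last recorded tolerance is at or below tol."""
    history = params['tol_history']
    return len(history) > 0 and history[-1] <= tol


def evaluate_increasing(coef, x):
    """Evaluate coef[0] + coef[1]*x + coef[2]*x**2 + ... with Horner's rule."""
    result = 0.0
    # Start from the highest-order coefficient and work down.
    for c in reversed(coef):
        result = result * x + c
    return result


# Hypothetical values standing in for a real params dictionary
params = {'tol_history': [0.5, 0.02, 0.0004], 'coef': [1.0, 2.0, 3.0]}

print(did_converge(params, tol=1e-3))
print(evaluate_increasing(params['coef'], 2.0))
```

If did_converge returns False, raising max_iter or loosening tol are the parameters documented above that affect when iteration stops.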

Raises:
ValueError

Raised if alpha_factor is not between 0 and 1.

Notes

In baseline literature, this procedure is sometimes called "backcor".

References

[1] Mazet, V., et al. Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometrics and Intelligent Laboratory Systems, 2005, 76(2), 121-133.

[2] Liu, J., et al. Goldindec: A Novel Algorithm for Raman Spectrum Baseline Correction. Applied Spectroscopy, 2015, 69(7), 834-842.