pybaselines.Baseline.goldindec

Baseline.goldindec(data, poly_order=2, tol=0.001, max_iter=250, weights=None, cost_function='asymmetric_indec', peak_ratio=0.5, alpha_factor=0.99, tol_2=0.001, tol_3=1e-06, max_iter_2=100, return_coef=False)

Fits a polynomial baseline using a non-quadratic cost function.

The non-quadratic cost functions penalize large residuals less heavily than a quadratic cost does, giving a fit that is more robust to peaks than ordinary least-squares.

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

poly_order : int, optional

The polynomial order for fitting the baseline. Default is 2.

tol : float, optional

The exit criterion for each fit at a given threshold value. Default is 1e-3.

max_iter : int, optional

The maximum number of iterations for fitting a threshold value. Default is 250.

weights : array_like, shape (N,), optional

The weighting array. If None (default), an array of size N with all values set to 1 is used.

cost_function : str, optional

The non-quadratic cost function to minimize. Unlike penalized_poly(), this function only works with asymmetric cost functions, so the symmetry prefix ('a' or 'asymmetric') is optional (e.g. 'indec' and 'a_indec' are the same). Default is 'asymmetric_indec'. Available methods, and their associated references, are:

  • 'asymmetric_indec'[1]

  • 'asymmetric_truncated_quadratic'[2]

  • 'asymmetric_huber'[2]

peak_ratio : float, optional

A value between 0 and 1 that designates the fraction of points in the data that belong to peaks. Values are valid within ~10% of the actual peak ratio. Default is 0.5.

alpha_factor : float, optional

A value between 0 and 1 that controls the value of the penalty. Default is 0.99. Typically should not need to change this value.

tol_2 : float, optional

The exit criterion for the difference between the optimal up-down ratio (the number of points above 0 in the residual compared to the number of points below 0) and the up-down ratio for a given threshold value. Default is 1e-3.

tol_3 : float, optional

The exit criterion for the relative change in the threshold value. Default is 1e-6.

max_iter_2 : int, optional

The maximum number of iterations for iterating between different threshold values. Default is 100.

return_coef : bool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the input x_data and return them in the params dictionary. Default is False, since the conversion takes time.
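The up-down ratio mentioned for tol_2 can be illustrated with a small NumPy sketch. The helper name up_down_ratio and the sample arrays are hypothetical, and this is only one reading of the description above (points above 0 in the residual divided by points below 0), not necessarily how the library computes it internally.

```python
import numpy as np


def up_down_ratio(data, baseline):
    """Hypothetical helper: points above the baseline divided by points below it."""
    residual = np.asarray(data) - np.asarray(baseline)
    num_up = np.count_nonzero(residual > 0)
    num_down = np.count_nonzero(residual < 0)
    # Guard against division by zero when no points lie below the baseline.
    return num_up / max(num_down, 1)


# Illustrative data and a flat trial baseline (both are assumptions).
y = np.array([1.0, 2.0, 5.0, 2.0, 1.0, 0.5])
trial_baseline = np.full_like(y, 1.5)
ratio = up_down_ratio(y, trial_baseline)
```

Here three points lie above the trial baseline and three below it, so the ratio is 1. A baseline that sits too low pushes the ratio up, and one that sits too high pulls it down.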

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray, shape (J, K)

    An array containing the calculated tolerance values for each iteration of both the threshold values and the fit values. Index 0 holds the tolerance values for the difference in up-down ratios, index 1 holds the tolerance values for the relative change in the threshold, and indices >= 2 hold the tolerance values for each fit. All entries that were not used in fitting have values of 0. J is 2 plus the number of iterations for the threshold to converge (related to max_iter_2, tol_2, tol_3), and K is the larger of the number of iterations for the threshold and the maximum number of iterations over all of the fits of the various threshold values (related to max_iter and tol).

  • 'threshold': float

    The optimal threshold value. Could be used in penalized_poly() for fitting other similar data.

  • 'coef': numpy.ndarray, shape (poly_order + 1,)

    Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.Polynomial.

Raises:
ValueError

Raised if alpha_factor or peak_ratio are not between 0 and 1, or if the specified cost function is symmetric.

References

[1]

Liu, J., et al. Goldindec: A Novel Algorithm for Raman Spectrum Baseline Correction. Applied Spectroscopy, 2015, 69(7), 834-842.

[2]

Mazet, V., et al. Background removal from spectra by designing and minimising a non-quadratic cost function. Chemometrics and Intelligent Laboratory Systems, 2005, 76(2), 121-133.