pybaselines.Baseline.goldindec
- Baseline.goldindec(data, poly_order=2, tol=0.001, max_iter=250, weights=None, cost_function='asymmetric_indec', peak_ratio=0.5, alpha_factor=0.99, tol_2=0.001, tol_3=1e-06, max_iter_2=100, return_coef=False)
Fits a polynomial baseline using a non-quadratic cost function.
The non-quadratic cost functions penalize large residuals less severely than a quadratic cost does, giving a fit that is more robust to peaks than ordinary least-squares.
- Parameters:
- data : array_like, shape (N,)
The y-values of the measured data, with N data points.
- poly_order : int, optional
The polynomial order for fitting the baseline. Default is 2.
- tol : float, optional
The exit criteria for the fitting with a given threshold value. Default is 1e-3.
- max_iter : int, optional
The maximum number of iterations for fitting a threshold value. Default is 250.
- weights : array_like, shape (N,), optional
The weighting array. If None (default), an array of size N with all values set to 1 is used.
- cost_function : str, optional
The non-quadratic cost function to minimize. Unlike penalized_poly(), this function only works with asymmetric cost functions, so the symmetry prefix ('a' or 'asymmetric') is optional (e.g. 'indec' and 'a_indec' are the same). Default is 'asymmetric_indec'. Available methods, and their associated reference, are:
- peak_ratio : float, optional
A value between 0 and 1 that designates the fraction of points in the data that belong to peaks. Values are valid within ~10% of the actual peak ratio. Default is 0.5.
- alpha_factor : float, optional
A value between 0 and 1 that controls the value of the penalty. Default is 0.99. Typically should not need to change this value.
- tol_2 : float, optional
The exit criteria for the difference between the optimal up-down ratio (number of points above 0 in the residual compared to number of points below 0) and the up-down ratio for a given threshold value. Default is 1e-3.
- tol_3 : float, optional
The exit criteria for the relative change in the threshold value. Default is 1e-6.
- max_iter_2 : int, optional
The maximum number of iterations for iterating between different threshold values. Default is 100.
- return_coef : bool, optional
If True, will convert the polynomial coefficients for the fit baseline to a form that fits the input x_data and return them in the params dictionary. Default is False, since the conversion takes time.
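The up-down ratio that tol_2 refers to can be illustrated with a minimal sketch. It only counts residual points above and below zero, as described for tol_2; the helper name `up_down_ratio` and the sample values are hypothetical, and this is not necessarily how the library computes the quantity internally.

```python
import numpy as np


def up_down_ratio(y, baseline):
    """Ratio of residual points above zero to residual points below zero.

    Hypothetical helper for illustration; not part of pybaselines.
    """
    residual = np.asarray(y, dtype=float) - np.asarray(baseline, dtype=float)
    n_up = np.count_nonzero(residual > 0)
    n_down = np.count_nonzero(residual < 0)
    if n_down == 0:
        return float('inf')  # no point lies below the baseline
    return n_up / n_down


# Made-up data: three points sit above a flat trial baseline, one below.
y = np.array([1.0, 4.0, 0.5, 6.0])
flat_baseline = np.full_like(y, 0.8)
ratio = up_down_ratio(y, flat_baseline)
```

A trial baseline that sits too low leaves most residuals positive and drives this ratio up, which is the kind of mismatch the threshold iterations are meant to reduce.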
- Returns:
- baseline : numpy.ndarray, shape (N,)
The calculated baseline.
- params : dict
A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray, shape (J, K)
An array containing the calculated tolerance values for each iteration of both threshold values and fit values. Index 0 contains the tolerance values for the difference in up-down ratios, index 1 contains the tolerance values for the relative change in the threshold, and indices >= 2 contain the tolerance values for each fit. All values that were not used in fitting have values of 0. Shape J is 2 plus the number of iterations for the threshold to converge (related to max_iter_2, tol_2, tol_3), and shape K is the maximum of the number of iterations for the threshold and the maximum number of iterations for all of the fits of the various threshold values (related to max_iter and tol).
- 'threshold': float
The optimal threshold value. Could be used in penalized_poly() for fitting other similar data.
- 'coef': numpy.ndarray, shape (poly_order + 1,)
Only if return_coef is True. The array of polynomial parameters for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.Polynomial.
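As a rough sketch of how the items in params might be consumed, the snippet below rebuilds a baseline from 'coef' with numpy.polynomial.polynomial.Polynomial and splits 'tol_history' by the row layout described above. The params dictionary here is a hypothetical stand-in with made-up numbers, not output from an actual fit; with real results, the polynomial should be evaluated on the same x_data that was used for fitting.

```python
import numpy as np
from numpy.polynomial.polynomial import Polynomial

# Hypothetical stand-in for the params dictionary returned with
# return_coef=True; every number below is made up for illustration.
params = {
    'threshold': 0.1,
    'coef': np.array([1.0, 0.5, -0.25]),  # c0 + c1*x + c2*x**2
    'tol_history': np.array([
        [0.20, 0.01, 0.0],   # index 0: up-down ratio differences
        [0.50, 1e-7, 0.0],   # index 1: relative threshold changes
        [0.10, 0.01, 1e-4],  # index 2: tolerances of the first fit
        [0.05, 1e-4, 0.0],   # index 3: tolerances of the second fit
    ]),
}

# Rebuild the baseline from coefficients given in increasing order.
baseline_poly = Polynomial(params['coef'])
x = np.linspace(0.0, 4.0, 5)
rebuilt_baseline = baseline_poly(x)

# Split tol_history by the row layout described in the text.
tol_history = params['tol_history']
ratio_tols = tol_history[0]
threshold_tols = tol_history[1]
fit_tols = tol_history[2:]
n_threshold_iters = tol_history.shape[0] - 2

# Unused entries are 0, so nonzero counts approximate the iterations per fit
# (a tolerance that was exactly 0 would be missed by this count).
fit_iters = np.count_nonzero(fit_tols, axis=1)
```

Counting nonzero entries is only a convenience for inspecting convergence; comparing the last used value in each row against tol, tol_2, or tol_3 gives a more direct check.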
- Raises:
ValueError
Raised if alpha_factor or peak_ratio are not between 0 and 1, or if the specified cost function is symmetric.
References
[1] Liu, J., et al. Goldindec: A Novel Algorithm for Raman Spectrum Baseline Correction. Applied Spectroscopy, 2015, 69(7), 834-842.