pybaselines.Baseline.dietrich

Baseline.dietrich(data, smooth_half_window=None, num_std=3.0, interp_half_window=5, poly_order=5, max_iter=50, tol=0.001, weights=None, return_coef=False, min_length=2, pad_kwargs=None, **kwargs)

Dietrich's method for identifying baseline regions.

Calculates the power spectrum of the data as the squared derivative of the data. Baseline points are then identified iteratively: points whose power exceeds the mean of the power spectrum plus num_std times its standard deviation are marked as peak points and removed, and the threshold is recalculated from the remaining points until no further points are removed.
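The thresholding idea can be sketched in plain NumPy. This is only an illustration, not the library's implementation: the helper name classify_baseline is hypothetical, the threshold form (mean plus num_std standard deviations of the remaining points) is an assumption, and the smoothing, weighting, and min_length steps are left out.

```python
import numpy as np


def classify_baseline(y, num_std=3.0, max_rounds=100):
    """Return a boolean mask that is True where a point is treated as baseline.

    Hypothetical helper for illustration; not part of pybaselines.
    """
    y = np.asarray(y, dtype=float)
    # "Power spectrum" in the sense used here: the squared first derivative.
    power = np.gradient(y) ** 2
    mask = np.ones(y.size, dtype=bool)
    for _ in range(max_rounds):
        remaining = power[mask]
        # Assumed threshold: mean + num_std * std of the points still kept.
        threshold = remaining.mean() + num_std * remaining.std()
        new_mask = mask & (power <= threshold)
        if new_mask.sum() == mask.sum():
            break  # nothing was removed, so the classification is stable
        mask = new_mask
    return mask
```

Note that the apex of a peak has a derivative close to zero, so a sketch like this can label it as baseline; the min_length parameter exists to discard such short false positives.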

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

smooth_half_window : int, optional

The half window to use for smoothing the input data with a moving average. Default is None, which will use N / 256. Set to 0 to not smooth the data.

num_std : float, optional

The number of standard deviations to include when thresholding. Higher values will assign more points as baseline. Default is 3.0.

interp_half_window : int, optional

When interpolating between baseline segments, will use the average of data[i-interp_half_window:i+interp_half_window+1], where i is the index of the peak start or end, to fit the linear segment. Default is 5.

poly_order : int, optional

The polynomial order for fitting the identified baseline. Default is 5.

max_iter : int, optional

The maximum number of iterations for fitting a polynomial to the identified baseline. If max_iter is 0, the returned baseline will be just the linear interpolation of the baseline segments. Default is 50.

tol : float, optional

The exit criteria for fitting a polynomial to the identified baseline points. Default is 1e-3.

weights : array_like, shape (N,), optional

The weighting array, used to override the function's baseline identification to designate peak points. Only elements with 0 or False values will have an effect; all non-zero values are considered baseline points. If None (default), an array of ones with size N is used.

return_coef : bool, optional

If True, will convert the polynomial coefficients for the fit baseline to a form that fits the input x_data and return them in the params dictionary. Default is False, since the conversion takes time.

min_length : int, optional

Any region of consecutive baseline points shorter than min_length is considered to be a false positive, and all points in the region are converted to peak points. A higher min_length ensures that fewer points are falsely assigned as baseline points. Default is 2, which only removes lone baseline points.

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from smoothing. Default is None.

**kwargs

Deprecated since version 1.2.0: Passing additional keyword arguments is deprecated and will be removed in version 1.4.0. Pass keyword arguments using pad_kwargs.
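As an illustration of the min_length parameter, the following NumPy sketch converts runs of baseline points shorter than min_length into peak points. The helper drop_short_baseline_runs is hypothetical and only mirrors the behavior described for min_length; the library's own mask refinement may do more than this.

```python
import numpy as np


def drop_short_baseline_runs(mask, min_length=2):
    """Set runs of True shorter than min_length to False.

    Hypothetical helper for illustration; not part of pybaselines.
    """
    mask = np.asarray(mask, dtype=bool).copy()
    # Pad with False so that every run of True has both a start and an end.
    padded = np.concatenate(([False], mask, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    # Changes alternate between run starts and (exclusive) run ends.
    starts, ends = changes[::2], changes[1::2]
    for start, end in zip(starts, ends):
        if end - start < min_length:
            mask[start:end] = False
    return mask


example = np.array([True, False, True, True, False, True])
refined = drop_short_baseline_runs(example, min_length=2)
```

With min_length=2, the two lone True values in example are removed while the run of two is kept.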

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'mask': numpy.ndarray, shape (N,)

    The boolean array designating baseline points as True and peak points as False.

  • 'coef': numpy.ndarray, shape (poly_order + 1,)

    Only if return_coef is True and max_iter is greater than 0. The array of polynomial coefficients for the baseline, in increasing order. Can be used to create a polynomial using numpy.polynomial.polynomial.Polynomial.

  • 'tol_history': numpy.ndarray

    Only if max_iter is greater than 1. An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
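A minimal sketch of how coefficients in increasing order can be turned into a callable polynomial with numpy.polynomial.polynomial.Polynomial. The coefficient values below are made up for illustration; in practice they would come from params['coef'] when return_coef is True.

```python
import numpy as np
from numpy.polynomial.polynomial import Polynomial

# Made-up coefficients in increasing order: c0 + c1*x + c2*x**2.
coef = np.array([5.0, 0.004, -1e-6])
x = np.linspace(0, 1000, 5)

baseline_poly = Polynomial(coef)
baseline_values = baseline_poly(x)  # evaluates the polynomial at each x
```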

Notes

When choosing parameters, first choose a smooth_half_window that appropriately smooths the data, and then reduce num_std until no peak regions are included in the baseline. If no value of num_std works, change smooth_half_window and repeat.

If max_iter is 0, the baseline is simply a linear interpolation of the identified baseline points. Otherwise, a polynomial is iteratively fit through the baseline points, and the interpolated sections are replaced each iteration with the polynomial fit.
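The interpolation across a single peak region can be pictured with a small NumPy sketch that combines it with the averaging described for interp_half_window. The helper bridge_peak is hypothetical and not part of pybaselines, and clipping the averaging window at the array edges is an assumption.

```python
import numpy as np


def bridge_peak(data, start, end, interp_half_window=5):
    """Linearly interpolate from index start to index end (both inclusive).

    The endpoint values are local averages of the data rather than single
    points, which makes the bridge less sensitive to noise.
    Hypothetical helper for illustration; not part of pybaselines.
    """
    data = np.asarray(data, dtype=float)

    def local_mean(i):
        # Assumption: clip the window so it stays inside the array.
        lo = max(i - interp_half_window, 0)
        hi = min(i + interp_half_window + 1, data.size)
        return data[lo:hi].mean()

    return np.linspace(local_mean(start), local_mean(end), end - start + 1)


ramp = np.arange(50, dtype=float)
segment = bridge_peak(ramp, start=10, end=20, interp_half_window=5)
```

For the linear ramp above, the local averages equal the values at the endpoints, so the bridge simply reproduces the ramp between indices 10 and 20.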

References

Dietrich, W., et al. Fast and Precise Automatic Baseline Correction of One- and Two-Dimensional NMR Spectra. Journal of Magnetic Resonance. 1991, 91, 1-11.