pybaselines.Baseline.golotvin

Baseline.golotvin(data, half_window=None, num_std=2.0, sections=32, smooth_half_window=None, interp_half_window=5, weights=None, min_length=2, pad_kwargs=None, **kwargs)

Golotvin's method for identifying baseline regions.

Divides the data into sections and takes the minimum standard deviation of all sections as the noise standard deviation for the entire data. Then classifies any point where the rolling max minus min is less than num_std * noise standard deviation as belonging to the baseline.
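The classification step described above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: the function name `classify_baseline`, the use of `ddof=1`, and the edge padding are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def classify_baseline(y, half_window, num_std=2.0, sections=32):
    """Return a boolean mask that is True where a point looks like baseline."""
    y = np.asarray(y, dtype=float)
    # Noise estimate: the smallest standard deviation among the sections.
    noise_std = min(np.std(part, ddof=1) for part in np.array_split(y, sections))
    # Rolling max minus rolling min over windows of 2 * half_window + 1 points;
    # edge padding (an assumption here) keeps the output the same length as y.
    padded = np.pad(y, half_window, mode="edge")
    windows = sliding_window_view(padded, 2 * half_window + 1)
    spread = windows.max(axis=1) - windows.min(axis=1)
    # Points whose local spread stays below the threshold are treated as baseline.
    return spread < num_std * noise_std
```

Because the range of pure noise inside a window grows with the window length, a larger half_window usually needs a larger num_std before flat regions are accepted as baseline.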

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

half_window : int, optional

The half-window to use for the rolling maximum and rolling minimum calculations. Should be approximately equal to the full-width-at-half-maximum of the peaks or features in the data. Default is None, which uses half of the value from optimize_window(). That value is not always a good choice, but it scales with the number of data points and gives a starting point for tuning the parameter.

num_std : float, optional

The number of standard deviations to include when thresholding. Higher values will assign more points as baseline. Default is 2.0.

sections : int, optional

The number of sections to divide the input data into for finding the minimum standard deviation. Default is 32.

smooth_half_window : int, optional

The half-window to use for smoothing the interpolated baseline with a moving average. Default is None, which uses half_window. Set to 0 to skip smoothing.

interp_half_window : int, optional

When interpolating between baseline segments, uses the average of data[i-interp_half_window:i+interp_half_window+1], where i is the index of the peak start or end, as the endpoint value for fitting the linear segment. Default is 5.
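The interpolation over peak regions can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the helper name `interp_over_peaks` is hypothetical, the "peak start" and "peak end" are taken to be the baseline points bordering each peak region, and the points are assumed to be evenly spaced so the line can be drawn by index.

```python
import numpy as np


def interp_over_peaks(y, mask, interp_half_window=5):
    """Replace peak regions (mask False) with lines between averaged edge values."""
    y = np.asarray(y, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    baseline = y.copy()
    n = len(y)
    hw = interp_half_window
    # Pad with True so peak regions touching either end are still detected.
    padded = np.concatenate(([True], mask, [True]))
    change = np.diff(padded.astype(int))
    first_peak = np.flatnonzero(change == -1)    # first index of each peak region
    last_peak = np.flatnonzero(change == 1) - 1  # last index of each peak region
    for first, last in zip(first_peak, last_peak):
        # Anchor on the neighboring baseline points, clamped to the array bounds.
        start = max(first - 1, 0)
        end = min(last + 1, n - 1)
        # Average a small window around each anchor to reduce the effect of noise.
        left = y[max(0, start - hw):start + hw + 1].mean()
        right = y[max(0, end - hw):end + hw + 1].mean()
        baseline[start:end + 1] = np.linspace(left, right, end - start + 1)
    return baseline
```

Note that each averaging window extends interp_half_window points into the peak region, so a large value on data with steep peak edges can pull the endpoints upward.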

weights : array_like, shape (N,), optional

The weighting array, used to override the function's baseline identification to designate peak points. Only elements with 0 or False values will have an effect; all non-zero values are considered baseline points. If None (default), uses an array of size N with all values set to 1.

min_length : int, optional

Any region of consecutive baseline points shorter than min_length is considered to be a false positive, and all points in the region are converted to peak points. A higher min_length ensures fewer points are falsely assigned as baseline points. Default is 2, which only removes lone baseline points.
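The effect of min_length on a mask can be illustrated with a short run-length filter. This is a sketch of the behavior described above, not necessarily how the library implements it; the helper name `drop_short_baseline_runs` is hypothetical.

```python
import numpy as np


def drop_short_baseline_runs(mask, min_length=2):
    """Convert runs of True shorter than min_length to False."""
    mask = np.asarray(mask, dtype=bool).copy()
    # Pad with False so runs touching either end have a defined start and stop.
    padded = np.concatenate(([False], mask, [False]))
    change = np.diff(padded.astype(int))
    run_starts = np.flatnonzero(change == 1)   # index where each True run begins
    run_stops = np.flatnonzero(change == -1)   # index one past the end of each run
    for start, stop in zip(run_starts, run_stops):
        if stop - start < min_length:
            mask[start:stop] = False
    return mask
```

With the default of 2, only isolated single True values are removed; runs of two or more baseline points are left unchanged.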

pad_kwargs : dict, optional

A dictionary of keyword arguments to pass to pad_edges() for padding the edges of the data to prevent edge effects from smoothing. Default is None.

**kwargs

Deprecated since version 1.2.0: Passing additional keyword arguments is deprecated and will be removed in version 1.4.0. Pass keyword arguments using pad_kwargs.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'mask': numpy.ndarray, shape (N,)

    The boolean array designating baseline points as True and peak points as False.

References

Golotvin, S., et al. Improved Baseline Recognition and Modeling of FT NMR Spectra. Journal of Magnetic Resonance. 2000, 146, 122-125.