pybaselines.Baseline.mixture_model

Baseline.mixture_model(data, lam=100000.0, p=0.01, num_knots=100, spline_degree=3, diff_order=3, max_iter=50, tol=0.001, weights=None, symmetric=False, num_bins=None)

Considers the data as a mixture model composed of noise and peaks.

Weights are assigned iteratively by calculating the probability that each value in the residual belongs to a normal distribution representing the noise.
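
The expectation-maximization idea can be sketched with numpy for a two-component mixture: a zero-mean normal density for the noise and a uniform density on [0, max(residual)] for positive peak residuals. This is an illustrative sketch, not pybaselines' internal code; the function names, the fixed number of EM iterations, the synthetic residual, and the 0.2 * std starting guess for sigma are all assumptions.

```python
import numpy as np


def expectation_step(residual, sigma, noise_fraction):
    """Posterior probability that each residual belongs to the noise component."""
    residual = np.asarray(residual, dtype=float)
    # zero-mean normal density for the noise
    normal_pdf = np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    # uniform density on [0, max(residual)] for peaks; zero for negative residuals
    upper = max(residual.max(), 1e-12)
    uniform_pdf = np.where(residual >= 0, 1.0 / upper, 0.0)
    noise_part = noise_fraction * normal_pdf
    total = noise_part + (1 - noise_fraction) * uniform_pdf
    # where the total density is zero (both densities vanish), treat the point as noise
    return np.divide(noise_part, total, out=np.ones_like(total), where=total > 0)


def maximization_step(residual, posterior):
    """Re-estimate the noise standard deviation and the mixing fraction."""
    residual = np.asarray(residual, dtype=float)
    noise_fraction = posterior.mean()
    sigma = np.sqrt(np.sum(posterior * residual**2) / np.sum(posterior))
    return sigma, noise_fraction


# synthetic residual: normal noise plus a block of positive "peak" points
rng = np.random.default_rng(0)
residual = rng.normal(0.0, 0.1, 500)
residual[200:220] += 3.0
sigma, noise_fraction = 0.2 * residual.std(), 0.5  # rough starting guesses
for _ in range(20):
    weights = expectation_step(residual, sigma, noise_fraction)
    sigma, noise_fraction = maximization_step(residual, weights)
```

After a few iterations the peak points receive weights near 0 and the noise points weights near 1; in the full method these weights would feed the next baseline fit, and the residual would be recomputed against the new baseline.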

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.

lam : float, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.

p : float, optional

The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given p weight, and values less than the baseline will be given 1 - p weight. Used to set the initial weights before performing expectation-maximization. Default is 1e-2.
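
The asymmetric rule that p describes can be sketched in a few lines of numpy. This is an illustration, not pybaselines' code: the helper name `asymmetric_weights` is hypothetical, points lying exactly on the baseline are assumed to get 1 - p, and the strict bounds in the check are an assumption about what "between 0 and 1" means.

```python
import numpy as np


def asymmetric_weights(y, baseline, p):
    """Hypothetical helper: weight p above the baseline, 1 - p at or below it."""
    # assumption: p must lie strictly inside (0, 1)
    if not 0 < p < 1:
        raise ValueError('p must be between 0 and 1')
    y = np.asarray(y, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # points above the baseline are likely peaks, so they get the small weight p
    return np.where(y > baseline, p, 1 - p)


weights = asymmetric_weights([0.0, 2.0, 1.0], [1.0, 1.0, 1.0], p=0.01)
```

With a small p such as the default 1e-2, points above the current baseline barely influence the next fit, which pushes the baseline underneath the peaks.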

num_knots : int, optional

The number of knots for the spline. Default is 100.

spline_degree : int, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_order : int, optional

The order of the finite-difference matrix used in the smoothing penalty. Must be greater than 0. Default is 3 (third-order difference matrix). Typical values are 2 or 3.
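
To show how lam, num_knots, spline_degree, and diff_order interact, here is a numpy-only sketch of a weighted penalized B-spline (P-spline) fit. It is not pybaselines' implementation: the evenly spaced knots, the Cox-de Boor basis construction, the dense solve, and the function names are assumptions made for illustration.

```python
import numpy as np


def bspline_basis(x, num_knots, degree):
    """B-spline basis on evenly spaced knots via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    inner = np.linspace(x.min(), x.max(), num_knots)
    dx = inner[1] - inner[0]
    # pad `degree` extra knots on each side so the basis spans [x.min(), x.max()]
    knots = np.concatenate([
        inner[0] - dx * np.arange(degree, 0, -1),
        inner,
        inner[-1] + dx * np.arange(1, degree + 1),
    ])
    # degree-0 basis: indicator of the half-open interval [t_j, t_(j+1))
    basis = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    for k in range(1, degree + 1):
        left = (x[:, None] - knots[:-(k + 1)]) / (knots[k:-1] - knots[:-(k + 1)])
        right = (knots[k + 1:] - x[:, None]) / (knots[k + 1:] - knots[1:-k])
        basis = left * basis[:, :-1] + right * basis[:, 1:]
    return basis  # shape (N, num_knots + degree - 1)


def penalized_spline(x, y, weights, lam=1e5, num_knots=100, spline_degree=3, diff_order=3):
    """Weighted P-spline: minimise sum(w * (y - B c)**2) + lam * ||D c||**2."""
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    basis = bspline_basis(x, num_knots, spline_degree)
    num_coef = basis.shape[1]
    # difference matrix of order diff_order, shape (num_coef - diff_order, num_coef)
    diff = np.diff(np.eye(num_coef), n=diff_order, axis=0)
    lhs = basis.T @ (weights[:, None] * basis) + lam * diff.T @ diff
    rhs = basis.T @ (weights * y)
    coef = np.linalg.solve(lhs, rhs)
    return basis @ coef


x = np.linspace(0.0, 1.0, 300)
y = 1.0 + 2.0 * x + np.exp(-((x - 0.5) / 0.05) ** 2)
smooth = penalized_spline(x, y, np.ones_like(y), lam=1e3, num_knots=30)
```

The difference penalty acts on the spline coefficients, so with evenly spaced knots and a cubic spline, a third-order penalty leaves quadratic trends unpenalized, while lam controls how strongly everything else is smoothed.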

max_iter : int, optional

The maximum number of fit iterations. Default is 50.

tol : float, optional

The exit criterion: fitting stops once the tolerance value calculated for an iteration falls below tol. Default is 1e-3.

weights : array_like, shape (N,), optional

The weighting array. If None (default), the initial weights are an array of size N with all values set to 1, and two iterations of reweighted least squares are then performed to provide starting weights for the expectation-maximization of the mixture model.
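
The default initialization (weights of 1, then two fit-and-reweight passes) can be sketched as below. This is an assumption-laden illustration: it swaps in a dense Whittaker smoother as a stand-in for the penalized spline, it assumes the reweighting is the asymmetric p / 1 - p rule described for p, and `whittaker_smooth` and `starting_weights` are hypothetical names.

```python
import numpy as np


def whittaker_smooth(y, weights, lam, diff_order=2):
    """Stand-in smoother (dense Whittaker), NOT the spline pybaselines uses."""
    num_points = len(y)
    diff = np.diff(np.eye(num_points), n=diff_order, axis=0)
    lhs = np.diag(weights) + lam * diff.T @ diff
    return np.linalg.solve(lhs, weights * y)


def starting_weights(y, lam=1e5, p=0.01, num_fits=2):
    """Hypothetical helper: start from ones, then `num_fits` fit-and-reweight passes."""
    y = np.asarray(y, dtype=float)
    weights = np.ones_like(y)  # every point starts with weight 1
    for _ in range(num_fits):
        baseline = whittaker_smooth(y, weights, lam)
        # assumed asymmetric reweighting: p above the fit, 1 - p at or below it
        weights = np.where(y > baseline, p, 1 - p)
    return weights


x = np.linspace(0.0, 1.0, 200)
y = 1.0 + x + 5.0 * np.exp(-((x - 0.5) / 0.02) ** 2)
initial = starting_weights(y)
```

The first pass is an ordinary (unweighted) smooth; the second already down-weights the peak, which gives the expectation-maximization a more sensible starting split between noise and peaks.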

symmetric : bool, optional

If False (default), the mixture model is composed of one normal distribution for the noise and one uniform distribution for positive non-noise residuals. If True, an additional uniform distribution is added to the mixture model for negative non-noise residuals. Setting symmetric to True is only needed when the data contains both positive and negative peaks.
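
The effect of symmetric on the posterior noise probability can be sketched as follows. This is a hypothetical illustration, not pybaselines' code: the helper names, the mixing fractions, and the exact uniform supports ([0, max] and [min, 0)) are assumptions.

```python
import numpy as np


def peak_pdfs(residual, symmetric=False):
    """Uniform densities for the non-noise components (hypothetical helper)."""
    residual = np.asarray(residual, dtype=float)
    positive = residual >= 0
    pos_pdf = np.zeros_like(residual)
    if positive.any():
        # constant on [0, max(residual)], zero for negative residuals
        pos_pdf[positive] = 1.0 / max(residual[positive].max(), 1e-12)
    if not symmetric:
        return pos_pdf, None
    neg_pdf = np.zeros_like(residual)
    if (~positive).any():
        # constant on [min(residual), 0), zero for non-negative residuals
        neg_pdf[~positive] = 1.0 / -residual[~positive].min()
    return pos_pdf, neg_pdf


def noise_posterior(residual, sigma, fractions, symmetric=False):
    """fractions: mixing weights (noise, positive peaks[, negative peaks])."""
    residual = np.asarray(residual, dtype=float)
    normal_pdf = np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    pos_pdf, neg_pdf = peak_pdfs(residual, symmetric)
    noise_part = fractions[0] * normal_pdf
    total = noise_part + fractions[1] * pos_pdf
    if symmetric:
        total = total + fractions[2] * neg_pdf
    # a point with zero total density is treated as noise
    return np.divide(noise_part, total, out=np.ones_like(total), where=total > 0)


r = np.array([-5.0, 0.0, 5.0])
one_sided = noise_posterior(r, 0.1, (0.9, 0.1), symmetric=False)
two_sided = noise_posterior(r, 0.1, (0.8, 0.1, 0.1), symmetric=True)
```

Without the negative component, a large negative residual has nowhere to go but the noise class (weight 1); with symmetric=True it can be explained as a negative peak and receives a weight near 0.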

num_bins : int, optional, deprecated

Deprecated since version 1.1.0: num_bins is deprecated since it is no longer necessary for performing the expectation-maximization and will be removed in pybaselines version 1.3.0.

Returns:
baseline : numpy.ndarray, shape (N,)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (N,)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
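
How a tolerance history is built and read can be sketched with a generic fixed-point loop. The relative-difference metric, the toy update, and the helper names are assumptions for illustration; only the rule "a final value greater than tol means no convergence" comes from the description above.

```python
import numpy as np


def relative_difference(old, new):
    """Assumed tolerance metric: ||new - old|| / ||old||."""
    old = np.asarray(old, dtype=float)
    new = np.asarray(new, dtype=float)
    return np.linalg.norm(new - old) / max(np.linalg.norm(old), np.finfo(float).eps)


def iterate(update, start, tol=1e-3, max_iter=50):
    """Run a fixed-point update and record one tolerance value per iteration."""
    values = np.asarray(start, dtype=float)
    history = []
    for _ in range(max_iter):
        new_values = update(values)
        history.append(relative_difference(values, new_values))
        values = new_values
        if history[-1] < tol:
            break
    return values, np.array(history)


def converged(tol_history, tol):
    # a final value greater than tol means the loop hit max_iter first
    return len(tol_history) > 0 and bool(tol_history[-1] <= tol)


values, history = iterate(lambda v: 0.5 * v + 1.0, [0.0], tol=1e-3)
short_values, short_history = iterate(lambda v: 0.5 * v + 1.0, [0.0], tol=1e-3, max_iter=2)
```

With the method itself, the same check applies to the returned dictionary: params['tol_history'][-1] <= tol.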

Raises:
ValueError

Raised if p is not between 0 and 1.

References

de Rooi, J., et al. Mixture models for baseline estimation. Chemometrics and Intelligent Laboratory Systems, 2012, 117, 56-60.

Ghojogh, B., et al. Fitting A Mixture Distribution to Data: Tutorial. arXiv preprint arXiv:1901.06708, 2019.