pybaselines.Baseline.mixture_model
- Baseline.mixture_model(data, lam=100000.0, p=0.01, num_knots=100, spline_degree=3, diff_order=3, max_iter=50, tol=0.001, weights=None, symmetric=False, num_bins=None)
Considers the data as a mixture model composed of noise and peaks.
Weights are iteratively assigned by calculating the probability that each value in the residual belongs to a normal distribution representing the noise.
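The weighting idea described above can be illustrated with a minimal, dependency-free sketch. It is not the pybaselines implementation: the function names, the fixed mixture parameters (`mean`, `sigma`, `noise_fraction`, `peak_width`), and the sample residuals are all hypothetical, and a real expectation-maximization loop would re-estimate those parameters on every iteration. The sketch only shows how a residual's weight can be taken as the posterior probability that it came from the normal noise component rather than from a uniform component for positive peaks.

```python
import math


def normal_pdf(r, mean, sigma):
    """Density of a normal distribution evaluated at r."""
    z = (r - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def noise_posterior(residuals, mean, sigma, noise_fraction, peak_width):
    """Posterior probability that each residual belongs to the noise.

    Assumes a two-component mixture: normal noise plus a uniform
    distribution on [0, peak_width] for positive non-noise residuals.
    """
    weights = []
    for r in residuals:
        noise = noise_fraction * normal_pdf(r, mean, sigma)
        # The uniform component only covers positive residuals.
        in_range = 0 <= r <= peak_width
        peak = (1 - noise_fraction) / peak_width if in_range else 0.0
        total = noise + peak
        # Guard against both densities underflowing to zero.
        weights.append(noise / total if total > 0 else 0.0)
    return weights


# Hypothetical residuals (data minus the current baseline estimate).
residuals = [-0.1, 0.05, 0.2, 3.0, 8.0]
weights = noise_posterior(
    residuals, mean=0.0, sigma=0.2, noise_fraction=0.8, peak_width=10.0
)
```

Residuals close to zero receive weights near 1 and so dominate the next baseline fit, while large positive residuals (likely peaks) receive weights near 0.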
- Parameters:
- data
array_like, shape (N,) The y-values of the measured data, with N data points. Must not contain missing data (NaN) or Inf.
- lam
float, optional The smoothing parameter. Larger values will create smoother baselines. Default is 1e5.
- p
float, optional The penalizing weighting factor. Must be between 0 and 1. Values greater than the baseline will be given a weight of p, and values less than the baseline will be given a weight of 1 - p. Used to set the initial weights before performing expectation-maximization. Default is 1e-2.
- num_knots
int, optional The number of knots for the spline. Default is 100.
- spline_degree
int, optional The degree of the spline. Default is 3, which is a cubic spline.
- diff_order
int, optional The order of the differential matrix. Must be greater than 0. Default is 3 (third order differential matrix). Typical values are 2 or 3.
- max_iter
int, optional The maximum number of fit iterations. Default is 50.
- tol
float, optional The exit criterion. Default is 1e-3.
- weights
array_like, shape (N,), optional The weighting array. If None (default), the initial weights will be an array of size N with all values set to 1, and two iterations of reweighted least-squares are then performed to provide starting weights for the expectation-maximization of the mixture model.
- symmetric
bool, optional If False (default), the total mixture model will be composed of one normal distribution for the noise and one uniform distribution for positive non-noise residuals. If True, an additional uniform distribution will be added to the mixture model for negative non-noise residuals. symmetric only needs to be set to True when the data contains both positive and negative peaks.
- num_bins
int, optional, deprecated Deprecated since version 1.1.0: num_bins is deprecated since it is no longer necessary for performing the expectation-maximization and will be removed in pybaselines version 1.3.0.
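The initial weighting controlled by p can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the library's code: the helper name `asymmetric_weights` is hypothetical, the bounds on p are treated as exclusive, and points exactly equal to the baseline are given 1 - p, a case the description of p does not specify.

```python
def asymmetric_weights(data, baseline, p=0.01):
    """Assign p to points above the baseline and 1 - p to the rest."""
    # Assumption: the bounds 0 and 1 are treated as exclusive here.
    if not 0 < p < 1:
        raise ValueError("p must be between 0 and 1")
    # Assumption: ties (y == z) are treated like points below the baseline.
    return [p if y > z else 1 - p for y, z in zip(data, baseline)]


# Hypothetical data and baseline estimate.
initial = asymmetric_weights([1.0, 5.0, 0.5], [1.2, 1.0, 1.0], p=0.01)
```

With a small p, points above the current baseline (likely peaks) contribute little to the next fit, which pulls the baseline toward the lower envelope of the data.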
- Returns:
- baseline
numpy.ndarray, shape (N,) The calculated baseline.
- params
dict A dictionary with the following items:
- 'weights': numpy.ndarray, shape (N,)
The weight array used for fitting the data.
- 'tol_history': numpy.ndarray
An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.
- Raises:
- ValueError
Raised if p is not between 0 and 1.
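A short usage sketch may help tie the parameters and return values together. It assumes pybaselines is installed and that `x` and `y` are equal-length arrays of measured data. The helper names `fit_mixture_baseline` and `converged` are hypothetical and not part of the library. The convergence check follows the description of 'tol_history' above; treating an empty history as not converged is an assumption.

```python
def converged(tol_history, tol):
    """Return True if the last calculated tolerance is at or below tol."""
    # Assumption: with no completed iterations, report no convergence.
    if len(tol_history) == 0:
        return False
    return bool(tol_history[-1] <= tol)


def fit_mixture_baseline(x, y, lam=1e5, p=1e-2, tol=1e-3, symmetric=False):
    """Fit a mixture_model baseline and report whether it converged."""
    # Imported here so that `converged` can be used on its own
    # without pybaselines installed.
    from pybaselines import Baseline

    fitter = Baseline(x_data=x)
    # A ValueError is raised by mixture_model if p is not between 0 and 1.
    baseline, params = fitter.mixture_model(
        y, lam=lam, p=p, tol=tol, symmetric=symmetric
    )
    return baseline, params["weights"], converged(params["tol_history"], tol)
```

If the third returned value is False, raising max_iter or loosening tol are the two options the parameters above offer. For data with both positive and negative peaks, pass symmetric=True.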
References
de Rooi, J., et al. Mixture models for baseline estimation. Chemometrics and Intelligent Laboratory Systems, 2012, 117, 56-60.
Ghojogh, B., et al. Fitting A Mixture Distribution to Data: Tutorial. arXiv preprint arXiv:1901.06708, 2019.