pybaselines.Baseline2D.pspline_brpls

Baseline2D.pspline_brpls(data, lam=1000.0, num_knots=25, spline_degree=3, diff_order=2, max_iter=50, tol=0.001, max_iter_2=50, tol_2=0.001, weights=None)

A penalized spline version of the brPLS algorithm.

Parameters:
data : array_like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam : float or sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e3.

num_knots : int or sequence[int, int], optional

The number of knots for the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 25.

spline_degree : int or sequence[int, int], optional

The degree of the splines along the rows and columns, respectively. If a single value is given, both will use the same value. Default is 3, which is a cubic spline.

diff_order : int or sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 1 or 2.

max_iter : int, optional

The maximum number of fit iterations. Default is 50.

tol : float, optional

The exit criterion for each fit. Default is 1e-3.

max_iter_2 : int, optional

The maximum number of iterations for updating the proportion of data occupied by peaks. Default is 50.

tol_2 : float, optional

The exit criterion for the change in the calculated proportion of data occupied by peaks. Default is 1e-3.

weights : array_like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape (M, N) and all values set to 1.

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray, shape (J, K)

    An array containing the calculated tolerance values for each iteration of both the threshold values and the fit values. Row 0 holds the tolerance values for the difference in the peak proportion, and rows 1 and above hold the tolerance values for each fit. Entries that were not used in fitting are 0. J is 2 plus the number of iterations for the threshold to converge (related to max_iter_2 and tol_2), and K is the larger of the number of threshold iterations and the maximum number of iterations over all of the fits for the various threshold values (related to max_iter and tol).
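
The layout of 'tol_history' can be unpacked as in the following sketch. The array here is hypothetical and hand-written to match the description above (it is not output from a real fit), and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical tol_history: 2 threshold iterations, so J = 2 + 2 = 4 rows.
# The three fits used 3, 2, and 2 iterations, so K = max(2, 3) = 3 columns.
tol_history = np.array([
    [0.5,   0.0005, 0.0],     # row 0: peak-proportion tolerances
    [0.2,   0.01,   0.0008],  # row 1: tolerances of the first fit
    [0.05,  0.0009, 0.0],     # row 2: tolerances of the second fit
    [0.002, 0.0004, 0.0],     # row 3: tolerances of the third fit
])

threshold_tols = tol_history[0]  # tolerances for the peak proportion
fit_tols = tol_history[1:]       # one row of tolerances per fit

# Unused entries are 0, so counting nonzero entries estimates how many
# iterations each stage actually ran.
n_threshold_iters = np.count_nonzero(threshold_tols)
n_fit_iters = np.count_nonzero(fit_tols, axis=1)
```

Counting nonzero entries is only a heuristic: a tolerance that happened to be exactly 0 would be indistinguishable from an unused entry.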

See also

Baseline2D.brpls

References

Wang, Q., et al. Spectral baseline estimation using penalized least squares with weights derived from the Bayesian method. Nuclear Science and Techniques, 2022, 140, 250-257.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.