pybaselines.Baseline2D.iarpls

Baseline2D.iarpls(data, lam=100000.0, diff_order=2, max_iter=50, tol=0.001, weights=None, num_eigens=(10, 10), return_dof=False)

Improved asymmetrically reweighted penalized least squares smoothing (IarPLS).

Parameters:
data : array_like, shape (M, N)

The y-values of the measured data. Must not contain missing data (NaN) or Inf.

lam : float or sequence[float, float], optional

The smoothing parameter for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values will create smoother baselines. Default is 1e5.

diff_order : int or sequence[int, int], optional

The order of the differential matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Must be greater than 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

max_iter : int, optional

The maximum number of fit iterations. Default is 50.

tol : float, optional

The exit criterion. Default is 1e-3.

weights : array_like, shape (M, N), optional

The weighting array. If None (default), then the initial weights will be an array with shape equal to (M, N) and all values set to 1.

num_eigens : int or sequence[int, int] or None

The number of eigenvalues for the rows and columns, respectively, to use for eigendecomposition. Typical values are between 5 and 30, with higher values needed for baselines with more curvature. If None, will solve the linear system using the full analytical solution, which is typically much slower. Must be greater than diff_order. Default is (10, 10).

return_dof : bool, optional

If True and num_eigens is not None, then the effective degrees of freedom for each eigenvector will be calculated and returned in the parameter dictionary. Default is False since the calculation takes time.
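
To make the diff_order parameter above more concrete, the following sketch builds first- and second-order finite-difference matrices with NumPy. It only illustrates the concept: `difference_matrix` is a hypothetical helper, not part of pybaselines, and the library may construct its penalty matrices differently (for example, in sparse form).

```python
import numpy as np


def difference_matrix(size, diff_order=2):
    """Hypothetical helper: dense finite-difference matrix of a given order.

    The result has shape (size - diff_order, size).
    """
    if diff_order < 1:
        raise ValueError("diff_order must be greater than 0")
    # Differencing the rows of the identity matrix diff_order times
    # yields the matrix that applies that difference to a vector.
    return np.diff(np.eye(size), n=diff_order, axis=0)


d1 = difference_matrix(5, diff_order=1)  # rows look like [-1, 1, 0, 0, 0]
d2 = difference_matrix(5, diff_order=2)  # rows look like [1, -2, 1, 0, 0]

# A straight line has zero second-order differences.
line = np.arange(5.0)
penalty = d2 @ line
```

Because the second-order difference of linear data is zero, a penalty built with diff_order=2 leaves straight-line trends unpenalized, while diff_order=1 only leaves constant offsets unpenalized.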

Returns:
baseline : numpy.ndarray, shape (M, N)

The calculated baseline.

params : dict

A dictionary with the following items:

  • 'weights': numpy.ndarray, shape (M, N)

    The weight array used for fitting the data.

  • 'tol_history': numpy.ndarray

    An array containing the calculated tolerance values for each iteration. The length of the array is the number of iterations completed. If the last value in the array is greater than the input tol value, then the function did not converge.

  • 'dof': numpy.ndarray, shape (num_eigens[0], num_eigens[1])

    Only if return_dof is True. The effective degrees of freedom associated with each eigenvector. Lower values signify that the eigenvector was less important for the fit.
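
The convergence rule described for 'tol_history' can be checked with a small helper. This is a sketch: `did_converge` is a hypothetical name, not part of pybaselines, and the sample histories are made up for illustration.

```python
def did_converge(tol_history, tol=1e-3):
    """Hypothetical helper: False if the last recorded tolerance exceeds tol."""
    if len(tol_history) == 0:
        # No iterations were recorded, so nothing indicates convergence.
        return False
    return not (tol_history[-1] > tol)


# Made-up tolerance histories standing in for params['tol_history'].
converged_run = did_converge([0.5, 0.02, 0.0004], tol=1e-3)
unconverged_run = did_converge([0.5, 0.2, 0.1], tol=1e-3)
```

If the check fails for a real fit, raising max_iter or loosening tol are the two input parameters that directly affect it.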

References

Ye, J., et al. Baseline correction method based on improved asymmetrically reweighted penalized least squares for Raman spectrum. Applied Optics, 2020, 59, 10933-10943.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.