pybaselines.utils.pspline_smooth

pybaselines.utils.pspline_smooth(data, x_data=None, lam=10.0, num_knots=100, spline_degree=3, diff_order=2, weights=None, check_finite=True)

Smooths the input data using Penalized Spline smoothing.

The input is smoothed by solving the equation (B.T @ W @ B + lam * D.T @ D) @ c = B.T @ W @ y for the spline coefficients c, where W is a matrix with the weights on its diagonal, D is the finite difference matrix, and B is the spline basis matrix. The smoothed data is then y_smooth = B @ c.
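The linear system above can be sketched with NumPy alone. This is a minimal illustration, not the library's implementation: `bspline_basis` and `pspline_sketch` are hypothetical helper names, the basis is built densely with the Cox-de Boor recursion, the knots are assumed to be equally spaced and extended past both ends of the data, and the unknown of the solve is treated as the spline coefficient vector, with the smoothed values given by the basis matrix times that vector.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """Dense B-spline basis matrix built with the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    num_intervals = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval [t_i, t_{i+1}).
    basis = np.zeros((x.size, num_intervals))
    for i in range(num_intervals):
        basis[:, i] = (knots[i] <= x) & (x < knots[i + 1])
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        raised = np.zeros((x.size, num_intervals - d))
        for i in range(num_intervals - d):
            left = (x - knots[i]) / (knots[i + d] - knots[i])
            right = (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1])
            raised[:, i] = left * basis[:, i] + right * basis[:, i + 1]
        basis = raised
    return basis


def pspline_sketch(y, x, lam=10.0, num_knots=20, spline_degree=3,
                   diff_order=2, weights=None):
    """Hypothetical helper: solve (B.T W B + lam D.T D) c = B.T W y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    # Assumed knot layout: equally spaced, padded by `spline_degree` steps per side.
    inner = np.linspace(x.min(), x.max(), num_knots)
    step = inner[1] - inner[0]
    pad = step * np.arange(1, spline_degree + 1)
    knots = np.concatenate((inner[0] - pad[::-1], inner, inner[-1] + pad))

    B = bspline_basis(x, knots, spline_degree)
    # Finite difference matrix acting on the spline coefficients.
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    lhs = B.T @ (w[:, None] * B) + lam * (D.T @ D)
    rhs = B.T @ (w * y)
    coef = np.linalg.solve(lhs, rhs)
    return B @ coef, coef


rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)
y_smooth, coef = pspline_sketch(y, x, lam=10.0, num_knots=20)
```

A dense solve keeps the sketch short. The matrices involved are banded, so a banded or sparse solver would be the natural choice for large inputs.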

Parameters:
data : array_like, shape (N,)

The y-values of the measured data, with N data points.

x_data : array_like, shape (N,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.

lam : float, optional

The smoothing parameter. Larger values will create smoother results. Default is 10.0.

num_knots : int, optional

The number of knots for the spline. Default is 100.

spline_degree : int, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_order : int, optional

The order of the finite difference matrix. Must be greater than or equal to 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.
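To make the role of diff_order concrete, the sketch below builds small finite difference matrices with NumPy. The helper name `difference_matrix` is hypothetical and is not part of the documented API. The sketch assumes the matrix is applied to a vector such as the spline coefficients.

```python
import numpy as np


def difference_matrix(size, diff_order=2):
    """Finite difference matrix with shape (size - diff_order, size)."""
    return np.diff(np.eye(size), n=diff_order, axis=0)


D0 = difference_matrix(5, diff_order=0)  # identity: penalizes the values themselves
D1 = difference_matrix(5, diff_order=1)  # first row is [-1, 1, 0, 0, 0]
D2 = difference_matrix(5, diff_order=2)  # first row is [1, -2, 1, 0, 0]

coef = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
# Multiplying by D2 gives the second differences of the vector.
second_differences = D2 @ coef
```

With diff_order=2 the penalty term lam * D.T @ D is zero for any vector whose entries change linearly, so straight-line trends are left unpenalized. With diff_order=1 only constant vectors escape the penalty.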

weights : array_like, shape (N,), optional

The weighting array, used to override the function's baseline identification to designate peak points. Only elements with 0 or False values will have an effect; all non-zero values are considered baseline points. If None (default), the weights will be an array of size N with all values set to 1.

check_finite : bool, optional

If True, will raise an error if any values in data or weights are not finite. If False, the check is skipped. Default is True, as shown in the signature.

Returns:
y_smooth : numpy.ndarray, shape (N,)

The smoothed data.

tuple(numpy.ndarray, numpy.ndarray, int)

A tuple of the spline knots, spline coefficients, and spline degree, which can be used to reconstruct the fit spline. Useful if needing to recreate the spline with different x-values.
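One way to reuse the returned tuple is to pass it to SciPy's `BSpline` class. The sketch below assumes the tuple follows SciPy's (knots, coefficients, degree) convention, which this page does not state explicitly. The helper name `evaluate_spline` is hypothetical, and a small hand-made tuple stands in for the function's actual output.

```python
import numpy as np
from scipy.interpolate import BSpline


def evaluate_spline(tck, x_new):
    """Evaluate a spline given as (knots, coefficients, degree) at x_new."""
    knots, coef, degree = tck
    return BSpline(knots, coef, degree)(x_new)


# Hand-made stand-in for the returned tuple: a piecewise-linear "hat" on [0, 2].
tck = (np.array([0.0, 0.0, 1.0, 2.0, 2.0]), np.array([0.0, 1.0, 0.0]), 1)
x_new = np.array([0.5, 1.0, 1.5])
y_new = evaluate_spline(tck, x_new)
```

The new x-values need to lie in the same domain as the one used for the fit. If x_data was left as None, that domain is the default range from -1 to 1 rather than the original measurement axis.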

References

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.