pybaselines._spline_utils

Module Contents

Classes

PSpline

A Penalized Spline, which penalizes the difference of the spline coefficients.

class pybaselines._spline_utils.PSpline(x, num_knots=100, spline_degree=3, check_finite=False, lam=1, diff_order=2, allow_lower=True, reverse_diags=False)[source]

A Penalized Spline, which penalizes the difference of the spline coefficients.

Penalized splines (P-splines) are solved with the following equation: (B.T @ W @ B + P) c = B.T @ W @ y, where c are the spline coefficients, B is the spline basis, the weights are the diagonal of W, the penalty is P, and y is the fit data. The penalty P is usually in the form lam * D.T @ D, where lam is a penalty factor and D is the matrix version of the finite difference operator.
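The equation above can be sketched with dense NumPy arrays. This is only an illustration of the linear system, not how PSpline is implemented: the helper name pspline_fit_dense, the uniform knot placement, and the dense solver are all assumptions made for the example.

```python
import numpy as np
from scipy.interpolate import BSpline


def pspline_fit_dense(x, y, weights, num_knots=20, spline_degree=3, lam=1.0, diff_order=2):
    """Dense reference solve of (B.T @ W @ B + lam * D.T @ D) c = B.T @ W @ y."""
    # Uniform inner knots spanning the data, extended by `spline_degree` knots per side
    # (an assumed knot layout for this sketch).
    inner = np.linspace(x.min(), x.max(), num_knots)
    step = inner[1] - inner[0]
    left = inner[0] - step * np.arange(spline_degree, 0, -1)
    right = inner[-1] + step * np.arange(1, spline_degree + 1)
    knots = np.concatenate((left, inner, right))

    num_bases = len(knots) - spline_degree - 1
    # Evaluating a BSpline with identity coefficients gives the basis B, shape (N, M).
    B = BSpline(knots, np.eye(num_bases), spline_degree)(x)
    # Finite difference operator D, shape (M - diff_order, M).
    D = np.diff(np.eye(num_bases), n=diff_order, axis=0)
    P = lam * D.T @ D

    BtW = B.T * weights  # same as B.T @ np.diag(weights)
    coef = np.linalg.solve(BtW @ B + P, BtW @ y)
    return B, P, coef


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
w = np.ones_like(x)
B, P, coef = pspline_fit_dense(x, y, w, lam=1.0)
fit = B @ coef  # the smoothed curve evaluated at x
```

A dense solve like this scales poorly with the number of basis functions; it is useful mainly as a reference for checking a banded or sparse implementation.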

References

Eilers, P., et al. Twenty years of P-splines. SORT: Statistics and Operations Research Transactions, 2015, 39(2), 149-186.

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.

Attributes:
basis : scipy.sparse.csr.csr_matrix, shape (N, M)

The spline basis. Has a shape of (N, M), where N is the number of points in x, and M is the number of basis functions (equal to K - spline_degree - 1 or equivalently num_knots + spline_degree - 1).

coef : None or numpy.ndarray, shape (M,)

The spline coefficients. Is None if solve_pspline() has not been called at least once.

knots : numpy.ndarray, shape (K,)

The knots for the spline. Has a shape of K, which is equal to num_knots + 2 * spline_degree.

num_knots : int

The number of internal knots (including the endpoints). The total number of knots for the spline, K, is equal to num_knots + 2 * spline_degree.

spline_degree : int

The degree of the spline (e.g., a cubic spline would have a spline_degree of 3).

x : numpy.ndarray, shape (N,)

The x-values for the spline.
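The size relations quoted for basis, knots, and num_knots above can be checked with a few lines of arithmetic. The helper name spline_sizes is hypothetical and exists only for this illustration.

```python
def spline_sizes(num_knots, spline_degree):
    """Return (K, M): the total knot count and the number of basis functions."""
    total_knots = num_knots + 2 * spline_degree      # K
    num_bases = total_knots - spline_degree - 1      # M = K - spline_degree - 1
    # The equivalent form given in the description of `basis`.
    assert num_bases == num_knots + spline_degree - 1
    return total_knots, num_bases


K, M = spline_sizes(num_knots=100, spline_degree=3)
print(K, M)  # 106 102
```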

property tck

The knots, spline coefficients, and spline degree to reconstruct the spline.

Convenience property for reconstructing the last solved spline with outside modules, such as SciPy's BSpline, to allow for other uses such as evaluating with different x-values.

Raises:
ValueError

Raised if solve_pspline has not been called yet, meaning that the spline has not yet been constructed.
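As a rough sketch of how such a (knots, coefficients, degree) tuple can be used with SciPy, the example below builds stand-in values rather than a real PSpline; the knot layout and the placeholder coefficients are assumptions. With a solved PSpline object, the three values would come from its tck property instead.

```python
import numpy as np
from scipy.interpolate import BSpline

# Stand-in values for illustration only.
degree = 3
num_knots = 10
inner = np.linspace(0.0, 1.0, num_knots)
step = inner[1] - inner[0]
knots = np.concatenate((
    inner[0] - step * np.arange(degree, 0, -1),
    inner,
    inner[-1] + step * np.arange(1, degree + 1),
))
coef = np.linspace(-1.0, 1.0, len(knots) - degree - 1)  # placeholder coefficients

# SciPy's BSpline takes the same (t, c, k) ordering.
spline = BSpline(knots, coef, degree)
new_x = np.linspace(0.0, 1.0, 37)  # need not match the x-values used for fitting
new_y = spline(new_x)
```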

add_diagonal(value)

Adds a diagonal array or float to the original penalty matrix.

Parameters:
value : float or numpy.ndarray

The number or array to add to the main diagonal of the penalty.

Returns:
numpy.ndarray

The penalty with the main diagonal updated.
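In banded storage the main diagonal occupies a single row, so adding to it is a one-row update. The sketch below illustrates that idea with SciPy's banded layouts; the function add_to_main_diagonal is hypothetical and is not the library's implementation.

```python
import numpy as np


def add_to_main_diagonal(bands, value, lower):
    """Return a copy of `bands` with `value` added to the main-diagonal row."""
    out = np.array(bands, dtype=float)
    # Lower format stores the main diagonal first; full format stores it in the middle.
    main_row = 0 if lower else out.shape[0] // 2
    out[main_row] += value
    return out


n = 6
D = np.diff(np.eye(n), n=2, axis=0)
dense = D.T @ D  # second-order difference penalty with lam = 1

# Lower banded storage: row i holds the i-th sub-diagonal, left-aligned.
lower_bands = np.zeros((3, n))
for i in range(3):
    lower_bands[i, :n - i] = np.diagonal(dense, -i)

# Full banded storage: row (2 - k) holds the diagonal with offset k.
full_bands = np.zeros((5, n))
for k in range(-2, 3):
    diag = np.diagonal(dense, k)
    if k >= 0:
        full_bands[2 - k, k:] = diag
    else:
        full_bands[2 - k, :n + k] = diag

new_lower = add_to_main_diagonal(lower_bands, 0.5, lower=True)
new_full = add_to_main_diagonal(full_bands, np.arange(n), lower=False)
```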

add_penalty(penalty)

Updates self.penalty with an additional penalty and updates the bands.

Parameters:
penalty : array-like

The additional penalty to add to self.penalty.

Returns:
numpy.ndarray

The updated self.penalty.

reset_diagonals(lam=1, diff_order=2, allow_lower=True, reverse_diags=None, allow_pentapy=True, padding=0)

Resets the diagonals of the system and all of the attributes.

Useful for reusing the penalized system for a different lam value.

Parameters:
lam : float, optional

The penalty factor applied to the difference matrix. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_order : int, optional

The difference order of the penalty. Default is 2 (second order difference).

allow_lower : bool, optional

If True (default), will allow only using the lower bands of the penalty matrix, which allows using scipy.linalg.solveh_banded() instead of the slightly slower scipy.linalg.solve_banded().

reverse_diags : {None, False, True}, optional

If True, will reverse the order of the diagonals of the squared difference matrix. If False, will never reverse the diagonals. If None (default), will only reverse the diagonals if using pentapy's solver.

allow_pentapy : bool, optional

If True (default), will allow using pentapy's solver if diff_order is 2 and pentapy is installed. pentapy's solver is faster than scipy's banded solvers.

padding : int, optional

The number of extra layers of zeros to add to the bottom and potentially the top if the full bands are used. Default is 0, which adds no extra layers. Negative padding is treated as equivalent to 0.
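To make the roles of lam, diff_order, and the lower-only option concrete, the following sketch builds the banded form of lam * D.T @ D from a dense matrix. The function penalty_bands is a hypothetical helper for illustration; the library computes its diagonals differently, and going through a dense matrix here is only for clarity.

```python
import numpy as np


def penalty_bands(num_points, lam=1.0, diff_order=2, lower=True):
    """Banded form of lam * D.T @ D, built from a dense matrix for clarity."""
    D = np.diff(np.eye(num_points), n=diff_order, axis=0)
    dense = lam * D.T @ D
    if lower:
        # Row i holds the i-th sub-diagonal, left-aligned (scipy.linalg.solveh_banded layout).
        bands = np.zeros((diff_order + 1, num_points))
        for i in range(diff_order + 1):
            bands[i, :num_points - i] = np.diagonal(dense, -i)
    else:
        # Row (diff_order - k) holds the diagonal with offset k (scipy.linalg.solve_banded layout).
        bands = np.zeros((2 * diff_order + 1, num_points))
        for k in range(-diff_order, diff_order + 1):
            diag = np.diagonal(dense, k)
            if k >= 0:
                bands[diff_order - k, k:] = diag
            else:
                bands[diff_order - k, :num_points + k] = diag
    return bands


lower_bands = penalty_bands(8, lam=1.0, diff_order=2, lower=True)   # shape (3, 8)
full_bands = penalty_bands(8, lam=1.0, diff_order=2, lower=False)   # shape (5, 8)
```

The lower form stores diff_order + 1 rows instead of 2 * diff_order + 1, which is what makes the symmetric solver applicable.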

reset_penalty_diagonals(lam=1, diff_order=2, allow_lower=True, reverse_diags=False)[source]

Resets the penalty diagonals of the system and all of the attributes.

Useful for reusing the penalty diagonals without having to recalculate the spline basis.

Parameters:
lam : float, optional

The penalty factor applied to the difference matrix. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_order : int, optional

The difference order of the penalty. Default is 2 (second order difference).

allow_lower : bool, optional

If True (default), will allow only using the lower bands of the penalty matrix, which allows using scipy.linalg.solveh_banded() instead of the slightly slower scipy.linalg.solve_banded().

reverse_diags : bool, optional

If True, will reverse the order of the diagonals of the squared difference matrix. If False (default), will never reverse the diagonals.

Notes

allow_pentapy is always set to False since the time needed to go from a lower to full banded matrix and shifting the rows removes any speedup from using pentapy's solver. It also reduces the complexity of setting up the equations.

Adds padding to the penalty diagonals to accommodate the different shapes of the spline basis and the penalty, which speeds up calculations when the two are added.
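The reason padding helps can be sketched as follows. In lower banded storage, B.T @ W @ B for a spline of degree spline_degree has spline_degree + 1 rows, while the difference penalty has diff_order + 1 rows, so one of the two needs extra rows of zeros before they can be summed. The helper pad_lower_bands and the placeholder arrays are assumptions for illustration, not the library's code.

```python
import numpy as np


def pad_lower_bands(bands, num_rows):
    """Append zero rows so the lower-banded `bands` has at least `num_rows` rows."""
    extra = max(0, num_rows - bands.shape[0])
    return np.pad(bands, ((0, extra), (0, 0)))  # constant (zero) padding by default


spline_degree, diff_order, num_bases = 3, 2, 12
btwb_lower = np.ones((spline_degree + 1, num_bases))   # placeholder for B.T @ W @ B
penalty_lower = np.ones((diff_order + 1, num_bases))   # placeholder for lam * D.T @ D

rows = max(btwb_lower.shape[0], penalty_lower.shape[0])
lhs = pad_lower_bands(btwb_lower, rows) + pad_lower_bands(penalty_lower, rows)
```

Padding the penalty once, when the diagonals are reset, avoids repeating the same reshaping on every solve.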

reverse_penalty()

Reverses the penalty and original diagonals for the system.

Raises:
ValueError

Raised if self.lower is True, since reversing the half diagonals does not make physical sense.

same_basis(num_knots=100, spline_degree=3)[source]

Checks whether the current basis matches the input number of knots and spline degree.

Parameters:
num_knots : int, optional

The number of knots for the new spline. Default is 100.

spline_degree : int, optional

The degree of the new spline. Default is 3.

Returns:
bool

True if the input number of knots and spline degree are equivalent to the current spline basis of the object.

solve(lhs, rhs, overwrite_ab=False, overwrite_b=False, check_finite=False, l_and_u=None, check_output=False)

Solves the equation A @ x = rhs, given A in banded format as lhs.

Parameters:
lhs : array-like, shape (M, N)

The left-hand side of the equation, in banded format. lhs is assumed to be some slight modification of self.penalty in the same format (reversed, lower, number of bands, etc. are all the same).

rhs : array-like, shape (N,)

The right-hand side of the equation.

overwrite_ab : bool, optional

Whether to overwrite lhs when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

overwrite_b : bool, optional

Whether to overwrite rhs when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

check_finite : bool, optional

Whether to check if the inputs are finite when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

l_and_u : Container(int, int), optional

The number of lower and upper bands in lhs when using scipy.linalg.solve_banded(). Default is None, which uses (len(lhs) // 2, len(lhs) // 2).

check_output : bool, optional

If True, will check the output for non-finite values when using _pentapy_solver(). Default is False.

Returns:
output : numpy.ndarray, shape (N,)

The solution to the linear system, x.
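The two SciPy solvers named in the parameter descriptions can be compared directly on a small banded system. This is a standalone sketch using only SciPy; the test matrix is an assumption chosen so that it is symmetric positive definite, and it does not call the solve method itself.

```python
import numpy as np
from scipy.linalg import solve_banded, solveh_banded

n, lam = 50, 10.0
D = np.diff(np.eye(n), n=2, axis=0)
A = np.eye(n) + lam * D.T @ D  # symmetric positive definite, two bands on each side

# Full banded storage for solve_banded: ab[u + i - j, j] == A[i, j].
full = np.zeros((5, n))
for k in range(-2, 3):
    diag = np.diagonal(A, k)
    if k >= 0:
        full[2 - k, k:] = diag
    else:
        full[2 - k, :n + k] = diag

# Lower banded storage for solveh_banded: row i holds the i-th sub-diagonal.
lower = np.zeros((3, n))
for i in range(3):
    lower[i, :n - i] = np.diagonal(A, -i)

rhs = np.sin(np.linspace(0, 3, n))
l_and_u = (len(full) // 2, len(full) // 2)  # the documented default, here (2, 2)
x_full = solve_banded(l_and_u, full, rhs, check_finite=False)
x_lower = solveh_banded(lower, rhs, lower=True, check_finite=False)
```

Both calls return the same solution; the symmetric solver needs only the lower bands.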

solve_pspline(y, weights, penalty=None, rhs_extra=None)[source]

Solves the coefficients for a weighted penalized spline.

Solves the linear equation (B.T @ W @ B + P) c = B.T @ W @ y for the spline coefficients, c, given the spline basis, B, the weights (diagonal of W), the penalty P, and y, and returns the resulting spline, B @ c. Attempts to calculate B.T @ W @ B and B.T @ W @ y as a banded system to speed up the calculation.

Parameters:
y : numpy.ndarray, shape (N,)

The y-values for fitting the spline.

weights : numpy.ndarray, shape (N,)

The weights for each y-value.

penalty : numpy.ndarray, shape (D, M), optional

The finite difference penalty matrix, in LAPACK's lower banded format (see scipy.linalg.solveh_banded()) if lower_only is True or the full banded format (see scipy.linalg.solve_banded()) if lower_only is False.

rhs_extra : float or numpy.ndarray, shape (M,), optional

If supplied, rhs_extra will be added to the right hand side (B.T @ W @ y) of the equation before solving. Default is None, which adds nothing.

Returns:
numpy.ndarray, shape (N,)

The spline, corresponding to B @ c, where c are the solved spline coefficients and B is the spline basis.
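The effect of the weights and of rhs_extra can be sketched with SciPy's sparse tools. The function solve_pspline_sketch, the knot layout, and the test data below are assumptions for illustration; the method itself works on a banded system rather than calling a general sparse solver.

```python
import numpy as np
from scipy import sparse
from scipy.interpolate import BSpline
from scipy.sparse.linalg import spsolve


def solve_pspline_sketch(B, y, weights, P, rhs_extra=None):
    """Solve (B.T @ W @ B + P) c = B.T @ W @ y (+ rhs_extra); return (B @ c, c)."""
    W = sparse.diags(weights)
    lhs = (B.T @ W @ B + P).tocsc()
    rhs = B.T @ (weights * y)
    if rhs_extra is not None:
        rhs = rhs + rhs_extra
    coef = spsolve(lhs, rhs)
    return B @ coef, coef


# Assumed setup: cubic basis on uniform knots and a second-order difference penalty.
degree, num_knots, lam = 3, 20, 1.0
x = np.linspace(0.0, 1.0, 300)
inner = np.linspace(x.min(), x.max(), num_knots)
step = inner[1] - inner[0]
knots = np.concatenate((
    inner[0] - step * np.arange(degree, 0, -1),
    inner,
    inner[-1] + step * np.arange(1, degree + 1),
))
num_bases = len(knots) - degree - 1
B = sparse.csr_matrix(BSpline(knots, np.eye(num_bases), degree)(x))
D = np.diff(np.eye(num_bases), n=2, axis=0)
P = sparse.csr_matrix(lam * D.T @ D)

# A straight line with a peak; zero weights make the fit ignore the peak region.
line = 2.0 * x + 1.0
mask = (x > 0.4) & (x < 0.6)
y = line.copy()
y[mask] += 5.0 * np.exp(-0.5 * ((x[mask] - 0.5) / 0.02) ** 2)
weights = np.where(mask, 0.0, 1.0)

fit, coef = solve_pspline_sketch(B, y, weights, P)
```

Because a second-order difference penalty does not penalize a straight line, the fit here follows the underlying line through the zero-weighted region, which is the behavior that baseline algorithms rely on when they reweight iteratively.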