pybaselines.two_d._whittaker_utils

Module Contents

Classes

PenalizedSystem2D

An object for setting up and solving penalized least squares linear systems.

WhittakerSystem2D

Sets up and solves Whittaker smoothing using the analytical solution or eigendecomposition.

class pybaselines.two_d._whittaker_utils.PenalizedSystem2D(data_size, lam=1, diff_order=2)[source]

An object for setting up and solving penalized least squares linear systems.

Notes

If the penalty is symmetric, the sparse system could be solved much faster using CHOLMOD from SuiteSparse (https://github.com/DrTimothyAldenDavis/SuiteSparse) through the python bindings provided by scikit-sparse (https://github.com/scikit-sparse/scikit-sparse), but it is not worth implementing here since this code will rarely be used.

References

Eilers, P., et al. Fast and compact smoothing on large multidimensional grids. Computational Statistics and Data Analysis, 2006, 50(1), 61-76.

Attributes:
diff_ordernumpy.ndarray[int, int]

The difference order of the penalty.

main_diagonalnumpy.ndarray

The values along the main diagonal of the penalty matrix.

penaltyscipy.sparse.base.spmatrix

The current penalty. Originally is original_diagonals after multiplying by lam and applying padding, but can also be changed by calling add_penalty(). Reset by calling reset_diagonals().

add_diagonal(value)[source]

Adds a diagonal array to the original penalty matrix.

Parameters:
valuefloat or numpy.ndarray

The diagonal array to add to the penalty matrix.

Returns:
scipy.sparse.base.spmatrix

The penalty matrix with the main diagonal updated.

add_penalty(penalty)[source]

Updates self.penalty with an additional penalty and updates the bands.

Parameters:
penaltyarray-like

The additional penalty to add to self.penalty.

Returns:
numpy.ndarray

The updated self.penalty.

reset_diagonal()[source]

Sets the main diagonal of the penalty matrix back to its original value.

reset_diagonals(lam=1, diff_order=2)[source]

Resets the diagonals of the system and all of the attributes.

Useful for reusing the penalized system for a different lam value.

Parameters:
lamfloat or Sequence[int, int], optional

The penalty factor applied to the difference matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_orderint or Sequence[int, int], optional

The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).

solve(y, weights, penalty=None, rhs_extra=None)[source]

Solves the equation A @ x = b.

Parameters:
ynumpy.ndarray

The y-values for fitting the spline.

weightsnumpy.ndarray

The weights for each y-value. Will also be added to the diagonal of the penalty.

penaltynumpy.ndarray

The penalty to use for solving. Default is None which uses the object's penalty.

rhs_extrafloat or numpy.ndarray, optional

If supplied, rhs_extra will be added to the right hand side of the equation before solving. Default is None, which adds nothing.

Returns:
numpy.ndarray, shape (N,)

The solution to the linear system, x.

class pybaselines.two_d._whittaker_utils.WhittakerSystem2D(data_size, lam=1, diff_order=2, num_eigens=None)[source]

Sets up and solves Whittaker smoothing using the analytical solution or eigendecomposition.

References

Eilers, P., et al. Fast and compact smoothing on large multidimensional grids. Computational Statistics and Data Analysis, 2006, 50(1), 61-76.

Biessy, G. Revisiting Whittaker-Henderson Smoothing. https://hal.science/hal-04124043 (Preprint), 2023.

Attributes:
basis_rscipy.sparse.csr.csr_matrix, shape (N, P)

The spline basis for the rows. Has a shape of (N, P), where N is the number of points in x, and P is the number of basis functions (equal to K - spline_degree - 1 or equivalently num_knots[0] + spline_degree[0] - 1).

basis_cscipy.sparse.csr.csr_matrix, shape (M, Q)

The spline basis for the columns. Has a shape of (M, Q), where M is the number of points in z, and Q is the number of basis functions (equal to K - spline_degree - 1 or equivalently num_knots[1] + spline_degree[1] - 1).

coefNone or numpy.ndarray, shape (M,)

The spline coefficients. Is None if solve_pspline() has not been called at least once.

property basis

The full spline basis matrix.

This is a lazy implementation since the full basis is typically not needed for computations.

add_diagonal(value)

Adds a diagonal array to the original penalty matrix.

Parameters:
valuefloat or numpy.ndarray

The diagonal array to add to the penalty matrix.

Returns:
scipy.sparse.base.spmatrix

The penalty matrix with the main diagonal updated.

add_penalty(penalty)

Updates self.penalty with an additional penalty and updates the bands.

Parameters:
penaltyarray-like

The additional penalty to add to self.penalty.

Returns:
numpy.ndarray

The updated self.penalty.

reset_diagonal()

Sets the main diagonal of the penalty matrix back to its original value.

reset_diagonals(lam=1, diff_order=2)[source]

Resets the diagonals of the system and all of the attributes.

Useful for reusing the penalized system for a different lam value.

Parameters:
lamfloat or Sequence[int, int], optional

The penalty factor applied to the difference matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_orderint or Sequence[int, int], optional

The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).

reset_penalty(lam=1, diff_order=2)[source]

Resets the penalty of the system and all of the attributes.

Useful for reusing the penalty diagonals without having to recalculate the spline basis.

Parameters:
lamfloat or Sequence[float, float], optional

The penalty factor applied to the difference matrix. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_orderint or Sequence[int, int], optional

The difference order of the penalty. Default is 2 (second order difference).

same_basis(diff_order=2, num_eigens=None)[source]

Sees if the current basis is equivalent to the input number of eigenvalues and diff order.

Always returns False if the previous setup did not use eigendecomposition or if the input maximum number of eigenvalues is None.

Parameters:
diff_orderint or Sequence[int, int], optional

The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).

num_eigensint or Sequence[int, int] or None

The number of eigenvalues for the rows and columns, respectively, to use for the eigendecomposition. If None, will solve the linear system using the full analytical solution, which is typically much slower.

Returns:
bool

True if the input number of eigenvalues and difference order are equivalent to the current setup for the object.

solve(y, weights, penalty=None, rhs_extra=None, assume_a='pos')[source]

Solves the coefficients for a weighted penalized spline.

Solves the linear equation (B.T @ W @ B + P) c = B.T @ W @ y for the spline coefficients, c, given the spline basis, B, the weights (diagonal of W), the penalty P, and y, and returns the resulting spline, B @ c. Attempts to calculate B.T @ W @ B and B.T @ W @ y as a banded system to speed up the calculation.

Parameters:
ynumpy.ndarray, shape (M, N)

The y-values for fitting the spline.

weightsnumpy.ndarray, shape (M, N)

The weights for each y-value.

penaltynumpy.ndarray, shape (M * N, M * N)

The finite difference penalty matrix, in LAPACK's lower banded format (see scipy.linalg.solveh_banded()) if lower_only is True or the full banded format (see scipy.linalg.solve_banded()) if lower_only is False.

rhs_extrafloat or numpy.ndarray, shape (M * N,), optional

If supplied, rhs_extra will be added to the right hand side (B.T @ W @ y) of the equation before solving. Default is None, which adds nothing.

Returns:
numpy.ndarray, shape (M, N)

The spline, corresponding to B @ c, where c are the solved spline coefficients and B is the spline basis.

Notes

Uses the more efficient algorithm from Eilers's paper, although the memory usage is higher than the straigtforward method when the number of knots is high; however, it is significantly faster and memory efficient when the number of knots is lower, which will be the more typical use case.