pybaselines.two_d._whittaker_utils
Module Contents
Classes
An object for setting up and solving penalized least squares linear systems. 

Sets up and solves Whittaker smoothing using the analytical solution or eigendecomposition. 
 class pybaselines.two_d._whittaker_utils.PenalizedSystem2D(data_size, lam=1, diff_order=2)[source]
An object for setting up and solving penalized least squares linear systems.
Notes
If the penalty is symmetric, the sparse system could be solved much faster using CHOLMOD from SuiteSparse (https://github.com/DrTimothyAldenDavis/SuiteSparse) through the python bindings provided by scikitsparse (https://github.com/scikitsparse/scikitsparse), but it is not worth implementing here since this code will rarely be used.
References
Eilers, P., et al. Fast and compact smoothing on large multidimensional grids. Computational Statistics and Data Analysis, 2006, 50(1), 6176.
 Attributes:
 diff_ordernumpy.ndarray[int, int]
The difference order of the penalty.
 main_diagonalnumpy.ndarray
The values along the main diagonal of the penalty matrix.
 penaltyscipy.sparse.base.spmatrix
The current penalty. Originally is original_diagonals after multiplying by lam and applying padding, but can also be changed by calling
add_penalty()
. Reset by callingreset_diagonals()
.
 add_diagonal(value)[source]
Adds a diagonal array to the original penalty matrix.
 Parameters:
 valuefloat or numpy.ndarray
The diagonal array to add to the penalty matrix.
 Returns:
 scipy.sparse.base.spmatrix
The penalty matrix with the main diagonal updated.
 add_penalty(penalty)[source]
Updates self.penalty with an additional penalty and updates the bands.
 Parameters:
 penaltyarraylike
The additional penalty to add to self.penalty.
 Returns:
 numpy.ndarray
The updated self.penalty.
 reset_diagonals(lam=1, diff_order=2)[source]
Resets the diagonals of the system and all of the attributes.
Useful for reusing the penalized system for a different lam value.
 Parameters:
 lamfloat or Sequence[int, int], optional
The penalty factor applied to the difference matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values produce smoother results. Must be greater than 0. Default is 1.
 diff_orderint or Sequence[int, int], optional
The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).
 solve(y, weights, penalty=None, rhs_extra=None)[source]
Solves the equation
A @ x = b
. Parameters:
 ynumpy.ndarray
The yvalues for fitting the spline.
 weightsnumpy.ndarray
The weights for each yvalue. Will also be added to the diagonal of the penalty.
 penaltynumpy.ndarray
The penalty to use for solving. Default is None which uses the object's penalty.
 rhs_extrafloat or numpy.ndarray, optional
If supplied, rhs_extra will be added to the right hand side of the equation before solving. Default is None, which adds nothing.
 Returns:
 numpy.ndarray, shape (N,)
The solution to the linear system, x.
 class pybaselines.two_d._whittaker_utils.WhittakerSystem2D(data_size, lam=1, diff_order=2, num_eigens=None)[source]
Sets up and solves Whittaker smoothing using the analytical solution or eigendecomposition.
References
Eilers, P., et al. Fast and compact smoothing on large multidimensional grids. Computational Statistics and Data Analysis, 2006, 50(1), 6176.
Biessy, G. Revisiting WhittakerHenderson Smoothing. https://hal.science/hal04124043 (Preprint), 2023.
 Attributes:
 basis_rscipy.sparse.csr.csr_matrix, shape (N, P)
The spline basis for the rows. Has a shape of (N, P), where N is the number of points in x, and P is the number of basis functions (equal to
K  spline_degree  1
or equivalentlynum_knots[0] + spline_degree[0]  1
). basis_cscipy.sparse.csr.csr_matrix, shape (M, Q)
The spline basis for the columns. Has a shape of (M, Q), where M is the number of points in z, and Q is the number of basis functions (equal to
K  spline_degree  1
or equivalentlynum_knots[1] + spline_degree[1]  1
). coefNone or numpy.ndarray, shape (M,)
The spline coefficients. Is None if
solve_pspline()
has not been called at least once.
 property basis
The full spline basis matrix.
This is a lazy implementation since the full basis is typically not needed for computations.
 add_diagonal(value)
Adds a diagonal array to the original penalty matrix.
 Parameters:
 valuefloat or numpy.ndarray
The diagonal array to add to the penalty matrix.
 Returns:
 scipy.sparse.base.spmatrix
The penalty matrix with the main diagonal updated.
 add_penalty(penalty)
Updates self.penalty with an additional penalty and updates the bands.
 Parameters:
 penaltyarraylike
The additional penalty to add to self.penalty.
 Returns:
 numpy.ndarray
The updated self.penalty.
 reset_diagonal()
Sets the main diagonal of the penalty matrix back to its original value.
 reset_diagonals(lam=1, diff_order=2)[source]
Resets the diagonals of the system and all of the attributes.
Useful for reusing the penalized system for a different lam value.
 Parameters:
 lamfloat or Sequence[int, int], optional
The penalty factor applied to the difference matrix for the rows and columns, respectively. If a single value is given, both will use the same value. Larger values produce smoother results. Must be greater than 0. Default is 1.
 diff_orderint or Sequence[int, int], optional
The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).
 reset_penalty(lam=1, diff_order=2)[source]
Resets the penalty of the system and all of the attributes.
Useful for reusing the penalty diagonals without having to recalculate the spline basis.
 Parameters:
 lamfloat or Sequence[float, float], optional
The penalty factor applied to the difference matrix. Larger values produce smoother results. Must be greater than 0. Default is 1.
 diff_orderint or Sequence[int, int], optional
The difference order of the penalty. Default is 2 (second order difference).
 same_basis(diff_order=2, num_eigens=None)[source]
Sees if the current basis is equivalent to the input number of eigenvalues and diff order.
Always returns False if the previous setup did not use eigendecomposition or if the input maximum number of eigenvalues is None.
 Parameters:
 diff_orderint or Sequence[int, int], optional
The difference order of the penalty for the rows and columns, respectively. If a single value is given, both will use the same value. Default is 2 (second order difference).
 num_eigensint or Sequence[int, int] or None
The number of eigenvalues for the rows and columns, respectively, to use for the eigendecomposition. If None, will solve the linear system using the full analytical solution, which is typically much slower.
 Returns:
 bool
True if the input number of eigenvalues and difference order are equivalent to the current setup for the object.
 solve(y, weights, penalty=None, rhs_extra=None, assume_a='pos')[source]
Solves the coefficients for a weighted penalized spline.
Solves the linear equation
(B.T @ W @ B + P) c = B.T @ W @ y
for the spline coefficients, c, given the spline basis, B, the weights (diagonal of W), the penalty P, and y, and returns the resulting spline,B @ c
. Attempts to calculateB.T @ W @ B
andB.T @ W @ y
as a banded system to speed up the calculation. Parameters:
 ynumpy.ndarray, shape (M, N)
The yvalues for fitting the spline.
 weightsnumpy.ndarray, shape (M, N)
The weights for each yvalue.
 penaltynumpy.ndarray, shape (
M * N
,M * N
) The finite difference penalty matrix, in LAPACK's lower banded format (see
scipy.linalg.solveh_banded()
) if lower_only is True or the full banded format (seescipy.linalg.solve_banded()
) if lower_only is False. rhs_extrafloat or numpy.ndarray, shape (
M * N
,), optional If supplied, rhs_extra will be added to the right hand side (
B.T @ W @ y
) of the equation before solving. Default is None, which adds nothing.
 Returns:
 numpy.ndarray, shape (M, N)
The spline, corresponding to
B @ c
, where c are the solved spline coefficients and B is the spline basis.
Notes
Uses the more efficient algorithm from Eilers's paper, although the memory usage is higher than the straigtforward method when the number of knots is high; however, it is significantly faster and memory efficient when the number of knots is lower, which will be the more typical use case.