pybaselines._banded_utils

Module Contents

Classes

PenalizedSystem

An object for setting up and solving banded penalized least squares linear systems.

Functions

diff_penalty_diagonals

Creates the diagonals of the finite difference penalty matrix.

diff_penalty_matrix

Creates the finite difference penalty matrix.

difference_matrix

Creates an n-order finite-difference matrix.

class pybaselines._banded_utils.PenalizedSystem(data_size, lam=1, diff_order=2, allow_lower=True, reverse_diags=None, allow_pentapy=True, padding=0, pentapy_solver=2)[source]

An object for setting up and solving banded penalized least squares linear systems.

Attributes:
diff_orderint

The difference order of the penalty.

lowerbool

If True, the penalty uses only the lower bands of the symmetric banded penalty. Will use scipy.linalg.solveh_banded() for solving. If False, contains both the upper and lower bands of the penalty and will use either scipy.linalg.solve_banded() (if using_pentapy is False) or _pentapy_solver() when solving.

main_diagonalnumpy.ndarray

The main diagonal of the penalty matrix. Is updated when adding additional matrices to the penalty through add_penalty(), and takes into account whether the penalty is only the lower bands or the total bands.

main_diagonal_indexint

The index of the main diagonal for penalty. Is updated when adding additional matrices to the penalty through add_penalty(), and takes into account whether the penalty is only the lower bands or the total bands.

num_bandsint

The number of bands in the penalty. The number of bands is assumbed to be symmetric, so the number of upper and lower bands should both be equal to num_bands.

original_diagonalsnumpy.ndarray

The original penalty diagonals before multiplying by lam or adding any padding. Maintained so that repeated computations with different lam values can be quickly set up. original_diagonals can be either the full or lower bands of the penalty, and may be reveresed, it depends on the set up. Reset by calling reset_diagonals().

penaltynumpy.ndarray

The current penalty. Originally is original_diagonals after multiplying by lam and applying padding, but can also be changed by calling add_penalty(). Reset by calling reset_diagonals().

pentapy_solverint or str

The integer or string designating which solver to use if using pentapy. See pentapy.solve() for available options, although 1 or 2 are the most relevant options. Default is 2.

reversedbool

If True, the penalty is reversed of the typical LAPACK banded format. Useful if multiplying the penalty with an array since the rows get shifted, or if using pentapy's solver.

using_pentapybool

If True, will use pentapy's solver when solving.

add_diagonal(value)[source]

Adds a diagonal array or float to the original penalty matrix.

Parameters:
valuefloat or numpy.ndarray

The number or array to add to the main diagonal of the penalty.

Returns:
numpy.ndarray

The penalty with the main diagonal updated.

add_penalty(penalty)[source]

Updates self.penalty with an additional penalty and updates the bands.

Parameters:
penaltyarray-like

The additional penalty to add to self.penalty.

Returns:
numpy.ndarray

The updated self.penalty.

reset_diagonals(lam=1, diff_order=2, allow_lower=True, reverse_diags=None, allow_pentapy=True, padding=0)[source]

Resets the diagonals of the system and all of the attributes.

Useful for reusing the penalized system for a different lam value.

Parameters:
lamfloat, optional

The penalty factor applied to the difference matrix. Larger values produce smoother results. Must be greater than 0. Default is 1.

diff_orderint, optional

The difference order of the penalty. Default is 2 (second order difference).

allow_lowerbool, optional

If True (default), will allow only using the lower bands of the penalty matrix, which allows using scipy.linalg.solveh_banded() instead of the slightly slower scipy.linalg.solve_banded().

reverse_diags{None, False, True}, optional

If True, will reverse the order of the diagonals of the squared difference matrix. If False, will never reverse the diagonals. If None (default), will only reverse the diagonals if using pentapy's solver.

allow_pentapybool, optional

If True (default), will allow using pentapy's solver if diff_order is 2 and pentapy is installed. pentapy's solver is faster than scipy's banded solvers.

paddingint, optional

The number of extra layers of zeros to add to the bottom and potentially the top if the full bands are used. Default is 0, which adds no extra layers. Negative padding is treated as equivalent to 0.

reverse_penalty()[source]

Reverses the penalty and original diagonals for the system.

Raises:
ValueError

Raised if self.lower is True, since reversing the half diagonals does not make physical sense.

solve(lhs, rhs, overwrite_ab=False, overwrite_b=False, check_finite=False, l_and_u=None, check_output=False)[source]

Solves the equation A @ x = rhs, given A in banded format as lhs.

Parameters:
lhsarray-like, shape (M, N)

The left-hand side of the equation, in banded format. lhs is assumed to be some slight modification of self.penalty in the same format (reversed, lower, number of bands, etc. are all the same).

rhsarray-like, shape (N,)

The right-hand side of the equation.

overwrite_abbool, optional

Whether to overwrite lhs when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

overwrite_bbool, optional

Whether to overwrite rhs when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

check_finitebool, optional

Whether to check if the inputs are finite when using scipy.linalg.solveh_banded() or scipy.linalg.solve_banded(). Default is False.

l_and_uContainer(int, int), optional

The number of lower and upper bands in lhs when using scipy.linalg.solve_banded(). Default is None, which uses (len(lhs) // 2, len(lhs) // 2).

check_outputbool, optional

If True, will check the output for non-finite values when using _pentapy_solver(). Default is False.

Returns:
outputnumpy.ndarray, shape (N,)

The solution to the linear system, x.

pybaselines._banded_utils.diff_penalty_diagonals(data_size, diff_order=2, lower_only=True, padding=0)[source]

Creates the diagonals of the finite difference penalty matrix.

If D is the finite difference matrix, then the finite difference penalty matrix is defined as D.T @ D. The penalty matrix is banded and symmetric, so the non-zero diagonal bands can be computed efficiently.

Parameters:
data_sizeint

The number of data points.

diff_orderint, optional

The integer differential order; must be >= 0. Default is 2.

lower_onlybool, optional

If True (default), will return only the lower diagonals of the matrix. If False, will include all diagonals of the matrix.

paddingint, optional

The number of extra layers of zeros to add to the bottom and top, if lower_only is True. Useful if working with other diagonal arrays with a different number of rows. Default is 0, which adds no extra layers. Negative padding is treated as equivalent to 0.

Returns:
diagonalsnumpy.ndarray

The diagonals of the finite difference penalty matrix.

Raises:
ValueError

Raised if diff_order is negative or if data_size less than 1.

Notes

Equivalent to calling:

from pybaselines.utils import difference_matrix
diff_matrix = difference_matrix(data_size, diff_order)
output = (diff_matrix.T @ diff_matrix).todia().data[::-1]
if lower_only:
    output = output[diff_order:]

but is several orders of magnitude times faster.

The data is output in the banded format required by SciPy's solve_banded and solveh_banded functions.

pybaselines._banded_utils.diff_penalty_matrix(data_size, diff_order=2, diff_format='csr')[source]

Creates the finite difference penalty matrix.

If D is the finite difference matrix, then the finite difference penalty matrix is defined as D.T @ D.

Parameters:
data_sizeint

The number of data points.

diff_orderint, optional

The integer differential order; must be >= 0. Default is 2.

diff_formatstr or None, optional

The sparse format to use for the difference matrix. Default is 'csr'.

Returns:
penalty_matrixscipy.sparse.spmatrix or scipy.sparse._sparray

The sparse difference penalty matrix.

Raises:
ValueError

Raised if diff_order is greater or equal to data_size.

Notes

Equivalent to calling:

from pybaselines.utils import difference_matrix
diff_matrix = difference_matrix(data_size, diff_order)
penalty_matrix = diff_matrix.T @ diff_matrix

but should be faster since the bands within the penalty matrix can be gotten without the matrix multiplication.

pybaselines._banded_utils.difference_matrix(data_size, diff_order=2, diff_format=None)[source]

Creates an n-order finite-difference matrix.

Parameters:
data_sizeint

The number of data points.

diff_orderint, optional

The integer differential order; must be >= 0. Default is 2.

diff_formatstr or None, optional

The sparse format to use for the difference matrix. Default is None, which will use the default specified in scipy.sparse.diags().

Returns:
diff_matrixscipy.sparse.spmatrix or scipy.sparse._sparray

The sparse difference matrix.

Raises:
ValueError

Raised if diff_order or data_size is negative.

Notes

The resulting matrices are sparse versions of:

import numpy as np
np.diff(np.eye(data_size), diff_order, axis=0)

This implementation allows using the differential matrices are they are written in various publications, ie. D.T @ D.

Most baseline algorithms use 2nd order differential matrices when doing penalized least squared fitting or Whittaker-smoothing-based fitting.