Whittaker Baselines


Excellent introductory papers on two-dimensional penalized least squares are [1] and [2]. Whittaker-smoothing-based algorithms are extended to two-dimensional data as follows:

Let \(M\) be the number of rows and \(N\) the number of columns in the matrix of measured data \(Y\), and let \(y\) be the flattened array of \(Y\), with length \(M * N\). Analogous to the 1D case, the goal is to make the baseline match the measured data as well as it can while also penalizing the roughness of the baseline, resulting in the following minimization:

\[\sum\limits_{i}^M \sum\limits_{j}^N W_{ij} (Y_{ij} - V_{ij})^2 + \lambda_r \sum\limits_{i}^{M - d_r} (\Delta^{d_r} V_{i\bullet})^2 + \lambda_c \sum\limits_{j}^{N - d_c} (\Delta^{d_c} V_{\bullet j})^2\]

where \(Y_{ij}\) is the measured data, \(V_{ij}\) is the estimated baseline, \(\lambda_r\) is the penalty along the rows, \(\lambda_c\) is the penalty along the columns, \(W_{ij}\) is the weighting, \(\Delta^{d_r}\) is the finite-difference operator of order \(d_r\) applied along the rows of \(V\) (\(V_{i\bullet}\) denoting the \(i\)-th row), and \(\Delta^{d_c}\) is the finite-difference operator of order \(d_c\) applied along the columns of \(V\) (\(V_{\bullet j}\) denoting the \(j\)-th column).

The resulting linear equation for solving the above minimization is:

\[(W_{diag} + \lambda_r I_N \otimes D_{d_r}^{\top} D_{d_r} + \lambda_c D_{d_c}^{\top} D_{d_c} \otimes I_M) v = W_{diag} y\]

where \(W_{diag}\) is the diagonal matrix of the flattened weights, and \(D_d\) is the matrix version of \(\Delta^d\), as already explained for the 1D case. Further, \(\otimes\) denotes the Kronecker product, and \(I_M\) and \(I_N\) are the identity matrices of sizes \(M\) and \(N\), respectively. After solving, the array \(v\) can then be reshaped into the matrix \(V\).
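To make the linear system concrete, here is a minimal dense sketch that builds and solves it with NumPy. The function name and signature are hypothetical (this is not pybaselines' implementation), and the dense approach is only feasible for small \(M\) and \(N\) since the system matrix has shape \((M*N, M*N)\):

```python
import numpy as np

def whittaker_2d(Y, lam_r=1e4, lam_c=1e4, diff_r=2, diff_c=2, W=None):
    """Dense solve of (W_diag + lam_r*(I_N kron Dr'Dr) + lam_c*(Dc'Dc kron I_M)) v = W_diag*y."""
    M, N = Y.shape
    if W is None:
        W = np.ones((M, N))  # uniform weighting
    # finite-difference matrices: shapes (M - diff_r, M) and (N - diff_c, N)
    D_r = np.diff(np.eye(M), diff_r, axis=0)
    D_c = np.diff(np.eye(N), diff_c, axis=0)
    penalty = (lam_r * np.kron(np.eye(N), D_r.T @ D_r)
               + lam_c * np.kron(D_c.T @ D_c, np.eye(M)))
    # column-major (Fortran-order) flattening matches the Kronecker ordering above
    w = W.ravel(order='F')
    y = Y.ravel(order='F')
    v = np.linalg.solve(np.diag(w) + penalty, w * y)
    return v.reshape((M, N), order='F')

# sanity check: a plane has zero second differences, so an order-2 penalty
# does not alter it and the fitted "baseline" reproduces the data exactly
Y_plane = 1.0 + np.add.outer(0.05 * np.arange(10), 0.02 * np.arange(9))
V_plane = whittaker_2d(Y_plane)
```

Because the order-2 difference penalty leaves planes unpenalized, fitting data that is exactly a plane returns the data itself, which is a handy sanity check for any implementation.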

Since the analytical solution for 2D requires matrices of shape \((M*N, M*N)\), it is quite expensive in both memory and computation to solve. Although the left-hand side of the equation is still sparse and symmetric, it cannot be solved as easily as in the 1D case since the bandwidth is no longer small due to the penalties along both the rows and columns (additionally, the sparse solver currently available in SciPy cannot make use of the symmetric nature of the matrix; using Cholesky factorization does provide a speed up but still does not scale well above ~500x500 sized matrices). However, the system can be made much smaller through eigendecomposition.


By following the excellent insights laid out by G. Biessy in [2], the dimensionality of the system can be reduced by using eigendecomposition on each of the two penalty matrices, \(D_{d_r}^{\top} D_{d_r}\) and \(D_{d_c}^{\top} D_{d_c}\). (Note that speeding up Whittaker smoothing using factorization in 1D and using the analytical eigenvalues in nD (a great paper) are established methods, although they require using a fixed difference order and, in the second case, using different boundary conditions that unfortunately do not translate well from smoothing to baseline correction.) The general eigendecomposition of the penalty matrix gives

\[D_{d}^{\top} D_{d} = U \Sigma U^{\top}\]

where \(U\) is the matrix of eigenvectors and \(\Sigma\) is a diagonal matrix with the eigenvalues along the diagonal. Letting \(B = U_c \otimes U_r\) denote the Kronecker product of the eigenvector matrices of the penalties for the columns and rows, and \(g\) and \(h\) denote the number of eigenvectors along the rows and columns, respectively, the linear equation can be rewritten as:

\[(B^{\top} W_{diag} B + \lambda_r I_h \otimes \Sigma_r + \lambda_c \Sigma_c \otimes I_g) \alpha = B^{\top} W_{diag} y\]

and the baseline is then:

\[v = B \alpha\]
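The reduced solve can be sketched in a few lines of NumPy. All names follow the text above rather than any library's API, and the truncation counts \(g\) and \(h\) are chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 12, 10, 2
Y = rng.normal(size=(M, N))              # stand-in for measured data
W = rng.uniform(0.1, 1.0, size=(M, N))   # stand-in for the weights
lam_r = lam_c = 50.0

D_r = np.diff(np.eye(M), d, axis=0)
D_c = np.diff(np.eye(N), d, axis=0)

# eigendecompose the two small penalty matrices (M x M and N x N);
# eigh returns eigenvalues in ascending order, so the first columns of U
# are the smoothest modes
evals_r, U_r = np.linalg.eigh(D_r.T @ D_r)
evals_c, U_c = np.linalg.eigh(D_c.T @ D_c)
g, h = 6, 5                              # keep only the smoothest g (h) eigenvectors
U_r, S_r = U_r[:, :g], np.diag(evals_r[:g])
U_c, S_c = U_c[:, :h], np.diag(evals_c[:h])

B = np.kron(U_c, U_r)                    # (M*N, g*h): far fewer columns than M*N
w = W.ravel(order='F')
y = Y.ravel(order='F')
# (B' W_diag B + lam_r*(I_h kron S_r) + lam_c*(S_c kron I_g)) alpha = B' W_diag y
lhs = (B.T @ (w[:, None] * B)
       + lam_r * np.kron(np.eye(h), S_r)
       + lam_c * np.kron(S_c, np.eye(g)))
alpha = np.linalg.solve(lhs, B.T @ (w * y))
V = (B @ alpha).reshape((M, N), order='F')   # reduced-rank baseline estimate
```

With \(g = M\) and \(h = N\) the reduced solution matches the full analytical solution exactly; truncating to a few smooth modes is what makes the method cheap.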

The beauty of this reparameterization when applied to baseline correction is twofold:

  1. The number of eigenvalues required to approximate the analytical solution depends on the required smoothness, i.e., some constant approximated by \(\lambda / (\text{number of data points})\) that does not appreciably change with data size. Baselines require much less smoothness than smoothing does, so the number of eigenvalues is relatively low (from testing, ~5-10 for low-order polynomial baselines and ~15-25 for sinusoidal baselines).

  2. Since experimental data is measured on a grid (i.e., \(Y_{ij} = f(x_i, z_j)\)), the above equation can be further optimized by expressing it as a generalized linear array model, following the brilliant insights of Eilers, Currie, and Durbán, exactly as explained for 2D penalized splines.
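As a small, testable piece of that second point, the right-hand side \(B^{\top} W_{diag} y\) can be computed without ever forming the Kronecker matrix \(B\), using only matrix products on the original grid. The identity below is standard Kronecker algebra (the matrices here are random stand-ins, not taken from any library):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, g, h = 30, 25, 8, 7
U_r = rng.normal(size=(M, g))   # stand-ins for the row/column eigenvector matrices
U_c = rng.normal(size=(N, h))
Y = rng.normal(size=(M, N))
W = rng.uniform(size=(M, N))

# explicit Kronecker version: B' @ (w * y), with B of shape (M*N, g*h)
B = np.kron(U_c, U_r)
rhs_kron = B.T @ (W.ravel(order='F') * Y.ravel(order='F'))

# array version: vec(U_r' (W * Y) U_c), never forming B at all
rhs_glam = (U_r.T @ (W * Y) @ U_c).ravel(order='F')
```

A related identity handles the left-hand side, which is the essence of the generalized linear array model.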

An example examines how to determine the approximate number of eigenvalues required to represent different baselines.
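A quick numerical illustration of that idea (1D for brevity; the error values here come from this script, not from pybaselines): projecting a low-order polynomial baseline onto the smoothest eigenvectors of \(D^{\top} D\) shows that only a handful of modes are needed to represent it:

```python
import numpy as np

def projection_error(k, signal, U):
    """Relative error from projecting ``signal`` onto the first k eigenvectors."""
    Uk = U[:, :k]
    return np.linalg.norm(signal - Uk @ (Uk.T @ signal)) / np.linalg.norm(signal)

x = np.linspace(-1, 1, 200)
poly_baseline = 2 + 0.5 * x + 3 * x**2       # low-order polynomial baseline
D = np.diff(np.eye(x.size), 2, axis=0)       # second-order difference matrix
_, U = np.linalg.eigh(D.T @ D)               # eigenvalues in ascending order
err_3 = projection_error(3, poly_baseline, U)
err_10 = projection_error(10, poly_baseline, U)
```

The projection error drops rapidly with the number of retained modes because the smoothest eigenvectors resemble low-order polynomials and slow cosines.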


For two-dimensional data, Whittaker-smoothing-based algorithms take a single lam parameter that can either be a single number, in which case both the rows and columns use the same smoothing parameter, i.e., \(\lambda_r = \lambda_c\), or a sequence of two numbers, (\(\lambda_r\), \(\lambda_c\)), to use different values for the rows and columns.


asls (Asymmetric Least Squares)

asls(): explanation for the algorithm.
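As an illustrative sketch only (dense linear algebra on small arrays, not pybaselines' implementation), 2D asls alternates a penalized least-squares solve with the asymmetric reweighting of Eilers and Boelens: weight \(p\) where the data lies above the current baseline, \(1 - p\) elsewhere:

```python
import numpy as np

def asls_2d(Y, lam=1e2, p=0.01, diff_order=2, max_iter=20, tol=1e-6):
    """Toy dense 2D asls: asymmetric reweighting around a Whittaker-style solve."""
    M, N = Y.shape
    D_r = np.diff(np.eye(M), diff_order, axis=0)
    D_c = np.diff(np.eye(N), diff_order, axis=0)
    penalty = lam * (np.kron(np.eye(N), D_r.T @ D_r)
                     + np.kron(D_c.T @ D_c, np.eye(M)))
    y = Y.ravel(order='F')
    w = np.ones_like(y)
    for _ in range(max_iter):
        v = np.linalg.solve(np.diag(w) + penalty, w * y)
        w_new = np.where(y > v, p, 1 - p)  # asymmetric weights
        if np.allclose(w, w_new, rtol=0, atol=tol):
            break
        w = w_new
    return v.reshape((M, N), order='F')

# synthetic data: a tilted-plane baseline plus one positive peak
x = np.linspace(0, 1, 25)
X, Z = np.meshgrid(x, x, indexing='ij')
plane = 1.0 + 0.5 * X + 0.3 * Z
peak = 5.0 * np.exp(-((X - 0.5) ** 2 + (Z - 0.5) ** 2) / 0.01)
data = plane + peak
baseline = asls_2d(data)
```

Because the small weight \(p\) is applied wherever the data exceed the baseline, the fitted surface tracks the plane and passes under the peak rather than through it.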



iasls (Improved Asymmetric Least Squares)

iasls(): explanation for the algorithm. Eigendecomposition is not allowed for this method.



airpls (Adaptive Iteratively Reweighted Penalized Least Squares)

airpls(): explanation for the algorithm.



arpls (Asymmetrically Reweighted Penalized Least Squares)

arpls(): explanation for the algorithm.



drpls (Doubly Reweighted Penalized Least Squares)

drpls(): explanation for the algorithm. Eigendecomposition is not allowed for this method.



iarpls (Improved Asymmetrically Reweighted Penalized Least Squares)

iarpls(): explanation for the algorithm.



aspls (Adaptive Smoothness Penalized Least Squares)

aspls(): explanation for the algorithm. Eigendecomposition is not allowed for this method.



psalsa (Peaked Signal's Asymmetric Least Squares Algorithm)

psalsa(): explanation for the algorithm.
