pybaselines.utils

Module Contents

Functions

difference_matrix

Creates an n-order finite-difference matrix.

gaussian

Generates a Gaussian distribution based on height, center, and sigma.

gaussian2d

Generates a two-dimensional Gaussian distribution based on height, centers, and sigmas.

gaussian_kernel

Creates an area-normalized gaussian kernel for convolution.

optimize_window

Optimizes the morphological half-window size.

pad_edges

Adds left and right edges to the data.

pad_edges2d

Adds left, right, top, and bottom edges to the data.

padded_convolve

Pads data before convolving to reduce edge effects.

pspline_smooth

Smooths the input data using Penalized Spline smoothing.

relative_difference

Calculates the relative difference, (norm(new-old) / norm(old)), of two values.

whittaker_smooth

Smooths the input data using Whittaker smoothing.

exception pybaselines.utils.ParameterWarning

Warning issued when a parameter value is outside of the recommended range.

Used for cases where a parameter value is valid and will not cause errors, but is outside the recommended range of values and may therefore cause issues, such as numerical instability, that would otherwise be hard to diagnose.

add_note()

Exception.add_note(note) -- add a note to the exception

with_traceback()

Exception.with_traceback(tb) -- set self.__traceback__ to tb and return self.
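Because ParameterWarning is issued through Python's warnings machinery, it can be filtered like any other warning class; the calls below are a generic sketch rather than part of the pybaselines API.

import warnings

from pybaselines.utils import ParameterWarning

# Escalate ParameterWarning to an error during development so questionable
# parameter values fail loudly instead of silently degrading results.
warnings.simplefilter('error', ParameterWarning)

# Or, once the chosen values are known to be acceptable, silence it instead.
warnings.simplefilter('ignore', ParameterWarning)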

pybaselines.utils.difference_matrix(data_size, diff_order=2, diff_format=None)

Creates an n-order finite-difference matrix.

Parameters:
data_size : int

The number of data points.

diff_order : int, optional

The integer differential order; must be >= 0. Default is 2.

diff_format : str or None, optional

The sparse format to use for the difference matrix. Default is None, which will use the default specified in scipy.sparse.diags().

Returns:
diff_matrix : scipy.sparse.spmatrix or scipy.sparse._sparray

The sparse difference matrix.

Raises:
ValueError

Raised if diff_order or data_size is negative.

Notes

The resulting matrices are sparse versions of:

import numpy as np
np.diff(np.eye(data_size), diff_order, axis=0)

This implementation allows using the differential matrices as they are written in various publications, i.e., D.T @ D.

Most baseline algorithms use 2nd order differential matrices when doing penalized least squares fitting or Whittaker-smoothing-based fitting.
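As a usage sketch (the size, difference order, and the 'csr' format string are arbitrary choices), the D.T @ D penalty used in Whittaker-style fitting can be built as:

import numpy as np

from pybaselines.utils import difference_matrix

# Second-order difference matrix for 6 data points, then the D.T @ D penalty.
D = difference_matrix(6, diff_order=2, diff_format='csr')
penalty = D.T @ D

# The dense version matches numpy's finite differences of the identity matrix.
assert np.allclose(D.toarray(), np.diff(np.eye(6), 2, axis=0))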

pybaselines.utils.gaussian(x, height=1.0, center=0.0, sigma=1.0)

Generates a Gaussian distribution based on height, center, and sigma.

Parameters:
x : numpy.ndarray

The x-values at which to evaluate the distribution.

height : float, optional

The maximum height of the distribution. Default is 1.0.

center : float, optional

The center of the distribution. Default is 0.0.

sigma : float, optional

The standard deviation of the distribution. Default is 1.0.

Returns:
numpy.ndarray

The Gaussian distribution evaluated with x.

Raises:
ValueError

Raised if sigma is not greater than 0.
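For illustration (the x-values and peak parameters below are arbitrary):

import numpy as np

from pybaselines.utils import gaussian

x = np.linspace(-5, 5, 101)
# A peak with maximum value 2.0, centered at 1.0, with a standard deviation of 0.5.
peak = gaussian(x, height=2.0, center=1.0, sigma=0.5)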

pybaselines.utils.gaussian2d(x, z, height=1.0, center_x=0.0, center_z=0.0, sigma_x=1.0, sigma_z=1.0)

Generates a two-dimensional Gaussian distribution based on height, centers, and sigmas.

Parameters:
x : numpy.ndarray, shape (M, N)

The x-values at which to evaluate the distribution.

z : numpy.ndarray, shape (M, N)

The z-values at which to evaluate the distribution.

height : float, optional

The maximum height of the distribution. Default is 1.0.

center_x : float, optional

The center of the distribution in the x-axis. Default is 0.0.

sigma_x : float, optional

The standard deviation of the distribution in the x-axis. Default is 1.0.

center_z : float, optional

The center of the distribution in the z-axis. Default is 0.0.

sigma_z : float, optional

The standard deviation of the distribution in the z-axis. Default is 1.0.

Returns:
numpy.ndarray, shape (M, N)

The Gaussian distribution evaluated with x and z.

Raises:
ValueError

Raised if the input x or z are not two dimensional.

Notes

The input x and z should be two-dimensional arrays, which can be obtained from their one-dimensional counterparts by using numpy.meshgrid().
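A minimal sketch of the meshgrid workflow described above (grid sizes and peak parameters are arbitrary):

import numpy as np

from pybaselines.utils import gaussian2d

x = np.linspace(-5, 5, 51)
z = np.linspace(-10, 10, 61)
# gaussian2d requires two dimensional inputs, so expand the 1D coordinates first.
X, Z = np.meshgrid(x, z)
surface = gaussian2d(
    X, Z, height=3.0, center_x=1.0, center_z=-2.0, sigma_x=0.5, sigma_z=2.0
)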

pybaselines.utils.gaussian_kernel(window_size, sigma=1.0)

Creates an area-normalized gaussian kernel for convolution.

Parameters:
window_size : int

The number of points for the entire kernel.

sigma : float, optional

The standard deviation of the gaussian model.

Returns:
numpy.ndarray, shape (window_size,)

The area-normalized gaussian kernel.

Notes

Returns gaus / sum(gaus) rather than creating a unit-area gaussian, since a unit-area gaussian would have an area smaller than 1 for window_size < ~6 * sigma.
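For example (the window size and sigma are arbitrary), the returned kernel sums to 1:

import numpy as np

from pybaselines.utils import gaussian_kernel

# A 25-point kernel; because it is area-normalized, its values sum to 1.
kernel = gaussian_kernel(25, sigma=3.0)
assert np.isclose(kernel.sum(), 1.0)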

pybaselines.utils.optimize_window(data, increment=1, max_hits=3, window_tol=1e-06, max_half_window=None, min_half_window=None)

Optimizes the morphological half-window size.

Parameters:
data : array-like

The measured data values. Can be one or two dimensional.

increment : int, optional

The step size for iterating half windows. Default is 1.

max_hits : int, optional

The number of consecutive half windows that must produce the same morphological opening before accepting the half window as the optimum value. Default is 3.

window_tol : float, optional

The tolerance value for considering two morphological openings as equivalent. Default is 1e-6.

max_half_window : int, optional

The maximum allowable half-window size. If None (default), will be set to (len(data) - 1) / 2.

min_half_window : int, optional

The minimum half-window size. If None (default), will be set to 1.

Returns:
half_window : int or numpy.ndarray[int, int]

The optimized half window size(s). If data is one dimensional, the output is a single integer, and if data is two dimensional, the output is an array of two integers.

Notes

May only provide good results for some morphological algorithms, so use with caution.

References

Perez-Pueyo, R., et al. Morphology-Based Automated Baseline Removal for Raman Spectra of Artistic Pigments. Applied Spectroscopy, 2010, 64, 595-600.
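A usage sketch (the simulated signal and peak parameters are purely illustrative):

import numpy as np

from pybaselines.utils import gaussian, optimize_window

x = np.linspace(0, 1000, 1000)
# Two peaks on a sloping background; since the data is one dimensional, a
# single integer half window is returned.
y = gaussian(x, 10, 300, 15) + gaussian(x, 20, 700, 10) + 0.001 * x
half_window = optimize_window(y)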

pybaselines.utils.pad_edges(data, pad_length, mode='extrapolate', extrapolate_window=None, **pad_kwargs)

Adds left and right edges to the data.

Parameters:
data : array-like

The array of the data.

pad_length : int

The number of points to add to the left and right edges.

mode : str or Callable, optional

The method for padding. Default is 'extrapolate'. Any method other than 'extrapolate' will use numpy.pad().

extrapolate_window : int, optional

The number of values to use for linear fitting on the left and right edges. Default is None, which will set the extrapolate window size equal to pad_length.

**pad_kwargs

Any keyword arguments to pass to numpy.pad(), which will be used if mode is not 'extrapolate'.

Returns:
padded_data : numpy.ndarray, shape (N + 2 * pad_length,)

The data with padding on the left and right edges.

Notes

If mode is 'extrapolate', then the left and right edges will be fit with a first order polynomial and then extrapolated. Otherwise, uses numpy.pad().
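A short sketch of both padding behaviors (the array and pad length are arbitrary):

import numpy as np

from pybaselines.utils import pad_edges

y = np.arange(10, dtype=float)
# Extrapolate 3 points on each side from linear fits of the edge values.
padded = pad_edges(y, 3, mode='extrapolate')
assert padded.shape == (10 + 2 * 3,)

# Any other mode is passed through to numpy.pad, e.g. reflecting the edges.
reflected = pad_edges(y, 3, mode='reflect')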

pybaselines.utils.pad_edges2d(data, pad_length, mode='edge', extrapolate_window=None, **pad_kwargs)

Adds left, right, top, and bottom edges to the data.

Parameters:
data : array-like, shape (M, N)

The 2D array of the data.

pad_length : int or Sequence[int, int]

The number of points to add to the top, bottom, left, and right edges. If a single value is given, all edges have the same padding. If a sequence of two values is given, the first value will be the padding on the top and bottom (rows), and the second value will pad the left and right (columns).

mode : str or Callable, optional

The method for padding. Default is 'edge'. Any method other than 'extrapolate' will use numpy.pad().

extrapolate_window : int or Sequence[int, int] or Sequence[int, int, int, int], optional

The number of values to use for linear fitting on the top, bottom, left, and right edges. Default is None, which will set the extrapolate window size equal to pad_length.

**pad_kwargs

Any keyword arguments to pass to numpy.pad(), which will be used if mode is not 'extrapolate'.

Returns:
padded_data : numpy.ndarray

The data with padding on the top, bottom, left, and right edges.

Notes

If mode is 'extrapolate', then each edge will be extended by linear fits along each row and column, and the corners are calculated by averaging the linear sections.
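A minimal sketch of asymmetric padding (the array shape and pad lengths are arbitrary; the expected output shape follows from the pad_length description above):

import numpy as np

from pybaselines.utils import pad_edges2d

data = np.random.default_rng(0).normal(size=(20, 30))
# Pad 2 rows on the top and bottom and 5 columns on the left and right.
padded = pad_edges2d(data, (2, 5), mode='edge')
assert padded.shape == (20 + 2 * 2, 30 + 2 * 5)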

pybaselines.utils.padded_convolve(data, kernel, mode='reflect', **pad_kwargs)

Pads data before convolving to reduce edge effects.

Parameters:
data : array-like, shape (N,)

The data to convolve.

kernel : array-like, shape (M,)

The convolution kernel.

mode : str or Callable, optional

The method for padding to pass to pad_edges(). Default is 'reflect'.

**pad_kwargs

Any additional keyword arguments to pass to pad_edges().

Returns:
convolution : numpy.ndarray, shape (N,)

The convolution output.
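For example, combining gaussian_kernel() with padded_convolve() gives a simple smoother whose output keeps the input length (the noise level and kernel size are arbitrary):

import numpy as np

from pybaselines.utils import gaussian_kernel, padded_convolve

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 10, 500)) + rng.normal(0, 0.1, 500)
# Padding before the convolution reduces the distortion at the two edges.
smoothed = padded_convolve(y, gaussian_kernel(25, sigma=4.0))
assert smoothed.shape == y.shape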

pybaselines.utils.pspline_smooth(data, x_data=None, lam=10.0, num_knots=100, spline_degree=3, diff_order=2, weights=None, check_finite=True)

Smooths the input data using Penalized Spline smoothing.

The input is smoothed by solving the equation (B.T @ W @ B + lam * D.T @ D) c = B.T @ W @ y, where W is a matrix with the weights on the diagonals, D is the finite difference matrix, B is the spline basis matrix, and c is the vector of spline coefficients; the smoothed output is then B @ c.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

x_data : array-like, shape (N,), optional

The x-values of the measured data. Default is None, which will create an array from -1 to 1 with N points.

lam : float, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e1.

num_knots : int, optional

The number of knots for the spline. Default is 100.

spline_degree : int, optional

The degree of the spline. Default is 3, which is a cubic spline.

diff_order : int, optional

The order of the finite difference matrix. Must be greater than or equal to 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

weights : array-like, shape (N,), optional

The weighting array, used to override the function's baseline identification to designate peak points. Only elements with 0 or False values will have an effect; all non-zero values are considered baseline points. If None (default), then will be an array with size equal to N and all values set to 1.

check_finite : bool, optional

If True (default), an error will be raised if any values in data or weights are not finite. Setting it to False skips the check.

Returns:
y_smooth : numpy.ndarray, shape (N,)

The smoothed data.

tuple(numpy.ndarray, numpy.ndarray, int)

A tuple of the spline knots, spline coefficients, and spline degree, which can be used to reconstruct the fit spline. Useful if needing to recreate the spline with different x-values.

References

Eilers, P., et al. Splines, knots, and penalties. Wiley Interdisciplinary Reviews: Computational Statistics, 2010, 2(6), 637-653.
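A usage sketch (the simulated data and lam value are illustrative; rebuilding the spline assumes the returned tuple follows the (knots, coefficients, degree) ordering expected by scipy.interpolate.BSpline):

import numpy as np
from scipy.interpolate import BSpline

from pybaselines.utils import pspline_smooth

x = np.linspace(0, 10, 300)
rng = np.random.default_rng(0)
y = np.exp(-0.3 * x) + rng.normal(0, 0.02, x.size)

smoothed, tck = pspline_smooth(y, x, lam=10)

# Reconstruct the fitted spline to evaluate it on a finer grid of x-values.
spline = BSpline(*tck)
fine_fit = spline(np.linspace(0, 10, 1000))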

pybaselines.utils.relative_difference(old, new, norm_order=None)

Calculates the relative difference, (norm(new-old) / norm(old)), of two values.

Used as an exit criteria in many baseline algorithms.

Parameters:
old : numpy.ndarray or float

The array or single value from the previous iteration.

new : numpy.ndarray or float

The array or single value from the current iteration.

norm_order : int, optional

The type of norm to calculate. Default is None, which uses the l2 norm for arrays and the absolute value for scalars.

Returns:
float

The relative difference between the old and new values.
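For example, as a convergence check between iterations (the arrays and tolerance are arbitrary):

import numpy as np

from pybaselines.utils import relative_difference

old = np.array([1.0, 2.0, 3.0])
new = np.array([1.0, 2.1, 3.0])
# norm(new - old) / norm(old) with the default l2 norm for arrays.
diff = relative_difference(old, new)

# A typical exit criterion inside an iterative baseline loop.
converged = diff < 1e-3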

pybaselines.utils.whittaker_smooth(data, lam=1000000.0, diff_order=2, weights=None, check_finite=True)

Smooths the input data using Whittaker smoothing.

The input is smoothed by solving the equation (W + lam * D.T @ D) y_smooth = W @ y, where W is a matrix with weights on the diagonals and D is the finite difference matrix.

Parameters:
data : array-like, shape (N,)

The y-values of the measured data, with N data points.

lam : float, optional

The smoothing parameter. Larger values will create smoother baselines. Default is 1e6.

diff_order : int, optional

The order of the finite difference matrix. Must be greater than or equal to 0. Default is 2 (second order differential matrix). Typical values are 2 or 1.

weights : array-like, shape (N,), optional

The weighting array, used to override the function's baseline identification to designate peak points. Only elements with 0 or False values will have an effect; all non-zero values are considered baseline points. If None (default), then will be an array with size equal to N and all values set to 1.

check_finite : bool, optional

If True (default), an error will be raised if any values in data or weights are not finite. Setting it to False skips the check.

Returns:
y_smooth : numpy.ndarray, shape (N,)

The smoothed data.

References

Eilers, P. A Perfect Smoother. Analytical Chemistry, 2003, 75(14), 3631-3636.
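A usage sketch (the simulated signal and the lam value are purely illustrative):

import numpy as np

from pybaselines.utils import whittaker_smooth

x = np.linspace(0, 10, 500)
rng = np.random.default_rng(0)
y = np.sin(x) + rng.normal(0, 0.1, x.size)

# Larger lam values give smoother results; 1e4 here is only an example.
smoothed = whittaker_smooth(y, lam=1e4)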