======================== Classification Baselines ======================== Introduction ------------ Classification methods rely on classifying peak and/or baseline segments, similar to :ref:`selective masking ` as explained in the polynomial section, but make use of sophisticated techniques to determine the baseline points rather than relying on manual selection. All classification functions allow inputting weights to override the baseline classification, which can be helpful, for example, to ensure a small peak is not classified as baseline without having to alter any parameters which could otherwise reduce the effectiveness of the classification method. The plot below shows such an example. .. plot:: :align: center import numpy as np import matplotlib.pyplot as plt from pybaselines.utils import gaussian from pybaselines import Baseline x = np.linspace(1, 1000, 500) signal = ( gaussian(x, 1, 180, 25) + gaussian(x, 8, 350, 10) + gaussian(x, 15, 400, 8) + gaussian(x, 13, 700, 12) + gaussian(x, 9, 800, 10) ) real_baseline = 5 + 5 * np.exp(-x / 400) noise = np.random.default_rng(1).normal(0, 0.2, x.size) y = signal + real_baseline + noise weights = np.ones(y.shape[0]) # a weight of 0 designates that it is a peak; ensures the small peak at ~180 is # not classified as part of the baseline weights[(x > 100) & (x < 250)] = 0 baseline_fitter = Baseline(x, check_finite=False) unweighted_baseline = baseline_fitter.std_distribution(y, half_window=10, num_std=1.3)[0] weighted_baseline = baseline_fitter.std_distribution( y, half_window=10, num_std=1.3, weights=weights )[0] fig, ax = plt.subplots(tight_layout={'pad': 0.2}) data_handle = ax.plot(y) baseline_handle = ax.plot(unweighted_baseline, '-') weighted_baseline_handle = ax.plot(weighted_baseline, '--') ax.set_yticks([]) ax.set_xticks([]) ax.legend( (data_handle[0], baseline_handle[0], weighted_baseline_handle[0]), ('data', 'unweighted baseline', 'weighted baseline'), frameon=False ) plt.show() Algorithms ---------- dietrich (Dietrich's Classification Method) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.dietrich` calculates the power spectrum of the data as the squared derivative of the data. Then baseline points are identified by iteratively removing points where the mean of the power spectrum is less a multiple of the standard deviation of the power spectrum. The baseline is created by first interpolating through all baseline points, and then iteratively fitting a polynomial to the interpolated baseline. .. plot:: :align: center :context: reset import numpy as np import matplotlib.pyplot as plt from pybaselines.utils import gaussian from pybaselines import Baseline def create_data(): x = np.linspace(1, 1000, 500) signal = ( gaussian(x, 6, 180, 5) + gaussian(x, 8, 350, 10) + gaussian(x, 6, 550, 5) + gaussian(x, 9, 800, 10) ) signal_2 = ( gaussian(x, 9, 100, 12) + gaussian(x, 15, 400, 8) + gaussian(x, 13, 700, 12) + gaussian(x, 9, 880, 8) ) signal_3 = ( gaussian(x, 8, 150, 10) + gaussian(x, 20, 120, 12) + gaussian(x, 16, 300, 20) + gaussian(x, 12, 550, 5) + gaussian(x, 20, 750, 12) + gaussian(x, 18, 800, 18) + gaussian(x, 15, 830, 12) ) noise = np.random.default_rng(1).normal(0, 0.2, x.size) linear_baseline = 3 + 0.01 * x exponential_baseline = 5 + 15 * np.exp(-x / 400) gaussian_baseline = 5 + gaussian(x, 20, 500, 500) baseline_1 = linear_baseline baseline_2 = gaussian_baseline baseline_3 = exponential_baseline baseline_4 = 10 - 0.005 * x + gaussian(x, 5, 850, 200) baseline_5 = linear_baseline + 20 y1 = signal * 2 + baseline_1 + 5 * noise y2 = signal + signal_2 + signal_3 + baseline_2 + noise y3 = signal + signal_2 + baseline_3 + noise y4 = signal + + signal_2 + baseline_4 + noise * 0.5 y5 = signal * 2 - signal_2 + baseline_5 + noise baselines = (baseline_1, baseline_2, baseline_3, baseline_4, baseline_5) data = (y1, y2, y3, y4, y5) return x, data, baselines def create_plots(data=None, baselines=None): fig, axes = plt.subplots( 3, 2, tight_layout={'pad': 0.1, 'w_pad': 0, 'h_pad': 0}, gridspec_kw={'wspace': 0, 'hspace': 0} ) axes = axes.ravel() legend_handles = [] if data is None: plot_data = False legend_handles.append(None) else: plot_data = True if baselines is None: plot_baselines = False legend_handles.append(None) else: plot_baselines = True for i, axis in enumerate(axes): axis.set_xticks([]) axis.set_yticks([]) axis.tick_params( which='both', labelbottom=False, labelleft=False, labeltop=False, labelright=False ) if i < 5: if plot_data: data_handle = axis.plot(data[i]) if plot_baselines: baseline_handle = axis.plot(baselines[i], lw=2.5) fit_handle = axes[-1].plot((), (), 'g--') if plot_data: legend_handles.append(data_handle[0]) if plot_baselines: legend_handles.append(baseline_handle[0]) legend_handles.append(fit_handle[0]) if None not in legend_handles: axes[-1].legend( (data_handle[0], baseline_handle[0], fit_handle[0]), ('data', 'real baseline', 'estimated baseline'), loc='center', frameon=False ) return fig, axes, legend_handles x, data, baselines = create_data() baseline_fitter = Baseline(x, check_finite=False) figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i < 4: poly_order = i + 1 else: poly_order = 1 if i == 1: num_std = 2.5 else: num_std = 3 baseline, params = baseline_fitter.dietrich( y, smooth_half_window=5, num_std=num_std, poly_order=poly_order, min_length=3 ) ax.plot(baseline, 'g--') golotvin (Golotvin's Classification Method) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.golotvin` divides the data into sections and takes the minimum standard deviation of all the sections as the noise's standard deviation for the entire data. Then classifies any point where the rolling max minus min is less than a multiple of the noise's standard deviation as belonging to the baseline. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 1: half_window = 20 else: half_window = 10 if i in (1, 3): num_std = 40 else: num_std = 10 baseline, params = baseline_fitter.golotvin(y, half_window=half_window, num_std=num_std) ax.plot(baseline, 'g--') std_distribution (Standard Deviation Distribution) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.std_distribution` identifies baseline segments by analyzing the rolling standard deviation distribution. The rolling standard deviations are split into two distributions, with the smaller distribution assigned to noise. Baseline points are then identified as any point where the rolled standard deviation is less than a multiple of the median of the noise's standard deviation distribution. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 1: half_window = 30 num_std = 0.9 elif i in (2, 3): half_window = 8 num_std = 3.5 else: half_window = 12 num_std = 1.1 baseline, params = baseline_fitter.std_distribution(y, half_window=half_window, num_std=num_std) ax.plot(baseline, 'g--') fastchrom (FastChrom's Baseline Method) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.fastchrom` identifies baseline segments by analyzing the rolling standard deviation distribution, similar to :meth:`~.Baseline.std_distribution`. Baseline points are identified as any point where the rolling standard deviation is less than the specified threshold, and peak regions are iteratively interpolated until the baseline is below the data. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 4: min_fwhm = y.shape[0] # ensure it doesn't try to fill in negative peaks else: min_fwhm = None baseline, params = baseline_fitter.fastchrom( y, half_window=12, threshold=1, min_fwhm=min_fwhm ) ax.plot(baseline, 'g--') cwt_br (Continuous Wavelet Transform Baseline Recognition) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.cwt_br` identifies baseline segments by performing a continous wavelet transform (CWT) on the input data at various scales, and picks the scale with the first local minimum in the Shannon entropy. The threshold for baseline points is obtained by fitting a Gaussian to the histogram of the CWT at the optimal scale, and the final baseline is fit using a weighted polynomial where identified baseline points are given a weight of 1 while all other points have a weight of 0. .. plot:: :align: center :context: close-figs scales = np.arange(2, 40) # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i < 4: poly_order = i + 1 symmetric = False else: poly_order = 1 symmetric = True if i in (0, 4): min_length = 3 else: min_length = 20 baseline, params = baseline_fitter.cwt_br( y, poly_order=poly_order, scales=scales, min_length=min_length, symmetric=symmetric, num_std=0.5 ) ax.plot(baseline, 'g--') fabc (Fully Automatic Baseline Correction) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`~.Baseline.fabc` identifies baseline segments by thresholding the squared first derivative of the data, similar to :meth:`~.Baseline.dietrich`. However, fabc approximates the first derivative using a continous wavelet transform with the Haar wavelet, which is more robust to noise than the numerical derivative in Dietrich's method. The baseline is then fit using Whittaker smoothing with all baseline points having a weight of 1 and all other points a weight of 0. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 1: lam = 1e4 elif i == 3: lam = 5e2 elif i in (0, 4): lam = 1e6 else: lam = 1e3 if i == 1: num_std = 2.5 else: num_std = 3 baseline, params = baseline_fitter.fabc(y, lam=lam, scale=16, num_std=num_std, min_length=3) ax.plot(baseline, 'g--') rubberband (Rubberband Method) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :meth:`.rubberband` uses a convex hull to find local minima of the data, which are then used to construct the baseline using either linear interpolation or Whittaker smoothing. The rubberband method is simple and easy to use for convex shaped data, but performs poorly for concave data. To get around this, some commercial spectroscopy software use a `patented method `_ to coerce the data into a convex shape so that the rubberband method still works. pybaselines uses an alternate approach of allowing splitting the data in segments, in order to reduce the concavity of each individual section; it is less user-friendly than Bruker's method but works well enough for data with similar baselines and peak positions. .. note:: For noisy data, rubberband performs significantly better when smoothing the data beforehand. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 1: segments = [100, 250, 300] elif i == 3: segments = [250, 380] else: segments = 1 if i < 4: smooth_half_window = 5 else: smooth_half_window = 0 baseline, params = baseline_fitter.rubberband( y, segments=segments, lam=0, smooth_half_window=smooth_half_window ) ax.plot(baseline, 'g--') By using Whittaker smoothing (or other smoothing interpolation methods) rather than linear interpolation to construct the baseline from the convex hull points, the negative effects of applying the rubberband method to concave data can be slightly reduced, as seen below. .. plot:: :align: center :context: close-figs # to see contents of create_data function, look at the top-most algorithm's code figure, axes, handles = create_plots(data, baselines) for i, (ax, y) in enumerate(zip(axes, data)): if i == 1: segments = [100, 250, 300] elif i == 3: segments = [380] else: segments = 1 if i < 4: smooth_half_window = 5 else: smooth_half_window = 0 baseline, params = baseline_fitter.rubberband( y, segments=segments, lam=100, smooth_half_window=smooth_half_window ) ax.plot(baseline, 'g--')