.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated/examples/whittaker/plot_lam_vs_data_size.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_generated_examples_whittaker_plot_lam_vs_data_size.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_whittaker_plot_lam_vs_data_size.py:


`lam` vs data size
------------------

When publishing new Whittaker-smoothing-based algorithms in literature, the `lam` value
used by the researchers is usually reported as a single value or a range of values.
However, these values are deceptive since the `lam` value required for a particular
Whittaker-smoothing-based algorithm is dependent on the number of data points. Thus,
this can cause issues when adapting an algorithm to a new set of data since the published
optimal `lam` value is not universal. This example shows an analysis of this dependence
for all available :ref:`Whittaker smoothing methods <api/Baseline:Whittaker Smoothing Algorithms>`.

Note that the exact optimal `lam` values reported in this example are not of significant
use since they depend on many other factors such as the baseline curvature, noise, peaks,
etc.; however, the examined trends can be used to simplify the process of selecting `lam`
values for fitting new datasets.

.. GENERATED FROM PYTHON SOURCE LINES 20-79

.. code-block:: Python


    from itertools import cycle
    import warnings

    import matplotlib.pyplot as plt
    import numpy as np

    from pybaselines import Baseline, utils


    def _minimize(data, known_baseline, func, lams, **kwargs):
        """Finds the `lam` value that minimizes the L2 norm."""
        min_error = np.inf
        best_lam = lams[0]
        with warnings.catch_warnings():
            # ignore warnings that occur when using very small lam values
            warnings.filterwarnings('ignore')
            for lam in lams:
                try:
                    baseline = func(data, lam=10**lam, **kwargs)[0]
                except np.linalg.LinAlgError:
                    continue  # numerical instability can occur for lam >~1e12, just ignore
                fit_error = np.linalg.norm(known_baseline - baseline, 2)
                if fit_error < min_error:
                    min_error = fit_error
                    best_lam = lam

        return best_lam


    def optimize_lam(data, known_baseline, func, previous_min=None, **kwargs):
        """
        Finds the optimum `lam` value.

        The optimal `lam` value should be ``10**best_lam``. Could alternatively
        use scipy.optimize.fmin, but this simple version is enough for examples.

        """
        if previous_min is None:
            min_lam = -1
        else:
            min_lam = previous_min - 0.5
        # coarse optimization
        lams = np.arange(min_lam, 13.5, 0.5)
        best_lam = _minimize(data, known_baseline, func, lams, **kwargs)
        # fine optimization
        lams = np.arange(best_lam - 0.5, best_lam + 0.7, 0.2)
        best_lam = _minimize(data, known_baseline, func, lams, **kwargs)

        return best_lam


    def iasls(*args, lam=1, p=0.05, **kwargs):
        """Ensures the `lam_1` value for whittaker.iasls scales with `lam`."""
        # not sure if lam_1 should be fixed or proportional to lam;
        # both give similar results
        return Baseline().iasls(*args, lam=lam, lam_1=1e-8 * lam, p=p, **kwargs)


.. GENERATED FROM PYTHON SOURCE LINES 81-86

Three baselines will be tested: an exponentially decaying baseline, a gaussian
baseline, and a sinusoidal baseline. The exponential baseline is the most smooth
and the sine baseline is the least smooth, so it would be expected that a lower
`lam` value would be required to fit the sine baseline and a higher value for the
exponential baseline. The three different plots are shown below.

.. GENERATED FROM PYTHON SOURCE LINES 86-94

.. code-block:: Python


    bkg_dict = {}
    bkg_types = ('exponential', 'gaussian', 'sine')
    for bkg_type in bkg_types:
        bkg_dict[bkg_type] = {}
        plt.plot(utils._make_data(1000, bkg_type)[1], label=bkg_type)
    plt.legend()


.. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_001.png
   :alt: plot lam vs data size
   :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 95-98

For each function, the optimal `lam` value will be calculated for data sizes
ranging from 500 to 20000 points. Further, the intercept and slope of the linear fit
of the log of the data size, N, and the log of the `lam` value will be reported.

.. GENERATED FROM PYTHON SOURCE LINES 98-147

.. code-block:: Python

    print('Function, baseline type, intercept & slope of log(N) vs log(lam) fit')
    print('-' * 60)

    show_plots = False  # for debugging
    num_points = np.logspace(np.log10(500), np.log10(20000), 6, dtype=int)
    symbols = cycle(['o', 's', 'd'])
    for i, func_name in enumerate((
        'asls', 'iasls', 'airpls', 'arpls', 'iarpls', 'drpls', 'aspls', 'psalsa', 'derpsalsa',
        'brpls', 'lsrpls'
    )):
        legend = [[], []]
        _, ax = plt.subplots(num=func_name)
        for bkg_type in bkg_types:
            best_lams = np.empty_like(num_points, float)
            min_lam = None
            for j, num_x in enumerate(num_points):
                if func_name == 'iasls':
                    func = iasls
                else:
                    func = getattr(Baseline(), func_name)
                x, y, baseline = utils._make_data(num_x, bkg_type)
                # use a slightly lower tolerance to speed up the calculation
                min_lam = optimize_lam(y, baseline, func, min_lam, tol=1e-2, max_iter=50)
                best_lams[j] = min_lam

                if show_plots:
                    plt.figure(num=num_x)
                    if i == 0:
                        plt.plot(y)
                    plt.plot(baseline)
                    plt.plot(func(y, lam=10**min_lam)[0], '--')

            fit = np.polynomial.polynomial.Polynomial.fit(np.log10(num_points), best_lams, 1)
            coeffs = fit.convert().coef
            print(f'{func_name:<11} {bkg_type:<13} {coeffs}')
            line = 10**fit(np.log10(num_points))
            bkg_dict[bkg_type][func_name] = line

            handle_1 = ax.plot(num_points, line, label=bkg_type)[0]
            handle_2 = ax.plot(num_points, 10**best_lams, next(symbols))[0]
            legend[0].append((handle_1, handle_2))
            legend[1].append(bkg_type)

        ax.loglog()
        ax.legend(*legend)
        ax.set_xlabel('Input Array Size, N')
        ax.set_ylabel('Optimal lam Value')
        ax.set_title(func_name)


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_002.png
         :alt: asls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_002.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_003.png
         :alt: iasls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_003.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_004.png
         :alt: airpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_004.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_005.png
         :alt: arpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_005.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_006.png
         :alt: iarpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_006.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_007.png
         :alt: drpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_007.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_008.png
         :alt: aspls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_008.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_009.png
         :alt: psalsa
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_009.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_010.png
         :alt: derpsalsa
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_010.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_011.png
         :alt: brpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_011.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_012.png
         :alt: lsrpls
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_012.png
         :class: sphx-glr-multi-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Function, baseline type, intercept & slope of log(N) vs log(lam) fit
    ------------------------------------------------------------
    asls        exponential   [-5.86444476  3.96627881]
    asls        gaussian      [-6.26444476  3.96627881]
    asls        sine          [-7.20791573  3.99298647]
    iasls       exponential   [-6.95319771  4.0821202 ]
    iasls       gaussian      [-6.86444476  3.96627881]
    iasls       sine          [-8.15319771  4.0821202 ]
    airpls      exponential   [-5.85328814  4.08214604]
    airpls      gaussian      [-5.17038765  4.0108367 ]
    airpls      sine          [-6.03111749  3.96628054]
    arpls       exponential   [-7.39857346  4.17128073]
    arpls       gaussian      [-7.46113917  4.18915776]
    arpls       sine          [-8.4685327   3.94839806]
    iarpls      exponential   [-7.03210709  3.82369727]
    iarpls      gaussian      [-7.91587793  4.10002995]
    iarpls      sine          [-7.38404418  3.46708566]
    drpls       exponential   [-7.1986639   4.17130657]
    drpls       gaussian      [-6.53722308  4.01088492]
    drpls       sine          [-8.7677514   4.09104086]
    aspls       exponential   [-5.59455688  4.18918187]
    aspls       gaussian      [-5.72939553  4.2515206 ]
    aspls       sine          [-7.08812178  4.14448334]
    psalsa      exponential   [-6.48060361  4.03756673]
    psalsa      gaussian      [-6.6635041   4.10887608]
    psalsa      sine          [-7.82258729  4.0019408 ]
    derpsalsa   exponential   [-9.26767336  4.09101856]
    derpsalsa   gaussian      [-8.7097234   4.05541159]
    derpsalsa   sine          [-9.65235478  3.87710469]
    brpls       exponential   [-6.34654627  3.8325852 ]
    brpls       gaussian      [-7.61261808  4.32292197]
    brpls       sine          [-7.59018989  3.85934219]
    lsrpls      exponential   [-6.77899512  3.97519852]
    lsrpls      gaussian      [-7.11576341  4.09999723]
    lsrpls      sine          [-8.04636204  3.83253257]


.. GENERATED FROM PYTHON SOURCE LINES 148-156

To further analyze the relationship of `lam` and data size, the best
fit lines for all algorithms for each baseline type are shown below.
Interestingly, for each baseline type, the slopes of the `lam` vs data size
lines are approximately the same for all algorithms; only the intercept
is different. This makes sense since all functions use very similar linear
equations for solving for the baseline; thus, while the optimal `lam` values
may differ between the algorithms, the relationship between `lam` and the data
size should be similar for all of them.

.. GENERATED FROM PYTHON SOURCE LINES 156-168

.. code-block:: Python


    for bkg_type in bkg_types:
        plt.figure()
        for func_name, line in bkg_dict[bkg_type].items():
            plt.plot(num_points, line, label=func_name)
        plt.legend()
        plt.loglog()
        plt.xlabel('Input Array Size, N')
        plt.ylabel('Optimal lam Value')
        plt.title(f'{bkg_type} baseline')

    plt.show()


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_013.png
         :alt: exponential baseline
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_013.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_014.png
         :alt: gaussian baseline
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_014.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_015.png
         :alt: sine baseline
         :srcset: /generated/examples/whittaker/images/sphx_glr_plot_lam_vs_data_size_015.png
         :class: sphx-glr-multi-img


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 18.158 seconds)


.. _sphx_glr_download_generated_examples_whittaker_plot_lam_vs_data_size.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_lam_vs_data_size.ipynb <plot_lam_vs_data_size.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_lam_vs_data_size.py <plot_lam_vs_data_size.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_lam_vs_data_size.zip <plot_lam_vs_data_size.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_