.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "generated/examples/spline/plot_pspline_whittaker.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_generated_examples_spline_plot_pspline_whittaker.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_generated_examples_spline_plot_pspline_whittaker.py:


P-spline versions of Whittaker functions
----------------------------------------

pybaselines contains penalized spline (P-spline) versions of all of the
Whittaker-smoothing-based algorithms implemented in pybaselines. The reason
for doing so was that P-splines offer additional user flexibility when choosing
parameters for fitting and more easily work for unequally spaced data. This example
will examine the relationship of `lam` versus the number of data points when fitting
a baseline with the :meth:`~.Baseline.arpls` function and its P-spline version,
:meth:`~.Baseline.pspline_arpls`.

Note that the exact optimal `lam` values reported in this example are not of significant
use since they depend on many other factors such as the baseline curvature, noise, peaks,
etc.; however, the examined trends can be used to simplify the process of selecting `lam`
values for fitting new datasets.

.. GENERATED FROM PYTHON SOURCE LINES 20-72

.. code-block:: Python


    from itertools import cycle
    import warnings

    import matplotlib.pyplot as plt
    import numpy as np

    from pybaselines import Baseline, utils


    def _minimize(data, known_baseline, func, lams, **kwargs):
        """Finds the `lam` value that minimizes the L2 norm."""
        min_error = np.inf
        best_lam = lams[0]
        with warnings.catch_warnings():
            # ignore warnings that occur when using very small lam values
            warnings.filterwarnings('ignore')
            for lam in lams:
                try:
                    baseline = func(data, lam=10**lam, **kwargs)[0]
                except np.linalg.LinAlgError:
                    continue  # numerical instability can occur for lam >~1e12, just ignore
                fit_error = np.linalg.norm(known_baseline - baseline, 2)
                if fit_error < min_error:
                    min_error = fit_error
                    best_lam = lam

        return best_lam


    def optimize_lam(data, known_baseline, func, previous_min=None, **kwargs):
        """
        Finds the optimum `lam` value.

        The optimal `lam` value should be ``10**best_lam``. Could alternatively
        use scipy.optimize.fmin, but this simple version is enough for examples.

        """
        if previous_min is None:
            min_lam = -1
        else:
            min_lam = previous_min - 0.5
        # coarse optimization
        lams = np.arange(min_lam, 13.5, 0.5)
        best_lam = _minimize(data, known_baseline, func, lams, **kwargs)
        # fine optimization
        lams = np.arange(best_lam - 0.5, best_lam + 0.7, 0.2)
        best_lam = _minimize(data, known_baseline, func, lams, **kwargs)

        return best_lam


.. GENERATED FROM PYTHON SOURCE LINES 74-78

The baseline for this example is an exponentially decaying baseline, shown below.
Other baseline types could be examined, similar to the
:ref:`Whittaker lam vs data size example <sphx_glr_generated_examples_whittaker_plot_lam_vs_data_size.py>`,
which should give similar results.

.. GENERATED FROM PYTHON SOURCE LINES 78-80

.. code-block:: Python

    plt.plot(utils._make_data(1000, bkg_type='exponential')[1])


.. image-sg:: /generated/examples/spline/images/sphx_glr_plot_pspline_whittaker_001.png
   :alt: plot pspline whittaker
   :srcset: /generated/examples/spline/images/sphx_glr_plot_pspline_whittaker_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 81-87

For each function, the optimal `lam` value will be calculated for data sizes
ranging from 500 to 20000 points. Further, the intercept and slope of the linear fit
of the log of the data size, N, and the log of the `lam` value will be reported.
The number of knots for the P-spline version is fixed at the default, 100 (the effect
of the number of knots versus optimal `lam` is shown in another
:ref:`example <sphx_glr_generated_examples_spline_plot_lam_vs_num_knots.py>`).

.. GENERATED FROM PYTHON SOURCE LINES 87-129

.. code-block:: Python

    print('Function, intercept & slope of log(N) vs log(lam) fit')
    print('-' * 60)

    show_plots = False  # for debugging
    num_points = np.logspace(np.log10(500), np.log10(20000), 6, dtype=int)
    symbols = cycle(['o', 's'])
    _, ax = plt.subplots()
    legend = [[], []]
    for i, func_name in enumerate(('arpls', 'pspline_arpls')):
        best_lams = np.empty_like(num_points, float)
        min_lam = None
        for j, num_x in enumerate(num_points):
            func = getattr(Baseline(), func_name)
            x, y, baseline = utils._make_data(num_x, bkg_type='exponential')
            # use a slightly lower tolerance to speed up the calculation
            min_lam = optimize_lam(y, baseline, func, min_lam, tol=1e-2, max_iter=50)
            best_lams[j] = min_lam

            if show_plots:
                plt.figure(num=num_x)
                if i == 0:
                    plt.plot(y)
                plt.plot(baseline)
                plt.plot(func(y, lam=10**min_lam)[0], '--')

        fit = np.polynomial.polynomial.Polynomial.fit(np.log10(num_points), best_lams, 1)
        coeffs = fit.convert().coef
        print(f'{func_name:<16} {coeffs}')
        line = 10**fit(np.log10(num_points))

        handle_1 = ax.plot(num_points, line, label=func_name)[0]
        handle_2 = ax.plot(num_points, 10**best_lams, next(symbols))[0]
        legend[0].append((handle_1, handle_2))
        legend[1].append(func_name)

    ax.loglog()
    ax.legend(*legend)
    ax.set_xlabel('Input Array Size, N')
    ax.set_ylabel('Optimal lam Value')

    plt.show()


.. image-sg:: /generated/examples/spline/images/sphx_glr_plot_pspline_whittaker_002.png
   :alt: plot pspline whittaker
   :srcset: /generated/examples/spline/images/sphx_glr_plot_pspline_whittaker_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Function, intercept & slope of log(N) vs log(lam) fit
    ------------------------------------------------------------
    arpls            [-7.39857346  4.17128073]
    pspline_arpls    [-1.18699822  1.09635347]


.. GENERATED FROM PYTHON SOURCE LINES 130-150

The results shown above demonstrate that the slope of the `lam` vs data
size best fit line is much smaller for the P-spline based version of arpls.
This means that once the number of knots is fixed for a particular baseline,
the required `lam` value should be much less affected by a change in the
number of data points (assuming the curvature of the data does not change).

The above results are particularly useful when processing very large datasets.
A `lam` value greater than ~1e14 typically causes numerical issues that can cause
the solver to fail. Most Whittaker-smoothing-based algorithms reach that `lam`
cutoff when the number of points is around ~20,000-500,000 (depends on the exact
algorithm). Since the P-spline versions do not experience such a large increase in
the required `lam`, they are more suited to fit those larger datasets. Additionally,
the required `lam` value for the P-spline versions can be lowered simply by reducing
the number of knots.

It should be addressed that a similar result could be obtained using the regular
Whittaker-smoothing-based version by truncating the number of points to a fixed
value. That, however, would require additional processing steps to smooth out the
resulting baseline after interpolating back to the original data size. Thus, the
P-spline versions require less user-intervention to achieve the same result.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.492 seconds)


.. _sphx_glr_download_generated_examples_spline_plot_pspline_whittaker.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pspline_whittaker.ipynb <plot_pspline_whittaker.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pspline_whittaker.py <plot_pspline_whittaker.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_pspline_whittaker.zip <plot_pspline_whittaker.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_