.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/methods/plot_reweighting.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_methods_plot_reweighting.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_methods_plot_reweighting.py:


Reweighting method example on covariate shift dataset
=====================================================

An example of the reweighting methods applied to a dataset subject
to covariate shift.

.. GENERATED FROM PYTHON SOURCE LINES 8-16

.. code-block:: Python


    # Author:   Ruben Bueno <ruben.bueno@polytechnique.edu>
    #           Antoine de Mathelin
    #           Oleksii Kachaiev <kachayev@gmail.com>
    #
    # License: BSD 3-Clause
    # sphinx_gallery_thumbnail_number = 7


.. GENERATED FROM PYTHON SOURCE LINES 17-36

.. code-block:: Python

    import matplotlib.pyplot as plt
    import numpy as np
    from matplotlib.colors import ListedColormap
    from sklearn.inspection import DecisionBoundaryDisplay
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KernelDensity

    from skada import (
        DensityReweight,
        DiscriminatorReweight,
        GaussianReweight,
        KLIEPReweight,
        KMMReweight,
        NearestNeighborReweight,
        source_target_split,
    )
    from skada.datasets import make_shifted_datasets
    from skada.utils import extract_source_indices


.. GENERATED FROM PYTHON SOURCE LINES 37-60

Reweighting Methods
------------------------------------------
The purpose of reweighting methods is to estimate weights for the source dataset.
These weights are then used to fit an estimator on the source dataset, taking the
weights into account. The goal is to ensure that the fitted estimator is suitable
for predicting labels from the target distribution.

The following reweighting methods are implemented and illustrated:
  * :ref:`Density Reweighting<Illustration of the Density Reweighting method>`
  * :ref:`Gaussian Reweighting<Illustration of the Gaussian reweighting method>`
  * :ref:`Discr. Reweighting<Illustration of the Discr. reweighting method>`
  * :ref:`KLIEPReweight<Illustration of the KLIEPReweight method>`
  * :ref:`Nearest Neighbor reweighting<Illustration of the Nearest Neighbor
    reweighting method>`
  * :ref:`Kernel Mean Matching<Illustration of the Kernel Mean Matching method>`

For more details, see [3].

.. [3] [Sugiyama et al., 2008] Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H.,
       von Bünau, P., and Kawanabe, M. (2008). Direct importance estimation for
       covariate shift adaptation. Annals of the Institute of Statistical
       Mathematics, 60(4):699–746.
       https://www.ism.ac.jp/editsec/aism/pdf/060_4_0699.pdf
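
As a rough illustration of why such weights help, the sketch below uses two
one-dimensional Gaussians whose densities are known in closed form. This is an
assumption made only for the illustration, and the names ``gaussian_pdf``,
``x_source`` and ``weights`` are hypothetical and not part of skada. Weighting
each source sample by the ratio of target to source density lets an average
computed on source samples approximate an expectation under the target
distribution.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)


    def gaussian_pdf(x, mean, std):
        """Density of a 1D normal distribution (illustrative helper)."""
        return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


    # Source inputs follow N(0, 1); the target inputs would follow N(1, 1).
    x_source = rng.normal(loc=0.0, scale=1.0, size=50_000)

    # Importance weights: target density divided by source density.
    weights = gaussian_pdf(x_source, 1.0, 1.0) / gaussian_pdf(x_source, 0.0, 1.0)

    # Target expectation of f(x) = x, which equals 1 in this setup.
    unweighted_estimate = x_source.mean()
    weighted_estimate = np.average(x_source, weights=weights)

    print(f"unweighted: {unweighted_estimate:.3f}")
    print(f"reweighted: {weighted_estimate:.3f}")

In practice the two densities are unknown, which is why each method below
estimates the weights from the samples in a different way.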

.. GENERATED FROM PYTHON SOURCE LINES 61-67

.. code-block:: Python


    base_classifier = LogisticRegression().set_fit_request(sample_weight=True)

    print(f"Will be using {base_classifier} as base classifier", end="\n\n")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Will be using LogisticRegression() as base classifier


.. GENERATED FROM PYTHON SOURCE LINES 68-72

We generate our 2D dataset with 2 classes
------------------------------------------

We generate a simple 2D dataset with covariate shift: the source and target
inputs follow different distributions, while the labelling rule is shared.

.. GENERATED FROM PYTHON SOURCE LINES 72-81

.. code-block:: Python


    RANDOM_SEED = 42

    X, y, sample_domain = make_shifted_datasets(
        n_samples_source=20, n_samples_target=20, noise=0.1, random_state=RANDOM_SEED
    )

    Xs, Xt, ys, yt = source_target_split(X, y, sample_domain=sample_domain)


.. GENERATED FROM PYTHON SOURCE LINES 82-84

Plot of the dataset:
------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 84-112

.. code-block:: Python


    x_min, x_max = -2.5, 4.5
    y_min, y_max = -1.5, 4.5


    figsize = (8, 4)
    figure, axes = plt.subplots(1, 2, figsize=figsize)

    cm = plt.cm.RdBu
    colormap = ListedColormap(["#FFA056", "#6C4C7C"])
    ax = axes[0]
    ax.set_title("Source data")
    # Plot the source points:
    ax.scatter(Xs[:, 0], Xs[:, 1], c=ys, cmap=colormap, alpha=0.7, s=[25])

    ax.set_xticks(()), ax.set_yticks(())
    ax.set_xlim(x_min, x_max), ax.set_ylim(y_min, y_max)

    ax = axes[1]

    ax.set_title("Target data")
    # Plot the source points faintly for reference, then the target points:
    ax.scatter(Xs[:, 0], Xs[:, 1], c=ys, cmap=colormap, alpha=0.1, s=[25])
    ax.scatter(Xt[:, 0], Xt[:, 1], c=yt, cmap=colormap, alpha=0.7, s=[25])
    figure.suptitle("Plot of the dataset", fontsize=16, y=1)
    ax.set_xticks(()), ax.set_yticks(())
    ax.set_xlim(x_min, x_max), ax.set_ylim(y_min, y_max)


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_001.png
   :alt: Plot of the dataset, Source data, Target data
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    ((-2.5, 4.5), (-1.5, 4.5))


.. GENERATED FROM PYTHON SOURCE LINES 113-119

Illustration of the problem with no domain adaptation
-----------------------------------------------------

Without domain adaptation, the classifier is trained only on data that
follows the source distribution, not the target distribution. It is
therefore unlikely to perform optimally on the target domain.

.. GENERATED FROM PYTHON SOURCE LINES 119-204

.. code-block:: Python


    # We create a dict to store scores:
    scores_dict = {}


    def plot_weights_and_classifier(
        clf,
        weights,
        name="Without DA",
        suptitle=None,
    ):
        if suptitle is None:
            suptitle = f"Illustration of the {name} method"
        figure, axes = plt.subplots(1, 2, figsize=figsize)
        ax = axes[1]
        score = clf.score(Xt, yt)
        DecisionBoundaryDisplay.from_estimator(
            clf,
            Xs,
            cmap=ListedColormap(["w", "k"]),
            alpha=1,
            ax=ax,
            eps=0.5,
            response_method="predict",
            plot_method="contour",
        )

        size = 5 + 10 * weights

        # Plot the target points:
        ax.scatter(
            Xt[:, 0],
            Xt[:, 1],
            c=yt,
            cmap=colormap,
            alpha=0.7,
            s=[25],
        )

        ax.set_xticks(()), ax.set_yticks(())
        ax.set_xlim(x_min, x_max), ax.set_ylim(y_min, y_max)
        ax.set_title("Accuracy on target", fontsize=12)
        ax.text(
            x_max - 0.3,
            y_min + 0.3,
            ("%.2f" % score).lstrip("0"),
            size=15,
            horizontalalignment="right",
        )
        scores_dict[name] = score

        ax = axes[0]

        # Plot the source points:
        ax.scatter(Xs[:, 0], Xs[:, 1], c=ys, cmap=colormap, alpha=0.7, s=size)

        DecisionBoundaryDisplay.from_estimator(
            clf,
            Xs,
            cmap=ListedColormap(["w", "k"]),
            alpha=1,
            ax=ax,
            eps=0.5,
            response_method="predict",
            plot_method="contour",
        )

        ax.set_xticks(()), ax.set_yticks(())
        ax.set_xlim(x_min, x_max), ax.set_ylim(y_min, y_max)
        if name != "Without DA":
            ax.set_title("Training with reweighted data", fontsize=12)
        else:
            ax.set_title("Training data", fontsize=12)
        figure.suptitle(suptitle, fontsize=16, y=1)


    clf = base_classifier
    clf.fit(Xs, ys)
    plot_weights_and_classifier(
        base_classifier,
        name="Without DA",
        weights=np.array([2] * Xs.shape[0]),
        suptitle="Illustration of the classifier with no DA",
    )


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_002.png
   :alt: Illustration of the classifier with no DA, Training data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 205-210

Illustration of the Density Reweighting method
----------------------------------------------

This method computes the weights as a ratio of two probability density
functions. By default, it is the ratio of two kernel density estimates.
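
The idea can be sketched with a hand-written Gaussian kernel density estimate.
This is only an illustration under stated assumptions: ``kde_log_density`` and
the synthetic ``X_source`` / ``X_target`` arrays are hypothetical names, and
the code is not how skada computes its weights internally.

.. code-block:: Python

    import numpy as np


    def kde_log_density(points, data, bandwidth):
        """Log-density at `points` of a Gaussian KDE fitted on `data`."""
        n_samples, n_features = data.shape
        sq_dists = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        log_kernel = -0.5 * sq_dists / bandwidth**2 - 0.5 * n_features * np.log(
            2 * np.pi * bandwidth**2
        )
        # Numerically stable log-mean-exp over the fitted samples.
        max_log = log_kernel.max(axis=1, keepdims=True)
        log_sum = max_log[:, 0] + np.log(np.exp(log_kernel - max_log).sum(axis=1))
        return log_sum - np.log(n_samples)


    rng = np.random.default_rng(0)
    X_source = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
    X_target = rng.normal(loc=1.0, scale=1.0, size=(200, 2))

    # Weight of each source point: estimated target density / source density.
    log_p_target = kde_log_density(X_source, X_target, bandwidth=0.5)
    log_p_source = kde_log_density(X_source, X_source, bandwidth=0.5)
    weights = np.exp(log_p_target - log_p_source)

    print(weights.min(), weights.max())

Working with log-densities and exponentiating only the difference avoids
dividing two very small numbers for points that lie far from both samples.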

.. GENERATED FROM PYTHON SOURCE LINES 210-227

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = DensityReweight(
        base_estimator=base_classifier,
        weight_estimator=KernelDensity(bandwidth=0.5),
    )
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(clf, weights=weights, name="Density Reweighting")


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_003.png
   :alt: Illustration of the Density Reweighting method, Training with reweighted data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 228-239

Illustration of the Gaussian reweighting method
-----------------------------------------------
This method approximates the optimal weights by assuming that the data in each
domain are normally distributed. It fits a Gaussian density to the source set and
another to the target set, and sets each weight to the ratio of the target density
to the source density.

See [1] for details.

.. [1]  Hidetoshi Shimodaira. Improving predictive inference under
        covariate shift by weighting the log-likelihood function.
        In Journal of Statistical Planning and Inference, 2000.
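
A minimal NumPy sketch of this Gaussian approximation is given below. The
helper ``gaussian_log_density`` and the synthetic data are assumptions made
for illustration only, not skada's implementation.

.. code-block:: Python

    import numpy as np


    def gaussian_log_density(points, data):
        """Log-density at `points` of a Gaussian fitted on `data` by moments."""
        mean = data.mean(axis=0)
        cov = np.cov(data, rowvar=False)
        centered = points - mean
        # Squared Mahalanobis distance of every point to the fitted mean.
        solved = np.linalg.solve(cov, centered.T).T
        mahalanobis = (centered * solved).sum(axis=1)
        _, log_det = np.linalg.slogdet(cov)
        n_features = data.shape[1]
        return -0.5 * (mahalanobis + log_det + n_features * np.log(2 * np.pi))


    rng = np.random.default_rng(0)
    X_source = rng.normal(0.0, 1.0, size=(300, 2))
    X_target = rng.normal(1.0, 0.7, size=(300, 2))

    # One Gaussian per domain, then the ratio target / source on source points.
    weights = np.exp(
        gaussian_log_density(X_source, X_target)
        - gaussian_log_density(X_source, X_source)
    )

    print(weights[:5])

Only a mean and a covariance matrix are estimated per domain, so the approach
is cheap but relies on the normality assumption being reasonable.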

.. GENERATED FROM PYTHON SOURCE LINES 239-250

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = GaussianReweight(base_classifier)
    clf.fit(X, y, sample_domain=sample_domain)
    # We get the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(clf, weights=weights, name="Gaussian Reweighting")


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_004.png
   :alt: Illustration of the Gaussian Reweighting method, Training with reweighted data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 251-263

Illustration of the Discr. reweighting method
---------------------------------------------

This estimator derives a class of predictive densities by weighting the source samples
when maximizing the log-likelihood function. Such an approach is effective in
cases of covariate shift.

See [1] for details.

.. [1]    Hidetoshi Shimodaira. Improving predictive inference under
          covariate shift by weighting the log-likelihood function.
          In Journal of Statistical Planning and Inference, 2000.
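
One common way to obtain such weights is to train a classifier that
discriminates source from target samples and to turn its predicted
probabilities into a density ratio. The sketch below does this with a small
logistic regression fitted by gradient descent in NumPy. The variable names,
the learning rate and the number of iterations are arbitrary choices made for
the illustration, and the code is an assumption about the general technique
rather than a description of skada's internals.

.. code-block:: Python

    import numpy as np

    rng = np.random.default_rng(0)
    X_source = rng.normal(0.0, 1.0, size=(500, 1))
    X_target = rng.normal(2.0, 1.0, size=(500, 1))

    # Domain labels: 0 for source samples, 1 for target samples.
    X_all = np.vstack([X_source, X_target])
    domain = np.concatenate([np.zeros(len(X_source)), np.ones(len(X_target))])

    # Logistic-regression domain discriminator, fitted by gradient descent.
    design = np.hstack([X_all, np.ones((len(X_all), 1))])  # intercept column
    coef = np.zeros(design.shape[1])
    for _ in range(5000):
        prob_target = 1.0 / (1.0 + np.exp(-design @ coef))
        coef -= 0.1 * design.T @ (prob_target - domain) / len(domain)

    # By Bayes' rule, p_target(x) / p_source(x) equals
    # p(target | x) / p(source | x) times n_source / n_target.
    design_source = np.hstack([X_source, np.ones((len(X_source), 1))])
    weights = np.exp(design_source @ coef) * len(X_source) / len(X_target)

    print(coef)

Source points that the discriminator finds "target-like" receive large
weights, while points that are clearly source-only are down-weighted.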

.. GENERATED FROM PYTHON SOURCE LINES 263-277

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = DiscriminatorReweight(base_classifier)
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(clf, weights=weights, name="Discr. Reweighting")


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_005.png
   :alt: Illustration of the Discr. Reweighting method, Training with reweighted data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_005.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 278-291

Illustration of the KLIEPReweight method
------------------------------------------

The idea of KLIEPReweight is to find an importance estimate :math:`w(x)` such that
the Kullback-Leibler (KL) divergence from the target input density
:math:`p_{target}(x)` to its estimate :math:`\hat{p}_{target}(x) = w(x)p_{source}(x)`
is minimized.

See [3] for details.

.. [3] Masashi Sugiyama et al. Direct Importance Estimation with Model Selection
       and Its Application to Covariate Shift Adaptation.
       In NeurIPS, 2007.
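
The optimization can be sketched in NumPy as follows, assuming the weight
function is modelled as a non-negative combination of Gaussian kernels centred
on the target samples and optimized by projected gradient ascent. The function
names, the kernel width ``sigma``, the learning rate and the iteration count
are illustrative assumptions. The model selection over several ``gamma``
values used in the example below is not reproduced here.

.. code-block:: Python

    import numpy as np


    def gaussian_kernel(a, b, sigma):
        """Gaussian kernel matrix between the rows of `a` and the rows of `b`."""
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sq_dists / (2.0 * sigma**2))


    def kliep_weights(X_source, X_target, sigma=1.0, n_iter=500, lr=0.01):
        # Model: w(x) = sum_l alpha_l * k(x, x_target_l), with alpha >= 0.
        A = gaussian_kernel(X_target, X_target, sigma)
        B = gaussian_kernel(X_source, X_target, sigma)
        b = B.mean(axis=0)
        alpha = np.ones(X_target.shape[0])
        alpha /= b @ alpha
        for _ in range(n_iter):
            # Gradient ascent on the mean log-weight of the target samples.
            alpha += lr * A.T @ (1.0 / (A @ alpha)) / A.shape[0]
            # Project back onto {alpha >= 0, mean source weight == 1}.
            alpha += (1.0 - b @ alpha) * b / (b @ b)
            alpha = np.maximum(alpha, 0.0)
            alpha /= b @ alpha
        return B @ alpha


    rng = np.random.default_rng(0)
    X_source = rng.normal(0.0, 1.0, size=(150, 2))
    X_target = rng.normal(1.0, 1.0, size=(150, 2))
    weights = kliep_weights(X_source, X_target)

    print(weights.mean())

The normalization step enforces that the weights average to one over the
source samples, so that :math:`w(x)p_{source}(x)` behaves like a density.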

.. GENERATED FROM PYTHON SOURCE LINES 291-307

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = KLIEPReweight(
        LogisticRegression().set_fit_request(sample_weight=True), gamma=[1, 0.1, 0.001]
    )
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(clf, weights=weights, name="KLIEPReweight")


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_006.png
   :alt: Illustration of the KLIEPReweight method, Training with reweighted data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_006.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 308-322

Illustration of the Nearest Neighbor reweighting method
-------------------------------------------------------
.. _Nearest Neighbor reweighting

This method estimates the weight of a point in the source dataset by
counting the number of points in the target set that are closer to
it than to any other point of the source dataset.

See [24] for details.

.. [24] Loog, M. (2012).
       Nearest neighbor-based importance weighting.
       In 2012 IEEE International Workshop on Machine
       Learning for Signal Processing, pages 1–6. IEEE
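
The counting rule can be sketched on a tiny one-dimensional example. The
function ``nearest_neighbor_weights`` is a hypothetical helper, and treating
Laplace smoothing as adding one to every count is an assumption made for this
illustration.

.. code-block:: Python

    import numpy as np


    def nearest_neighbor_weights(X_source, X_target, laplace_smoothing=False):
        """Count, for each source point, the target points nearest to it."""
        sq_dists = ((X_target[:, None, :] - X_source[None, :, :]) ** 2).sum(axis=-1)
        # Index of the closest source point for every target point.
        nearest_source = sq_dists.argmin(axis=1)
        counts = np.bincount(nearest_source, minlength=len(X_source)).astype(float)
        if laplace_smoothing:
            # Assumed smoothing: every source point gets one extra count.
            counts += 1.0
        return counts


    X_source = np.array([[0.0], [1.0], [5.0]])
    X_target = np.array([[0.1], [0.9], [1.2], [4.0]])

    print(nearest_neighbor_weights(X_source, X_target))
    # [1. 2. 1.]
    print(nearest_neighbor_weights(X_source, X_target, laplace_smoothing=True))
    # [2. 3. 2.]

Without smoothing, a source point that is nobody's nearest neighbor gets a
weight of zero and is effectively dropped from training.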

.. GENERATED FROM PYTHON SOURCE LINES 322-336

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = NearestNeighborReweight(base_classifier, laplace_smoothing=True)
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(clf, weights=weights, name="1NN Reweighting")


.. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_007.png
   :alt: Illustration of the 1NN Reweighting method, Training with reweighted data, Accuracy on target
   :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_007.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 337-349

Illustration of the Kernel Mean Matching method
-----------------------------------------------
.. _Kernel Mean Matching

This example illustrates the use of the KMMReweight method [25] to correct covariate shift.
The method works without explicitly estimating the source and target densities, by
matching the distributions of the training and testing sets in feature space.

See [25] for details.

.. [25] J. Huang, A. Gretton, K. Borgwardt, B. Schölkopf and A. J. Smola.
       Correcting sample selection bias by unlabeled data. In NIPS, 2007.
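
A simplified sketch of the matching problem is shown below. It minimizes the
quadratic objective :math:`\frac{1}{2} w^\top K w - \kappa^\top w` under box
constraints on the weights with projected gradient descent. The constraint on
the sum of the weights used in the original formulation is omitted, and the
function names, ``upper_bound`` and the solver are assumptions made for the
illustration, not the solvers used by skada.

.. code-block:: Python

    import numpy as np


    def rbf_kernel(a, b, gamma):
        """RBF kernel matrix between the rows of `a` and the rows of `b`."""
        sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_dists)


    def kmm_objective(w, K, kappa):
        return 0.5 * w @ K @ w - kappa @ w


    def kmm_weights(X_source, X_target, gamma=1.0, upper_bound=10.0, n_iter=2000):
        n_source, n_target = len(X_source), len(X_target)
        K = rbf_kernel(X_source, X_source, gamma)
        kappa = (n_source / n_target) * rbf_kernel(X_source, X_target, gamma).sum(
            axis=1
        )
        # Step size 1 / L, with L the largest eigenvalue of K.
        step = 1.0 / np.linalg.eigvalsh(K).max()
        w = np.ones(n_source)
        for _ in range(n_iter):
            # Gradient step followed by projection onto [0, upper_bound].
            w = np.clip(w - step * (K @ w - kappa), 0.0, upper_bound)
        return w, K, kappa


    rng = np.random.default_rng(0)
    X_source = rng.normal(0.0, 1.0, size=(100, 2))
    X_target = rng.normal(1.0, 1.0, size=(100, 2))
    weights, K, kappa = kmm_weights(X_source, X_target)

    print(kmm_objective(np.ones(len(X_source)), K, kappa))
    print(kmm_objective(weights, K, kappa))

A lower objective means that the weighted mean of the source samples is closer
to the mean of the target samples in the kernel feature space.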

.. GENERATED FROM PYTHON SOURCE LINES 349-410

.. code-block:: Python


    # We define our classifier, `clf` is a DA pipeline
    clf = KMMReweight(base_classifier, gamma=10.0, max_iter=1000, smooth_weights=False)
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(
        clf,
        weights=weights,
        name="Kernel Mean Matching",
        suptitle="Illustration of KMMReweight without weights smoothing",
    )

    # Same pipeline, this time with weights smoothing enabled
    clf = KMMReweight(base_classifier, gamma=10.0, max_iter=1000, smooth_weights=True)
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(
        clf,
        weights=weights,
        name="Kernel Mean Matching",
        suptitle="Illustration of KMMReweight with weights smoothing",
    )

    # Same pipeline, with weights smoothing and the Frank-Wolfe solver
    clf = KMMReweight(
        base_classifier,
        gamma=10.0,
        max_iter=1000,
        smooth_weights=True,
        solver="frank-wolfe",
    )
    clf.fit(X, y, sample_domain=sample_domain)

    # We get the weights:

    # we first get the adapter which is estimating the weights
    weight_estimator = clf[0].get_estimator()
    idx = extract_source_indices(sample_domain)
    weights = weight_estimator.compute_weights(X, sample_domain=sample_domain)[idx]

    plot_weights_and_classifier(
        clf,
        weights=weights,
        name="Kernel Mean Matching",
        suptitle="Illustration of KMMReweight with Frank-Wolfe solver",
    )


.. rst-class:: sphx-glr-horizontal


    *

      .. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_008.png
         :alt: Illustration of KMMReweight without weights smoothing, Training with reweighted data, Accuracy on target
         :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_008.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_009.png
         :alt: Illustration of KMMReweight with weights smoothing, Training with reweighted data, Accuracy on target
         :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_009.png
         :class: sphx-glr-multi-img

    *

      .. image-sg:: /auto_examples/methods/images/sphx_glr_plot_reweighting_010.png
         :alt: Illustration of KMMReweight with Frank-Wolfe solver, Training with reweighted data, Accuracy on target
         :srcset: /auto_examples/methods/images/sphx_glr_plot_reweighting_010.png
         :class: sphx-glr-multi-img


.. GENERATED FROM PYTHON SOURCE LINES 411-413

Comparison of scores between reweighting methods:
-------------------------------------------------

.. GENERATED FROM PYTHON SOURCE LINES 413-425

.. code-block:: Python


    def print_scores_as_table(scores):
        max_len = max(len(k) for k in scores.keys())
        for k, v in scores.items():
            print(f"{k}{' '*(max_len - len(k))} | ", end="")
            print(f"{v*100}{' '*(6-len(str(v*100)))}%")


    print_scores_as_table(scores_dict)

    plt.show()


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Without DA           | 80.625%
    Density Reweighting  | 100.0 %
    Gaussian Reweighting | 99.375%
    Discr. Reweighting   | 98.125%
    KLIEPReweight        | 100.0 %
    1NN Reweighting      | 99.375%
    Kernel Mean Matching | 100.0 %


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.130 seconds)


.. _sphx_glr_download_auto_examples_methods_plot_reweighting.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_reweighting.ipynb <plot_reweighting.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_reweighting.py <plot_reweighting.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_reweighting.zip <plot_reweighting.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_