.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/deep/plot_deepdadataset.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_deep_plot_deepdadataset.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_deep_plot_deepdadataset.py:


Deep Domain Aware Datasets
==========================

This example illustrate some uses of DeepDADatasets.

.. GENERATED FROM PYTHON SOURCE LINES 7-11

.. code-block:: Python

    # Author: Maxence Barneche
    #
    # License: BSD 3-Clause
    # sphinx_gallery_thumbnail_number = 4


.. GENERATED FROM PYTHON SOURCE LINES 12-20

.. code-block:: Python


    import numpy as np
    import pandas as pd
    import torch

    from skada.datasets import make_shifted_datasets
    from skada.deep.base import DeepDADataset


.. GENERATED FROM PYTHON SOURCE LINES 21-49

Creation
--------
Deep domain aware datasets are a unified representation of data for deep
methods in skada.

In those datasets, a data sample has four (optionally, five) attributes:
  * the data point :code:`X`
  * the label :code:`y`
  * the domain :code:`sample_domain`
  * optionally, the weight :code:`sample_weight`
  * the sample index :code:`sample_idx` (automatically generated), which is
      the index of the sample in the dataset, relative to its domain.

Note that the data is not shuffled, so the order of the samples is preserved.

.. WARNING::
  In a dataset, either all data samples have a weight, or none of them.
  On the other hand, it is possible that a sample has no associated label or domain.
  In that case, it will be associated to label :code:`-1` and domain :code:`0`.

DeepDADatasets can be created from numpy arrays, torch tensors, lists,
tuples, or dictionary of one of the former.

If a dictionary is provided, it must contain the keys :code:`X`, :code:`y`(optional),
:code:`sample_domain`(optional) and :code:`sample_weight`(optional).

If both dictionary and positional arguments are provided, the dictionary
arguments will take precedence over the positional ones.

.. GENERATED FROM PYTHON SOURCE LINES 49-73

.. code-block:: Python


    # practice dataset as numpy arrays
    raw_data = make_shifted_datasets(20, 20, random_state=42)
    X, y, sample_domain = raw_data
    # though these are not technically weights, they will act as such throughout the guide.
    weights = np.ones_like(y)
    dict_raw_data = {"X": X, "sample_domain": sample_domain, "y": y}
    weighted_dict_raw_data = {
        "X": X,
        "sample_domain": sample_domain,
        "y": y,
        "sample_weight": weights,
    }

    dataset = DeepDADataset(X, y, sample_domain)
    dataset_from_dict = DeepDADataset(dict_raw_data)
    # it is possible to add weights to the dataset, either at creation or later
    dataset_with_weights = DeepDADataset(X, y, sample_domain, weights)
    dataset_with_weights_from_dict = DeepDADataset(weighted_dict_raw_data)

    # these methods change the dataset in place and return the dataset itself
    dataset = dataset.add_weights(weights)
    dataset = dataset.remove_weights()


.. GENERATED FROM PYTHON SOURCE LINES 74-82

It is also possible to create a DeepDADataset from lists, tuples, tensors,
pandas dataframes or any combination of those.

.. note::
  Just like for the dictionary, if a pandas dataframe is provided it must
  contain the keys :code:`X`, :code:`y` (optional), :code:`sample_domain`(optional)
  and :code:`sample_weight` (optional).
  Also, the data in the dataframe will take precedence over the positional arguments.

.. GENERATED FROM PYTHON SOURCE LINES 82-106

.. code-block:: Python


    # from lists
    dataset_from_list = DeepDADataset(X.tolist(), y.tolist(), sample_domain.tolist())
    # from tuples
    dataset_from_tuple = DeepDADataset(
        tuple(X.tolist()), tuple(y.tolist()), tuple(sample_domain.tolist())
    )

    # from torch tensors
    dataset_from_tensor = DeepDADataset(
        torch.tensor(X), torch.tensor(y), torch.tensor(sample_domain)
    )

    # from pandas dataframe of same structure as the dictionary
    df = pd.DataFrame(
        {
            "X": list(X),
            "y": y,
            "sample_domain": sample_domain,
            "sample_weight": weights,
        }
    )
    dataset_from_df = DeepDADataset(df)


.. GENERATED FROM PYTHON SOURCE LINES 107-109

It is also possible to merge two datasets, which will concatenate the data
samples, the labels and the domains.

.. GENERATED FROM PYTHON SOURCE LINES 109-111

.. code-block:: Python

    dataset2 = dataset.merge(dataset)


.. GENERATED FROM PYTHON SOURCE LINES 112-126

Accessing data
----------------

The data can be accessed with the same indexing methods as for a torch tensor.
The returned data is a tuple with a dictionary with the keys :code:`X`,
:code:`sample_domain`, :code:`sample_idx`, and optionally :code:`sample_weight`
as first element and the corresponding label :code:`y` as second element.

..note::
  The data is stored in torch tensors, with dimension 1 for :code:`sample_domain`,
  :code:`y` and :code:`sample_weight`.

It is also possible to access the data through the various selection methods,
all of which return DeepDADatasets instances.

.. GENERATED FROM PYTHON SOURCE LINES 126-136

.. code-block:: Python


    # indexing methods return a tuple with the data as dict and the label
    first_sample = dataset[0]  # first sample
    first_five_samples = dataset[0:5]  # first five samples

    # selecting methods return a DeepDADataset with the selected samples
    domain_1_samples = dataset.select_domain(1)  # all samples from domain 1
    label_1_samples = dataset.select(
        lambda label: label == 1, on="y"
    )  # all samples with label 1


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 0.012 seconds)


.. _sphx_glr_download_auto_examples_deep_plot_deepdadataset.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_deepdadataset.ipynb <plot_deepdadataset.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_deepdadataset.py <plot_deepdadataset.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_deepdadataset.zip <plot_deepdadataset.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_