skada.datasets.DomainAwareDataset

Container carrying all dataset domains.

This class stores and manipulates datasets from multiple domains while keeping track of the domain each sample belongs to.

Parameters:

domains: list of tuple or dict of tuple or None, optional: List or dictionary of domains to add at initialization. Each domain can be a tuple (X, y) or (X, y, name).

Attributes:

domains_: list: List of domains added, each as a tuple (X, y) or (X,).
domain_names_: dict: Dictionary mapping each domain name to its internal identifier.

add_domain(X, y=None, domain_name: str | None = None) → DomainAwareDataset

Add a new domain to the dataset.

Parameters:

X: np.ndarray: Feature matrix for the domain.
y: np.ndarray or None, optional: Labels for the domain. If None, labels are not provided.
domain_name: str, optional: Name of the domain. If None, a unique name is autogenerated.

Returns:

self: DomainAwareDataset: The updated dataset.
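
To make the bookkeeping concrete, here is a minimal sketch of a container that behaves the way add_domain is described above: it stores each domain as (X, y) or (X,), records a name-to-identifier mapping, and returns itself so calls can be chained. TinyDomainStore, the `_<n>` pattern for autogenerated names, and identifiers starting at 1 are assumptions made for illustration; none of them is taken from skada's implementation.

```python
import numpy as np


class TinyDomainStore:
    """Hypothetical stand-in for illustration; not skada's DomainAwareDataset."""

    def __init__(self):
        self.domains_ = []       # list of (X, y) or (X,) tuples
        self.domain_names_ = {}  # domain name -> integer identifier

    def add_domain(self, X, y=None, domain_name=None):
        if domain_name is None:
            # Assumed naming scheme for unnamed domains.
            domain_name = f"_{len(self.domain_names_) + 1}"
        if domain_name in self.domain_names_:
            raise ValueError(f"domain {domain_name!r} already exists")
        self.domains_.append((X, y) if y is not None else (X,))
        # Assumption: identifiers start at 1 so they can be negated for targets.
        self.domain_names_[domain_name] = len(self.domains_)
        return self

    def get_domain(self, domain_name):
        return self.domains_[self.domain_names_[domain_name] - 1]


rng = np.random.default_rng(0)
store = (
    TinyDomainStore()
    .add_domain(rng.normal(size=(5, 2)), np.array([0, 1, 0, 1, 1]), "amazon")
    .add_domain(rng.normal(size=(3, 2)), domain_name="webcam")  # unlabeled
)
print(store.domain_names_)              # {'amazon': 1, 'webcam': 2}
print(len(store.get_domain("webcam")))  # 1, i.e. (X,) only
```

Returning the container from add_domain is what allows the chained calls in the usage lines.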

get_domain(domain_name: str) → Tuple[ndarray, ndarray | None]

Retrieve the data and labels for a given domain.

Parameters:

domain_name: str: Name of the domain to retrieve.

Returns:

domain: tuple: Tuple containing (X, y) or (X,) for the specified domain.

merge(dataset: DomainAwareDataset, names_mapping: Mapping | None = None) → DomainAwareDataset

Merge another DomainAwareDataset into this one.

Parameters:

dataset: DomainAwareDataset: The dataset to merge.
names_mapping: mapping, optional: Mapping from old domain names to new domain names.

Returns:

self: DomainAwareDataset: The updated dataset.
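
As an illustration of how names_mapping can rename domains during a merge, here is a small sketch that uses plain dictionaries in place of DomainAwareDataset objects. The helper merge_domains is hypothetical, and keeping the original name when it is missing from the mapping is an assumption of this sketch, not documented skada behavior.

```python
import numpy as np


def merge_domains(base, other, names_mapping=None):
    """Hypothetical helper: copy domains from `other` into `base`, renaming on the way."""
    names_mapping = names_mapping or {}
    for old_name, domain in other.items():
        # Assumption: names absent from the mapping are kept unchanged.
        new_name = names_mapping.get(old_name, old_name)
        if new_name in base:
            raise ValueError(f"domain {new_name!r} already exists")
        base[new_name] = domain
    return base


base = {"amazon": (np.zeros((4, 2)), np.zeros(4, dtype=int))}
other = {"amazon": (np.ones((3, 2)), np.ones(3, dtype=int))}

# Without a mapping the two "amazon" domains would collide, so rename the second.
merged = merge_domains(base, other, names_mapping={"amazon": "amazon_v2"})
print(sorted(merged))  # ['amazon', 'amazon_v2']
```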

pack(as_sources: List[str], as_targets: List[str], mask_target_labels: bool, return_X_y: bool = True, train: bool | None = None, mask: None | int | float = None) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregates datasets from all domains into a unified domain-aware representation, ensuring compatibility with domain adaptation (DA) estimators.

Parameters:

as_sources: list: List of domain names to be used as sources. An empty list indicates that no source domains are used.
as_targets: list: List of domain names to be used as targets. An empty list indicates that no target domains are used.
mask_target_labels: bool: Set to True for training and False for testing. When True, labels of target domains are masked with -1 for classification tasks or nan for regression tasks, so they are not available at train time.
return_X_y: bool, default=True: When set to True, returns a tuple (X, y, sample_domain). Otherwise returns a Bunch object with the structure described below.
train: bool or None, default=None: [DEPRECATED] Use `mask_target_labels` instead.
mask: int or float, optional, default=None: Value to mask labels at training time.

Returns:

data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray: Samples from all sources and all targets given.
y: ndarray: Labels from all sources and all targets.
sample_domain: ndarray: Integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always < 0.
domain_names: dict: The names of domains and associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.
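
The conventions described above (positive identifiers for source domains, negative identifiers for target domains, and target labels masked when mask_target_labels=True) can be sketched with NumPy alone. pack_sketch and the `domains` dictionary are hypothetical and only mimic the documented output; identifiers starting at 1 are an assumption, and this is not skada's implementation.

```python
import numpy as np


def pack_sketch(domains, as_sources, as_targets, mask_target_labels, mask=None):
    """Hypothetical re-creation of the documented (X, y, sample_domain) output."""
    # Assumption: each domain gets a positive identifier, starting at 1.
    ids = {name: i + 1 for i, name in enumerate(domains)}
    Xs, ys, sample_domains = [], [], []
    for name in as_sources:
        X, y = domains[name]
        Xs.append(X)
        ys.append(y)
        sample_domains.append(np.full(len(X), ids[name]))
    for name in as_targets:
        X, y = domains[name]
        if mask_target_labels:
            if mask is None:
                # -1 for integer (classification) labels, nan for float (regression) labels.
                mask = -1 if np.issubdtype(y.dtype, np.integer) else np.nan
            y = np.full(len(X), mask, dtype=y.dtype)
        Xs.append(X)
        ys.append(y)
        # Targets carry the negated identifier.
        sample_domains.append(np.full(len(X), -ids[name]))
    return np.concatenate(Xs), np.concatenate(ys), np.concatenate(sample_domains)


domains = {
    "s": (np.arange(6.0).reshape(3, 2), np.array([0, 1, 1])),
    "t": (np.arange(4.0).reshape(2, 2), np.array([1, 0])),
}
X, y, sample_domain = pack_sketch(domains, ["s"], ["t"], mask_target_labels=True)
print(y)              # [ 0  1  1 -1 -1]
print(sample_domain)  # [ 1  1  1 -2 -2]

# For evaluation, keep the target labels.
_, y_test, _ = pack_sketch(domains, [], ["t"], mask_target_labels=False)
print(y_test)         # [1 0]
```

With a DomainAwareDataset holding the same two domains, the corresponding call per the signature above would be pack(as_sources=["s"], as_targets=["t"], mask_target_labels=True).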

pack_lodo(return_X_y: bool = True) → Bunch | Tuple[ndarray, ndarray, ndarray]

Packages all domains in a format compatible with the Leave-One-Domain-Out cross-validator (refer to LeaveOneDomainOut for more details). To enable the splitter's dynamic assignment of source and target domains, data from each domain is included in the output twice: once as a source and once as a target.

Exercise caution when using this output for purposes other than its intended use, as this could lead to incorrect results and data leakage.

Parameters:

return_X_y: bool, default=True: When set to True, returns a tuple (X, y, sample_domain). Otherwise returns a Bunch object with the structure described below.

Returns:

data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray: Samples from all sources and all targets given.
y: ndarray: Labels from all sources and all targets.
sample_domain: ndarray: Integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always < 0.
domain_names: dict: The names of domains and associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.
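
The duplication described above can be pictured with a small NumPy sketch: every domain appears once with a positive identifier (as a source) and once with the negated identifier (as a target). lodo_layout and the hand-written split are hypothetical illustrations, and identifiers starting at 1 are an assumption. This is not the LeaveOneDomainOut implementation, and labels are left out because their handling is not described here.

```python
import numpy as np


def lodo_layout(domain_sizes):
    """Hypothetical helper: sample_domain vector with every domain listed twice."""
    # Assumption: domain identifiers are 1, 2, 3, ...
    ids = range(1, len(domain_sizes) + 1)
    as_sources = [np.full(n, i) for i, n in zip(ids, domain_sizes)]
    as_targets = [np.full(n, -i) for i, n in zip(ids, domain_sizes)]
    return np.concatenate(as_sources + as_targets)


sample_domain = lodo_layout([2, 3, 1])
print(sample_domain)  # [ 1  1  2  2  2  3 -1 -1 -2 -2 -2 -3]

# Hand-written split with domain 2 held out as the target:
held_out = 2
source_rows = (sample_domain > 0) & (sample_domain != held_out)
target_rows = sample_domain == -held_out
print(np.flatnonzero(source_rows))  # [0 1 5]
print(np.flatnonzero(target_rows))  # [ 8  9 10]
```

Because each sample is present twice, fitting on the full packed output without such a split would use the same rows as both source and target, which is the leakage risk the caution above refers to.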

pack_test(as_targets: List[str], return_X_y: bool = True) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate target domains for testing.

Warning

This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=False instead.

This method is equivalent to pack() with only target domains and mask_target_labels=False (previously train=False), so labels are not masked.

Parameters:

as_targets: list of str: List of domain names to be used as targets.
return_X_y: bool, default=True: If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.

Returns:

data: sklearn.utils.Bunch: Dictionary-like object with attributes X, y, sample_domain, domain_names.
(X, y, sample_domain): tuple if return_X_y=True: Tuple of (data, target, sample_domain).

pack_train(as_sources: List[str], as_targets: List[str], return_X_y: bool = True, mask: None | int | float = None) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate source and target domains for training.

Warning

This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=True instead.

This method is equivalent to pack() with mask_target_labels=True (previously train=True). It masks the labels of target domains (with -1, np.nan, or a custom mask value) so that they are not available during training, as required for domain adaptation scenarios.

Parameters:

as_sources: list of str: List of domain names to be used as sources.
as_targets: list of str: List of domain names to be used as targets.
return_X_y: bool, default=True: If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.
mask: int or float, optional: Value to mask labels at training time. If None, uses -1 for integers and np.nan for floats.

Returns:

data: sklearn.utils.Bunch: Dictionary-like object with attributes X, y, sample_domain, domain_names.
(X, y, sample_domain): tuple if return_X_y=True: Tuple of (data, target, sample_domain).
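
The deprecation notices for pack_test and pack_train both point to pack(). A hedged sketch of that migration follows. The wrapper names pack_train_compat and pack_test_compat are hypothetical, the keyword arguments are the ones listed in the pack() signature, and RecordingDataset is a stand-in used only so the snippet runs without skada installed.

```python
def pack_train_compat(dataset, as_sources, as_targets, return_X_y=True, mask=None):
    """Hypothetical replacement for pack_train(): target labels are masked."""
    return dataset.pack(
        as_sources=as_sources,
        as_targets=as_targets,
        mask_target_labels=True,
        return_X_y=return_X_y,
        mask=mask,
    )


def pack_test_compat(dataset, as_targets, return_X_y=True):
    """Hypothetical replacement for pack_test(): targets only, labels kept."""
    return dataset.pack(
        as_sources=[],  # an empty list means no source domains are used
        as_targets=as_targets,
        mask_target_labels=False,
        return_X_y=return_X_y,
    )


class RecordingDataset:
    """Stand-in that records the keyword arguments passed to pack()."""

    def pack(self, **kwargs):
        return kwargs


calls = pack_train_compat(RecordingDataset(), ["amazon"], ["webcam"])
print(calls["mask_target_labels"])  # True
calls_test = pack_test_compat(RecordingDataset(), ["webcam"])
print(calls_test["as_sources"], calls_test["mask_target_labels"])  # [] False
```

In real code the `dataset` argument would be a DomainAwareDataset, and the wrappers could be dropped in favor of calling pack() directly.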

select_domain(sample_domain: ndarray, domains: str | Iterable[str]) → ndarray