skada.datasets.DomainAwareDataset

class skada.datasets.DomainAwareDataset(domains: List[Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | Dict[str, Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | None = None)

Container carrying all dataset domains.

This class stores and manipulates datasets from multiple domains, keeping track of the domain each sample belongs to.

Parameters:
domains: list of tuple or dict of tuple or None, optional

List or dictionary of domains to add at initialization. Each domain can be a tuple (X, y) or (X, y, name).

Attributes:
domains_: list

List of domains added, each as a tuple (X, y) or (X,).

domain_names_: dict

Dictionary mapping each domain name to its internal identifier.

add_domain(X, y=None, domain_name: str | None = None) -> DomainAwareDataset

Add a new domain to the dataset.

Parameters:
X: np.ndarray

Feature matrix for the domain.

y: np.ndarray or None, optional

Labels for the domain. If None, labels are not provided.

domain_name: str, optional

Name of the domain. If None, a unique name is autogenerated.

Returns:
self: DomainAwareDataset

The updated dataset.

get_domain(domain_name: str) -> Tuple[ndarray, ndarray | None]

Retrieve the data and labels for a given domain.

Parameters:
domain_name: str

Name of the domain to retrieve.

Returns:
domain: tuple

Tuple containing (X, y) or (X,) for the specified domain.

merge(dataset: DomainAwareDataset, names_mapping: Mapping | None = None) -> DomainAwareDataset

Merge another DomainAwareDataset into this one.

Parameters:
dataset: DomainAwareDataset

The dataset to merge.

names_mapping: mapping, optional

Mapping from old domain names to new domain names.

Returns:
self: DomainAwareDataset

The updated dataset.

pack(as_sources: List[str], as_targets: List[str], mask_target_labels: bool, return_X_y: bool = True, train: bool | None = None, mask: None | int | float = None) -> Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregates datasets from all domains into a unified domain-aware representation, ensuring compatibility with domain adaptation (DA) estimators.

Parameters:
as_sources: list

List of domain names to be used as sources. An empty list indicates that no source domains are used.

as_targets: list

List of domain names to be used as targets. An empty list indicates that no target domains are used.

mask_target_labels: bool

Set this parameter to True for training and False for testing. When True, labels of target domains are masked with -1 for classification tasks or nan for regression tasks, so they are not available at train time.

return_X_y: bool, default=True

When set to True, returns a tuple (X, y, sample_domain). Otherwise returns Bunch object with the structure described below.

train: Optional[bool], default=None

[DEPRECATED] Use `mask_target_labels` instead.

mask: int | float (optional), default=None

Value used to mask target labels at training time. If None, -1 is used for integer labels and np.nan for float labels.

Returns:
data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray

Samples from all sources and all targets given.

y: ndarray

Labels from all sources and all targets.

sample_domain: ndarray

The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels and target domains always have labels < 0.

domain_names: dict

The names of domains and associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.

pack_lodo(return_X_y: bool = True) -> Bunch | Tuple[ndarray, ndarray, ndarray]

Packages all domains in a format compatible with the Leave-One-Domain-Out cross-validator (refer to LeaveOneDomainOut for more details). To enable the splitter's dynamic assignment of source and target domains, data from each domain is included in the output twice: once as a source and once as a target.

Exercise caution when using this output for purposes other than its intended use, as this could lead to incorrect results and data leakage.

Parameters:
return_X_y: bool, default=True

When set to True, returns a tuple (X, y, sample_domain). Otherwise returns Bunch object with the structure described below.

Returns:
data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray

Samples from all sources and all targets given.

y: ndarray

Labels from all sources and all targets.

sample_domain: np.ndarray

The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels and target domains always have labels < 0.

domain_names: dict

The names of domains and associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.

pack_test(as_targets: List[str], return_X_y: bool = True) -> Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate target domains for testing.

Warning

This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=False instead.

This method is equivalent to pack() with only target domains and mask_target_labels=False (formerly train=False). Labels are not masked.

Parameters:
as_targets: list of str

List of domain names to be used as targets.

return_X_y: bool, default=True

If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.

Returns:
data: sklearn.utils.Bunch

Dictionary-like object with attributes X, y, sample_domain, domain_names.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain).

pack_train(as_sources: List[str], as_targets: List[str], return_X_y: bool = True, mask: None | int | float = None) -> Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate source and target domains for training.

Warning

This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=True instead.

This method is equivalent to pack() with mask_target_labels=True (formerly train=True). It masks the labels of target domains (with -1, np.nan, or a custom mask value) so that they are not available during training, as required for domain adaptation scenarios.

Parameters:
as_sources: list of str

List of domain names to be used as sources.

as_targets: list of str

List of domain names to be used as targets.

return_X_y: bool, default=True

If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.

mask: int or float, optional

Value to mask labels at training time. If None, uses -1 for integers and np.nan for floats.

Returns:
data: sklearn.utils.Bunch

Dictionary-like object with attributes X, y, sample_domain, domain_names.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain).

select_domain(sample_domain: ndarray, domains: str | Iterable[str]) -> ndarray

Select samples belonging to one or more domains.

Parameters:
sample_domain: np.ndarray

Array of domain labels for each sample.

domains: str or iterable of str

Domain name(s) to select.

Returns:
mask: np.ndarray

Boolean mask indicating selected samples.

Examples using skada.datasets.DomainAwareDataset

Adversarial domain adaptation methods.

Divergence domain adaptation methods.

Optimal transport domain adaptation methods.

Subspace method example on subspace shift dataset

Comparison of DA classification methods

Using cross_val_score with skada

Visualizing cross-validation behavior in skada

Using GridSearchCV with skada
