skada.datasets.DomainAwareDataset

class skada.datasets.DomainAwareDataset(domains: List[Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | Dict[str, Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | None = None)

Container carrying all dataset domains.

This class stores and manipulates datasets from multiple domains, keeping track of the domain each sample belongs to.

Parameters:
domains: list of tuple or dict of tuple or None, optional

List or dictionary of domains to add at initialization. Each domain can be a tuple (X, y) or (X, y, name).

Attributes:
domains_: list

List of domains added, each as a tuple (X, y) or (X,).

domain_names_: dict

Dictionary mapping each domain name to its internal identifier.

add_domain(X, y=None, domain_name: str | None = None) → DomainAwareDataset

Add a new domain to the dataset.

Parameters:
X: np.ndarray

Feature matrix for the domain.

y: np.ndarray or None, optional

Labels for the domain. If None, the domain is stored without labels.

domain_name: str, optional

Name of the domain. If None, a unique name is autogenerated.

Returns:
self: DomainAwareDataset

The updated dataset.

get_domain(domain_name: str) → Tuple[ndarray, ndarray | None]

Retrieve the data and labels for a given domain.

Parameters:
domain_name: str

Name of the domain to retrieve.

Returns:
domain: tuple

Tuple containing (X, y) or (X,) for the specified domain.

merge(dataset: DomainAwareDataset, names_mapping: Mapping | None = None) → DomainAwareDataset

Merge another DomainAwareDataset into this one.

Parameters:
dataset: DomainAwareDataset

The dataset to merge.

names_mapping: mapping, optional

Mapping from domain names in the merged dataset (old names) to the names they should take in this dataset (new names).

Returns:
self: DomainAwareDataset

The updated dataset.

pack(as_sources: List[str] | None = None, as_targets: List[str] | None = None, return_X_y: bool = True, train: bool = False, mask: None | int | float = None) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate the selected source and target domains into a unified domain-aware representation compatible with domain adaptation (DA) estimators.

Parameters:
as_sources: list of str, optional

List of domain names to be used as sources.

as_targets: list of str, optional

List of domain names to be used as targets.

return_X_y: bool, default=True

When set to True, returns a tuple (X, y, sample_domain). Otherwise, returns a Bunch object with the structure described below.

train: bool, default=False

When set to True, masks the labels of target domains with -1 (or with the given mask value), so that they are not available at training time.

mask: int or float, optional, default=None

Value used to mask target labels at training time. If None, uses -1 for integers and np.nan for floats.

Returns:
data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray

Samples from all given sources and targets.

y: ndarray

Labels from all given sources and targets.

sample_domain: ndarray

The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels and target domains have labels < 0.

domain_names: dict

Mapping from domain names to their associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.

pack_lodo(return_X_y: bool = True) → Bunch | Tuple[ndarray, ndarray, ndarray]

Packages all domains in a format compatible with the Leave-One-Domain-Out cross-validator (refer to LeaveOneDomainOut for more details). To enable the splitter's dynamic assignment of source and target domains, data from each domain is included in the output twice: once as a source and once as a target.

Exercise caution when using this output for purposes other than its intended use, as this could lead to incorrect results and data leakage.

Parameters:
return_X_y: bool, default=True

When set to True, returns a tuple (X, y, sample_domain). Otherwise, returns a Bunch object with the structure described below.

Returns:
data: Bunch

Dictionary-like object, with the following attributes.

X: ndarray

Samples from all domains, each included once as a source and once as a target.

y: ndarray

Labels from all sources and all targets.

sample_domain: ndarray

The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels and target domains have labels < 0.

domain_names: dict

Mapping from domain names to their associated domain labels.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain), see the description above.

pack_test(as_targets: List[str], return_X_y: bool = True) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate target domains for testing.

This method is equivalent to pack() with only target domains and train=False. Labels are not masked.

Parameters:
as_targets: list of str

List of domain names to be used as targets.

return_X_y: bool, default=True

If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.

Returns:
data: sklearn.utils.Bunch

Dictionary-like object with attributes X, y, sample_domain, domain_names.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain).

pack_train(as_sources: List[str], as_targets: List[str], return_X_y: bool = True, mask: None | int | float = None) → Bunch | Tuple[ndarray, ndarray, ndarray]

Aggregate source and target domains for training.

This method is equivalent to pack() with train=True. It masks the labels for target domains (with -1 or a custom mask value) so that they are not available during training, as required for domain adaptation scenarios.

Parameters:
as_sources: list of str

List of domain names to be used as sources.

as_targets: list of str

List of domain names to be used as targets.

return_X_y: bool, default=True

If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.

mask: int or float, optional

Value to mask labels at training time. If None, uses -1 for integers and np.nan for floats.

Returns:
data: sklearn.utils.Bunch

Dictionary-like object with attributes X, y, sample_domain, domain_names.

(X, y, sample_domain): tuple if return_X_y=True

Tuple of (data, target, sample_domain).

select_domain(sample_domain: ndarray, domains: str | Iterable[str]) → ndarray

Select samples belonging to one or more domains.

Parameters:
sample_domain: np.ndarray

Array of domain labels for each sample.

domains: str or iterable of str

Domain name(s) to select.

Returns:
mask: np.ndarray

Boolean mask indicating selected samples.

Examples using skada.datasets.DomainAwareDataset

Adversarial domain adaptation methods.

Divergence domain adaptation methods.

Optimal transport domain adaptation methods.

Subspace method example on subspace shift dataset

Comparison of DA classification methods

Using cross_val_score with skada

Visualizing cross-validation behavior in skada

Using GridSearchCV with skada