skada.datasets.DomainAwareDataset
- class skada.datasets.DomainAwareDataset(domains: List[Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | Dict[str, Tuple[str, ndarray, ndarray] | Tuple[ndarray, ndarray] | Tuple[ndarray]] | None = None)[source]
Container carrying all dataset domains.
This class stores and manipulates datasets from multiple domains, keeping track of the domain each sample belongs to.
- Parameters:
- domains: list of tuple or dict of tuple or None, optional
List or dictionary of domains to add at initialization. Each domain can be a tuple (X, y) or (X, y, name).
- Attributes:
- domains_: list
List of domains added, each as a tuple (X, y) or (X,).
- domain_names_: dict
Dictionary mapping each domain name to its internal identifier.
- add_domain(X, y=None, domain_name: str | None = None) DomainAwareDataset [source]
Add a new domain to the dataset.
- Parameters:
- X: np.ndarray
Feature matrix for the domain.
- y: np.ndarray or None, optional
Labels for the domain. If None, labels are not provided.
- domain_name: str, optional
Name of the domain. If None, a unique name is autogenerated.
- Returns:
- self: DomainAwareDataset
The updated dataset.
- get_domain(domain_name: str) Tuple[ndarray, ndarray | None] [source]
Retrieve the data and labels for a given domain.
- Parameters:
- domain_name: str
Name of the domain to retrieve.
- Returns:
- domain: tuple
Tuple containing (X, y) or (X,) for the specified domain.
- merge(dataset: DomainAwareDataset, names_mapping: Mapping | None = None) DomainAwareDataset [source]
Merge another DomainAwareDataset into this one.
- Parameters:
- dataset: DomainAwareDataset
The dataset to merge.
- names_mapping: mapping, optional
Mapping from old domain names to new domain names.
- Returns:
- self: DomainAwareDataset
The updated dataset.
- pack(as_sources: List[str] | None = None, as_targets: List[str] | None = None, return_X_y: bool = True, train: bool = False, mask: None | int | float = None) Bunch | Tuple[ndarray, ndarray, ndarray] [source]
Aggregates datasets from all domains into a unified domain-aware representation, ensuring compatibility with domain adaptation (DA) estimators.
- Parameters:
- as_sources: list
List of domain names to be used as sources.
- as_targets: list
List of domain names to be used as targets.
- return_X_y: bool, default=True
When set to True, returns a tuple (X, y, sample_domain). Otherwise returns a Bunch object with the structure described below.
- train: bool, default=False
When set to True, masks labels for target domains with -1 (or the given mask value), so they are not available at train time.
- mask: int or float, optional, default=None
Value used to mask labels at training time.
- Returns:
- data: Bunch
Dictionary-like object, with the following attributes.
- X: ndarray
Samples from all sources and all targets given.
- y: ndarray
Labels from all sources and all targets.
- sample_domain: ndarray
The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always < 0.
- domain_names: dict
The names of domains and associated domain labels.
- (X, y, sample_domain): tuple if return_X_y=True
Tuple of (data, target, sample_domain), see the description above.
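The packed layout described above can be illustrated with a small NumPy-only sketch. This is not skada's implementation: `pack_sketch` is a hypothetical helper, and the numbering of domains as 1, 2, ... in insertion order (with targets receiving the negated identifier) is an assumption made for illustration.

```python
import numpy as np


def pack_sketch(domains, as_sources, as_targets, train=False, mask=-1):
    """Illustrative stand-in for pack(); NOT skada's implementation.

    `domains` maps a domain name to an (X, y) pair. Domain ids are assumed
    to be 1, 2, ... in insertion order (an assumption of this sketch).
    """
    domain_ids = {name: i + 1 for i, name in enumerate(domains)}
    Xs, ys, sample_domains = [], [], []
    for name in as_sources:
        X, y = domains[name]
        Xs.append(X)
        ys.append(y)
        # Source samples carry the positive domain id.
        sample_domains.append(np.full(len(X), domain_ids[name]))
    for name in as_targets:
        X, y = domains[name]
        if train:
            # Hide target labels so they are unavailable at train time.
            y = np.full_like(y, mask)
        Xs.append(X)
        ys.append(y)
        # Target samples carry the negated domain id.
        sample_domains.append(np.full(len(X), -domain_ids[name]))
    return np.concatenate(Xs), np.concatenate(ys), np.concatenate(sample_domains)


domains = {
    "s": (np.zeros((3, 2)), np.array([0, 1, 1])),
    "t": (np.ones((2, 2)), np.array([1, 0])),
}
X, y, sample_domain = pack_sketch(domains, ["s"], ["t"], train=True)
print(y)              # target labels replaced by the mask value
print(sample_domain)  # positive for source rows, negative for target rows
```

The sign of `sample_domain` is what lets a downstream estimator tell source rows from target rows without a separate argument.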
- pack_lodo(return_X_y: bool = True) Bunch | Tuple[ndarray, ndarray, ndarray] [source]
Packages all domains in a format compatible with the Leave-One-Domain-Out cross-validator (refer to LeaveOneDomainOut for more details). To enable the splitter's dynamic assignment of source and target domains, data from each domain is included in the output twice: once as a source and once as a target.
Exercise caution when using this output for purposes other than its intended use, as this could lead to incorrect results and data leakage.
- Parameters:
- return_X_y: bool, default=True
When set to True, returns a tuple (X, y, sample_domain). Otherwise returns a Bunch object with the structure described below.
- Returns:
- data: Bunch
Dictionary-like object, with the following attributes.
- X: ndarray
Samples from all sources and all targets given.
- y: ndarray
Labels from all sources and all targets.
- sample_domain: np.ndarray
The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always < 0.
- domain_names: dict
The names of domains and associated domain labels.
- (X, y, sample_domain): tuple if return_X_y=True
Tuple of (data, target, sample_domain), see the description above.
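To make the "included twice" behaviour concrete, here is a NumPy-only sketch of what such a layout could look like. It is a hedged illustration, not skada's code: the domain identifiers (1, 2, ...) and the ordering of sources before targets are assumptions, and label handling is left out entirely.

```python
import numpy as np

# Two toy domains, features only (labels are omitted in this sketch).
domains = {"a": np.zeros((2, 1)), "b": np.ones((3, 1))}

X_parts, sd_parts = [], []
# Emit every domain once with a positive id (source role) and once with
# the negated id (target role). Ids 1, 2, ... are an assumption here.
for sign in (1, -1):
    for domain_id, X in enumerate(domains.values(), start=1):
        X_parts.append(X)
        sd_parts.append(np.full(len(X), sign * domain_id))

X_lodo = np.concatenate(X_parts)
sample_domain = np.concatenate(sd_parts)

n_original = sum(len(X) for X in domains.values())
print(len(X_lodo), n_original)  # the packed array has twice as many rows
print(sample_domain)
```

Because every row appears under both roles, fitting or scoring directly on this output without a splitter that separates the roles would evaluate on samples already seen as sources, which is the leakage the warning above refers to.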
- pack_test(as_targets: List[str], return_X_y: bool = True) Bunch | Tuple[ndarray, ndarray, ndarray] [source]
Aggregate target domains for testing.
This method is equivalent to pack() with only target domains and train=False. Labels are not masked.
- Parameters:
- as_targets: list of str
List of domain names to be used as targets.
- return_X_y: bool, default=True
If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.
- Returns:
- data: sklearn.utils.Bunch
Dictionary-like object with attributes X, y, sample_domain, domain_names.
- (X, y, sample_domain): tuple if return_X_y=True
Tuple of (data, target, sample_domain).
- pack_train(as_sources: List[str], as_targets: List[str], return_X_y: bool = True, mask: None | int | float = None) Bunch | Tuple[ndarray, ndarray, ndarray] [source]
Aggregate source and target domains for training.
This method is equivalent to pack() with train=True. It masks the labels for target domains (with -1 or a custom mask value) so that they are not available during training, as required for domain adaptation scenarios.
- Parameters:
- as_sources: list of str
List of domain names to be used as sources.
- as_targets: list of str
List of domain names to be used as targets.
- return_X_y: bool, default=True
If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.
- mask: int or float, optional
Value to mask labels at training time. If None, uses -1 for integers and np.nan for floats.
- Returns:
- data: sklearn.utils.Bunch
Dictionary-like object with attributes X, y, sample_domain, domain_names.
- (X, y, sample_domain): tuple if return_X_y=True
Tuple of (data, target, sample_domain).
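The default mask rule stated above (-1 for integer labels, np.nan for float labels) can be sketched in plain NumPy as follows. Both helpers are hypothetical names introduced for illustration and only mirror the documented rule; they are not part of skada.

```python
import numpy as np


def default_mask_sketch(y, mask=None):
    """Pick the mask value: explicit value, else -1 (ints) or NaN (floats)."""
    if mask is not None:
        return mask
    return -1 if np.issubdtype(y.dtype, np.integer) else np.nan


def mask_target_labels(y, is_target, mask=None):
    """Return a copy of y with the labels of target samples hidden."""
    y_masked = y.copy()
    y_masked[is_target] = default_mask_sketch(y, mask)
    return y_masked


# Classification-style integer labels: targets become -1.
y_int = np.array([0, 1, 1, 0])
masked_int = mask_target_labels(y_int, np.array([False, False, True, True]))

# Regression-style float labels: targets become NaN.
y_float = np.array([0.5, 1.5, 2.5])
masked_float = mask_target_labels(y_float, np.array([False, True, True]))

# An explicit mask value overrides the default.
masked_custom = mask_target_labels(
    y_int, np.array([False, False, True, True]), mask=-99
)
```

NaN is a natural default for float labels because -1 could be a legitimate regression target, whereas integer arrays cannot hold NaN at all.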
- select_domain(sample_domain: ndarray, domains: str | Iterable[str]) ndarray [source]
Select samples belonging to one or more domains.
- Parameters:
- sample_domain: np.ndarray
Array of domain labels for each sample.
- domains: str or iterable of str
Domain name(s) to select.
- Returns:
- mask: np.ndarray
Boolean mask indicating selected samples.
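A boolean mask of this kind can be built with `np.isin`. The sketch below is a hypothetical stand-in, not skada's implementation: `select_domain_sketch` takes the name-to-identifier mapping explicitly, and matching on the absolute value of the label (so that a domain is selected whether it appears as source or target) is an assumption of this example.

```python
import numpy as np


def select_domain_sketch(domain_names, sample_domain, domains):
    """Boolean mask of samples whose domain is in `domains` (illustrative)."""
    if isinstance(domains, str):
        domains = [domains]
    ids = [domain_names[name] for name in domains]
    # Assumption: ignore the source/target sign when matching identifiers.
    return np.isin(np.abs(sample_domain), ids)


domain_names = {"s": 1, "t": 2}              # hypothetical name -> id mapping
sample_domain = np.array([1, 1, -2, -2, 1])  # packed domain labels

mask_t = select_domain_sketch(domain_names, sample_domain, "t")
mask_all = select_domain_sketch(domain_names, sample_domain, ["s", "t"])
```

The resulting mask can index the packed arrays directly, for example `X[mask_t]` and `y[mask_t]`.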