skada.datasets.DomainAwareDataset
- class skada.datasets.DomainAwareDataset(domains: List[Tuple[str, ndarray | Tensor, ndarray | Tensor] | Tuple[ndarray | Tensor, ndarray | Tensor] | Tuple[ndarray | Tensor]] | Dict[str, Tuple[str, ndarray | Tensor, ndarray | Tensor] | Tuple[ndarray | Tensor, ndarray | Tensor] | Tuple[ndarray | Tensor]] | None = None)
Container carrying all dataset domains.
This class stores and manipulates datasets from multiple domains, keeping track of the domain each sample comes from.
- Parameters:
- domains : list of tuple or dict of tuple or None, optional
List or dictionary of domains to add at initialization. Each domain can be a tuple (X, y) or (X, y, name).
- Attributes:
- domains_ : list
List of domains added, each as a tuple (X, y) or (X,).
- domain_names_ : dict
Dictionary mapping each domain name to its internal identifier.
- add_domain(X, y=None, domain_name: str | None = None) → DomainAwareDataset
Add a new domain to the dataset.
- Parameters:
- X : ArrayLike
Feature matrix for the domain.
- y : ArrayLike or None, optional
Labels for the domain. If None, labels are not provided.
- domain_name : str, optional
Name of the domain. If None, a unique name is autogenerated.
- Returns:
- self : DomainAwareDataset
The updated dataset.
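The bookkeeping described above (a `domains_` list plus a `domain_names_` dictionary, with `add_domain` returning `self`) can be pictured with the minimal sketch below. `TinyDomainStore` is hypothetical and is not skada's implementation; the autogenerated name format (`"_0"`, `"_1"`, ...), the rejection of duplicate names, and identifiers starting at 1 are all assumptions made for illustration.

```python
import numpy as np


class TinyDomainStore:
    """Hypothetical, simplified stand-in for DomainAwareDataset's bookkeeping."""

    def __init__(self):
        self.domains_ = []       # one (X, y) or (X,) tuple per domain
        self.domain_names_ = {}  # domain name -> integer identifier

    def add_domain(self, X, y=None, domain_name=None):
        if domain_name is None:
            # Assumption: autogenerated names are "_<number of domains so far>"
            domain_name = f"_{len(self.domain_names_)}"
        if domain_name in self.domain_names_:
            # Assumption: this sketch rejects duplicate names
            raise ValueError(f"domain {domain_name!r} already exists")
        # Assumption: identifiers start at 1, so +id and -id never coincide
        self.domain_names_[domain_name] = len(self.domain_names_) + 1
        self.domains_.append((X, y) if y is not None else (X,))
        return self  # returning self allows chained calls

    def get_domain(self, domain_name):
        # The identifier minus one is the position in domains_
        return self.domains_[self.domain_names_[domain_name] - 1]


rng = np.random.default_rng(0)
store = (
    TinyDomainStore()
    .add_domain(rng.normal(size=(5, 2)), np.zeros(5, dtype=int), "source")
    .add_domain(rng.normal(size=(3, 2)), domain_name="target")
    .add_domain(rng.normal(size=(4, 2)))
)
print(store.domain_names_)  # {'source': 1, 'target': 2, '_2': 3}
```

Returning `self` is what makes the chained style possible; an unlabeled domain is stored as a one-element tuple `(X,)`, matching the `(X, y)` or `(X,)` layout listed under Attributes.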
- get_domain(domain_name: str) → Tuple[ndarray | Tensor, ndarray | Tensor | None]
Retrieve the data and labels for a given domain.
- Parameters:
- domain_name : str
Name of the domain to retrieve.
- Returns:
- domain : tuple
Tuple containing (X, y) or (X,) for the specified domain.
- merge(dataset: DomainAwareDataset, names_mapping: Mapping | None = None) → DomainAwareDataset
Merge another DomainAwareDataset into this one.
- Parameters:
- dataset : DomainAwareDataset
The dataset to merge.
- names_mapping : mapping, optional
Mapping from old domain names to new domain names.
- Returns:
- self : DomainAwareDataset
The updated dataset.
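Conceptually, `names_mapping` lets you rename the incoming domains so they do not collide with existing ones. The helper below is a hypothetical sketch over plain `{name: (X, y)}` dictionaries rather than `DomainAwareDataset` objects; it assumes that names absent from the mapping are kept unchanged and that a remaining name clash is an error, which may differ from what `merge()` actually does. Unlike `merge()`, which updates the dataset and returns `self`, the sketch returns a new dictionary.

```python
def merge_domains(left, right, names_mapping=None):
    """Hypothetical sketch: merge two {name: (X, y)} dicts, renaming right's domains."""
    names_mapping = names_mapping or {}
    merged = dict(left)  # leave the input dict untouched
    for name, data in right.items():
        new_name = names_mapping.get(name, name)  # unmapped names are kept as-is
        if new_name in merged:
            raise ValueError(f"domain {new_name!r} already exists")
        merged[new_name] = data
    return merged


# Strings stand in for the (X, y) arrays to keep the example short
photos = {"photo": ("X_photo", "y_photo")}
sketches = {"photo": ("X_sketch", "y_sketch")}
merged = merge_domains(photos, sketches, names_mapping={"photo": "sketch"})
print(sorted(merged))  # ['photo', 'sketch']
```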
- pack(as_sources: List[str], as_targets: List[str], mask_target_labels: bool, return_X_y: bool | None = None, return_type: Literal['auto', 'array', 'tensor', 'DeepDADataset', 'Bunch'] = 'auto', train: bool | None = None, mask: None | int | float = None) → Bunch | Tuple[ndarray | Tensor, ndarray | Tensor, ndarray | Tensor] | DeepDADataset
Aggregates the selected source and target domains into a single domain-aware representation compatible with domain adaptation (DA) estimators.
- Parameters:
- as_sources : list
List of domain names to be used as sources. An empty list indicates that no source domains are used.
- as_targets : list
List of domain names to be used as targets. An empty list indicates that no target domains are used.
- mask_target_labels : bool
This parameter should be set to True for training and False for testing. When set to True, labels of target domains are masked with -1 for classification tasks or nan for regression tasks, so they are not available at train time.
- return_X_y : bool, default=True
[DEPRECATED] When set to True, returns a tuple (X, y, sample_domain). Otherwise, returns a Bunch object with the structure described below.
- return_type : Literal["auto", "array", "tensor", "DeepDADataset", "Bunch"], default="auto"
The type of the returned data. If "auto", returns tensors if the data is in tensor format and numpy arrays otherwise. If "array", returns numpy arrays. If "tensor", returns torch tensors. If "DeepDADataset", returns a DeepDADataset. If "Bunch", returns a Bunch object.
- train : bool or None, default=None
[DEPRECATED] Use `mask_target_labels` instead.
- mask : int or float, optional, default=None
Value used to mask target labels at training time. If None, -1 is used for integer labels and np.nan for float labels.
- Returns:
- data : Bunch
Dictionary-like object with the following attributes.
- X : ndarray
Samples from all sources and all targets given.
- y : ndarray
Labels from all sources and all targets.
- sample_domain : ndarray
The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always negative.
- domain_names : dict
The names of the domains and their associated domain labels.
- (X, y, sample_domain) : tuple of ArrayLike, if return_type="array" or "tensor"
Tuple of (data, target, sample_domain); see the description above.
- deep_da_dataset : DeepDADataset, if return_type="DeepDADataset"
Torch-compatible dataset: a torch Dataset extended with sample_domain.
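The layout of the packed output can be illustrated with plain NumPy. `pack_sketch` and its `domains` and `domain_names` arguments are hypothetical; the sketch assumes domain identifiers start at 1 and that a target domain is encoded as the negated identifier, which agrees with the sign convention for sample_domain but is not taken from skada's code. Tensors and the non-array return types are ignored.

```python
import numpy as np


def pack_sketch(domains, domain_names, as_sources, as_targets,
                mask_target_labels=True, mask=-1):
    """Hypothetical sketch of the arrays returned by pack().

    domains maps a name to (X, y); domain_names maps a name to a positive int id.
    Assumption: a source keeps +id and a target is encoded as -id.
    """
    Xs, ys, sample_domain = [], [], []
    for names, sign in ((as_sources, 1), (as_targets, -1)):
        for name in names:
            X, y = domains[name]
            if sign < 0 and mask_target_labels:
                # Hide target labels; pass mask=np.nan for regression labels
                y = np.full(len(y), mask, dtype=np.result_type(y, mask))
            Xs.append(X)
            ys.append(y)
            sample_domain.append(np.full(len(X), sign * domain_names[name]))
    return np.concatenate(Xs), np.concatenate(ys), np.concatenate(sample_domain)


rng = np.random.default_rng(0)
domains = {
    "s": (rng.normal(size=(3, 2)), np.array([0, 1, 1])),
    "t": (rng.normal(size=(2, 2)), np.array([1, 0])),
}
X, y, sample_domain = pack_sketch(domains, {"s": 1, "t": 2}, ["s"], ["t"])
print(y)              # [ 0  1  1 -1 -1]
print(sample_domain)  # [ 1  1  1 -2 -2]
```

With `mask_target_labels=False` the target labels come back unchanged, which is what an evaluation split needs.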
- pack_lodo(return_X_y: bool = True, return_type: Literal['auto', 'array', 'tensor', 'DeepDADataset', 'Bunch'] = 'auto') → Bunch | Tuple[ndarray | Tensor, ndarray | Tensor, ndarray | Tensor] | DeepDADataset
Packages all domains in a format compatible with the Leave-One-Domain-Out cross-validator (refer to LeaveOneDomainOut for more details). To let the splitter assign source and target domains dynamically, data from each domain is included in the output twice: once as a source and once as a target. Use this output only for that purpose; because every sample appears in both roles, other uses can lead to incorrect results and data leakage.
- Parameters:
- return_X_y : bool, default=True
[DEPRECATED] When set to True, returns a tuple (X, y, sample_domain). Otherwise, returns a Bunch object with the structure described below.
- return_type : Literal["auto", "array", "tensor", "DeepDADataset", "Bunch"], default="auto"
The type of the returned data. If "auto", returns tensors if the data is in tensor format and numpy arrays otherwise. If "array", returns numpy arrays. If "tensor", returns torch tensors. If "DeepDADataset", returns a DeepDADataset. If "Bunch", returns a Bunch object.
- Returns:
- data : Bunch
Dictionary-like object with the following attributes.
- X : ArrayLike
Samples from all sources and all targets given.
- y : ArrayLike
Labels from all sources and all targets.
- sample_domain : ArrayLike
The integer label of the domain each sample was taken from. By convention, source domains have non-negative labels, and target domain labels are always negative.
- domain_names : dict
The names of the domains and their associated domain labels.
- (X, y, sample_domain) : tuple, if return_X_y=True
Tuple of (data, target, sample_domain); see the description above.
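The duplication performed by pack_lodo() can be sketched with the same hypothetical encoding as a plain-NumPy illustration: identifier +id for the source copy and -id for the target copy. `pack_lodo_sketch` is hypothetical, and leaving the target copies' labels unmasked is an assumption of this sketch.

```python
import numpy as np


def pack_lodo_sketch(domains, domain_names):
    """Hypothetical sketch: each domain is emitted as a source (+id), then as a target (-id)."""
    Xs, ys, sample_domain = [], [], []
    for sign in (1, -1):
        for name, (X, y) in domains.items():
            Xs.append(X)
            # Assumption: labels are left unmasked; the splitter decides the roles
            ys.append(y)
            sample_domain.append(np.full(len(X), sign * domain_names[name]))
    return np.concatenate(Xs), np.concatenate(ys), np.concatenate(sample_domain)


domains = {
    "a": (np.zeros((2, 1)), np.array([0, 1])),
    "b": (np.ones((3, 1)), np.array([1, 1, 0])),
}
X, y, sample_domain = pack_lodo_sketch(domains, {"a": 1, "b": 2})
print(X.shape)        # (10, 1)
print(sample_domain)  # [ 1  1  2  2  2 -1 -1 -2 -2 -2]
```

Because each sample appears twice, fitting an estimator on this output directly would expose every target sample as a labeled source sample, which is the kind of leakage the method description warns about.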
- pack_test(as_targets: List[str], return_X_y: bool = True) → Bunch | Tuple[ndarray | Tensor, ndarray | Tensor, ndarray | Tensor] | DeepDADataset
Aggregate target domains for testing.
Warning
This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=False instead.
This method is equivalent to pack() with only target domains and train=False. Labels are not masked.
- Parameters:
- as_targets : list of str
List of domain names to be used as targets.
- return_X_y : bool, default=True
If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.
- Returns:
- data : sklearn.utils.Bunch
Dictionary-like object with attributes X, y, sample_domain, domain_names.
- (X, y, sample_domain) : tuple, if return_X_y=True
Tuple of (data, target, sample_domain).
- pack_train(as_sources: List[str], as_targets: List[str], return_X_y: bool = True, mask: None | int | float = None) → Bunch | Tuple[ndarray | Tensor, ndarray | Tensor, ndarray | Tensor] | DeepDADataset
Aggregate source and target domains for training.
Warning
This method is deprecated and will be removed in future versions. Use pack() with mask_target_labels=True instead.
This method is equivalent to pack() with train=True. It masks the labels for target domains (with -1 or a custom mask value) so that they are not available during training, as required in domain adaptation scenarios.
- Parameters:
- as_sources : list of str
List of domain names to be used as sources.
- as_targets : list of str
List of domain names to be used as targets.
- return_X_y : bool, default=True
If True, returns a tuple (X, y, sample_domain). Otherwise, returns a sklearn.utils.Bunch object.
- mask : int or float, optional
Value to mask labels at training time. If None, uses -1 for integers and np.nan for floats.
- Returns:
- data : sklearn.utils.Bunch
Dictionary-like object with attributes X, y, sample_domain, domain_names.
- (X, y, sample_domain) : tuple, if return_X_y=True
Tuple of (data, target, sample_domain).
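The default masking rule stated for `mask` (-1 for integer labels, np.nan for float labels) can be written as a small helper. `mask_labels` is a hypothetical sketch, not a skada function; it widens the dtype when needed so that, for example, np.nan can mask integer labels.

```python
import numpy as np


def mask_labels(y, mask=None):
    """Hypothetical sketch: replace every label in y with the mask value."""
    y = np.asarray(y)
    if mask is None:
        # Documented default: -1 for integer labels, np.nan for float labels
        mask = -1 if np.issubdtype(y.dtype, np.integer) else np.nan
    # result_type widens int -> float when integer labels are masked with nan
    return np.full(y.shape, mask, dtype=np.result_type(y, mask))


print(mask_labels(np.array([2, 0, 1])))          # [-1 -1 -1]
print(mask_labels(np.array([0.5, 1.5])))         # [nan nan]
print(mask_labels(np.array([2, 0]), mask=-100))  # [-100 -100]
```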
- select_domain(sample_domain: ndarray | Tensor, domains: str | Iterable[str]) → ndarray | Tensor
Select samples belonging to one or more domains.
- Parameters:
- sample_domain : ArrayLike
Array of domain labels for each sample.
- domains : str or iterable of str
Domain name(s) to select.
- Returns:
- mask : ArrayLike
Boolean mask indicating selected samples.
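The returned mask is an ordinary boolean array, so it can index X, y and sample_domain directly. The sketch below is hypothetical: unlike select_domain(), it takes the name-to-identifier mapping explicitly, and it assumes a domain should be matched whether it was packed as a source (+id) or as a target (-id).

```python
import numpy as np


def select_domain_sketch(domain_names, sample_domain, domains):
    """Hypothetical sketch: boolean mask of the samples whose domain is in `domains`."""
    if isinstance(domains, str):
        domains = [domains]
    ids = [domain_names[name] for name in domains]
    # Assumption: compare absolute values so source (+id) and target (-id) copies both match
    return np.isin(np.abs(np.asarray(sample_domain)), ids)


domain_names = {"a": 1, "b": 2, "c": 3}
sample_domain = np.array([1, 1, 2, -3, -3])
X = np.arange(10).reshape(5, 2)
mask = select_domain_sketch(domain_names, sample_domain, ["a", "c"])
print(mask)           # [ True  True False  True  True]
print(X[mask][:, 0])  # [0 2 6 8]
```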