clarena.mtl_datasets

Multi-Task Learning Datasets

This submodule provides the multi-task learning datasets that can be used in CLArena.

Here are the base classes for multi-task learning datasets, which inherit from Lightning `LightningDataModule`:

 - MTLDataset: The base class for all multi-task learning datasets.
 - MTLCombinedDataset: The base class for combined multi-task learning datasets. A child class of MTLDataset.
 - MTLDatasetFromCL: The base class for constructing multi-task learning datasets from continual learning datasets. A child class of MTLDataset.

Please note that this is API documentation. Please refer to the main documentation pages for more information about how to configure and implement MTL datasets:

 - Configure MTL Dataset: https://pengxiang-wang.com/projects/continual-learning-arena/docs/components/mtl-dataset
 - Implement Custom MTL Dataset: https://pengxiang-wang.com/projects/continual-learning-arena/docs/custom-implementation/mtl_dataset

r"""

# Multi-Task Learning Datasets

This submodule provides the **multi-task learning datasets** that can be used in CLArena.

Here are the base classes for multi-task learning datasets, which inherit from Lightning `LightningDataModule`:

- `MTLDataset`: The base class for all multi-task learning datasets.
    - `MTLCombinedDataset`: The base class for combined multi-task learning datasets. A child class of `MTLDataset`.
    - `MTLDatasetFromCL`: The base class for constructing multi-task learning datasets from continual learning datasets. A child class of `MTLDataset`.

Please note that this is API documentation. Please refer to the main documentation pages for more information about how to configure and implement MTL datasets:

- [**Configure MTL Dataset**](https://pengxiang-wang.com/projects/continual-learning-arena/docs/components/mtl-dataset)
- [**Implement Custom MTL Dataset**](https://pengxiang-wang.com/projects/continual-learning-arena/docs/custom-implementation/mtl_dataset)
"""

from .base import MTLDataset, MTLCombinedDataset, MTLDatasetFromCL
from .combined import Combined


__all__ = ["MTLDataset", "MTLCombinedDataset", "MTLDatasetFromCL", "Combined"]
class MTLDataset(LightningDataModule):
    r"""The base class of multi-task learning datasets."""

    def __init__(
        self,
        root: str | dict[int, str],
        num_tasks: int,
        sampling_strategy: str = "mixed",
        batch_size: int = 1,
        num_workers: int = 0,
        custom_transforms: (
            Callable
            | transforms.Compose
            | None
            | dict[int, Callable | transforms.Compose | None]
        ) = None,
        repeat_channels: int | None | dict[int, int | None] = None,
        to_tensor: bool | dict[int, bool] = True,
        resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None,
    ) -> None:
        r"""
        **Args:**
        - **root** (`str` | `dict[int, str]`): the root directory where the original data files for constructing the MTL dataset physically live. If it is a dict, the keys are task IDs and the values are the root directories for each task.
        - **num_tasks** (`int`): the maximum number of tasks supported by the MTL dataset.
        - **sampling_strategy** (`str`): the sampling strategy that constructs training batches from the tasks' datasets; one of:
            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
        - **batch_size** (`int`): the batch size for the train, validation and test dataloaders.
        - **num_workers** (`int`): the number of workers for dataloaders.
        - **custom_transforms** (`transform` | `transforms.Compose` | `None` | dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. `ToTensor()`, normalization and so on are not included.
        If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is `None`, no custom transforms are applied.
        - **repeat_channels** (`int` | `None` | dict of them): the number of channels to repeat for each task. Default is `None`, which means no repeat.
        If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an `int`, the same number of channels is repeated for all tasks. If it is `None`, no repeat is applied.
        - **to_tensor** (`bool` | `dict[int, bool]`): whether to include the `ToTensor()` transform. Default is `True`.
        If it is a dict, the keys are task IDs and the values are whether to include the `ToTensor()` transform for each task. If it is a single boolean value, it is applied to all tasks.
        - **resize** (`tuple[int, int]` | `None` | dict of them): the size to resize the images to. Default is `None`, which means no resize.
        If it is a dict, the keys are task IDs and the values are the sizes to resize to for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is `None`, no resize is applied.
        """
        super().__init__()

        self.root: dict[int, str] = (
            OmegaConf.to_container(root)
            if isinstance(root, DictConfig)
            else root
            if isinstance(root, dict)
            else {t: root for t in range(1, num_tasks + 1)}
        )
        r"""The dict of root directories of the original data files for each task."""
        self.num_tasks: int = num_tasks
        r"""The maximum number of tasks supported by the dataset."""
        self.sampling_strategy: str = sampling_strategy
        r"""The sampling strategy for constructing training batches from the tasks' datasets."""
        self.batch_size: int = batch_size
        r"""The batch size for dataloaders."""
        self.num_workers: int = num_workers
        r"""The number of workers for dataloaders."""

        self.custom_transforms: dict[int, Callable | transforms.Compose | None] = (
            OmegaConf.to_container(custom_transforms)
            if isinstance(custom_transforms, DictConfig)
            else custom_transforms
            if isinstance(custom_transforms, dict)
            else {t: custom_transforms for t in range(1, num_tasks + 1)}
        )
        r"""The dict of custom transforms for each task."""
        self.repeat_channels: dict[int, int | None] = (
            OmegaConf.to_container(repeat_channels)
            if isinstance(repeat_channels, DictConfig)
            else repeat_channels
            if isinstance(repeat_channels, dict)
            else {t: repeat_channels for t in range(1, num_tasks + 1)}
        )
        r"""The dict of number of channels to repeat for each task."""
        self.to_tensor: dict[int, bool] = (
            OmegaConf.to_container(to_tensor)
            if isinstance(to_tensor, DictConfig)
            else to_tensor
            if isinstance(to_tensor, dict)
            else {t: to_tensor for t in range(1, num_tasks + 1)}
        )
        r"""The dict of to_tensor flags for each task."""
        self.resize: dict[int, tuple[int, int] | None] = (
            {t: tuple(rs) if rs else None for t, rs in resize.items()}
            if isinstance(resize, (dict, DictConfig))
            else {
                t: (tuple(resize) if resize else None) for t in range(1, num_tasks + 1)
            }
        )
        r"""The dict of sizes to resize to for each task."""

        # dataset containers
        self.dataset_train: dict[int, Any] = {}
        r"""The dictionary storing the training dataset object of each task. Keys are task IDs and values are the dataset objects, which can be PyTorch Dataset objects or any other dataset objects.

        Note that they must be task-labelled, i.e. the elements of the dataset objects must be tuples of (input, target, task_id). Use the `TaskLabelledDataset` wrapper if necessary."""
        self.dataset_val: dict[int, Any] = {}
        r"""The dictionary storing the validation dataset object of each task. Keys are task IDs and values are the dataset objects, which can be PyTorch Dataset objects or any other dataset objects.

        Note that they must be task-labelled, i.e. the elements of the dataset objects must be tuples of (input, target, task_id). Use the `TaskLabelledDataset` wrapper if necessary."""
        self.dataset_test: dict[int, Any] = {}
        r"""The dictionary storing the test dataset object of each task. Keys are task IDs and values are the dataset objects, which can be PyTorch Dataset objects or any other dataset objects.

        Note that they must be task-labelled, i.e. the elements of the dataset objects must be tuples of (input, target, task_id). Use the `TaskLabelledDataset` wrapper if necessary."""

        self.mean: dict[int, float] = {}
        r"""The dict of mean values for normalization for each task. Used when constructing the transforms."""
        self.std: dict[int, float] = {}
        r"""The dict of standard deviation values for normalization for each task. Used when constructing the transforms."""

        # task ID controls
        self.train_tasks: list[int]
        r"""The list of task IDs to be trained."""
        self.eval_tasks: list[int]
        r"""The list of task IDs to be evaluated."""

        MTLDataset.sanity_check(self)

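The per-task options above (`root`, `custom_transforms`, `repeat_channels`, `to_tensor`, `resize`) all follow the same broadcast rule: a single value applies to every task, while a dict keyed by task ID sets the value per task. A minimal standalone sketch of that rule (`broadcast_per_task` is a hypothetical helper in plain Python; the actual `__init__` additionally converts OmegaConf `DictConfig` inputs):

```python
def broadcast_per_task(value, num_tasks):
    """Expand a single value or a task-ID-keyed dict into a full per-task dict."""
    if isinstance(value, dict):
        return dict(value)  # already per-task; keep as given
    return {t: value for t in range(1, num_tasks + 1)}  # broadcast to all tasks

# A single value is shared by all tasks; a dict stays per-task.
shared = broadcast_per_task("data/", num_tasks=3)
per_task = broadcast_per_task({1: "data/a", 2: "data/b", 3: "data/c"}, num_tasks=3)
```

After broadcasting, `sanity_check` can simply require every such dict to have keys 1 to `num_tasks`.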
    def sanity_check(self) -> None:
        r"""Check that every per-task option dict covers exactly tasks 1 to `num_tasks`."""
        for attr in [
            "root",
            "custom_transforms",
            "repeat_channels",
            "to_tensor",
            "resize",
        ]:
            value = getattr(self, attr)
            expected_keys = set(range(1, self.num_tasks + 1))
            if set(value.keys()) != expected_keys:
                raise ValueError(
                    f"{attr} dict keys must be consecutive integers from 1 to num_tasks."
                )

    @abstractmethod
    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
        r"""Get the mapping of classes of task `task_id` to fit multi-task learning. It must be implemented by subclasses.

        **Args:**
        - **task_id** (`int`): the task ID to query the class map for.

        **Returns:**
        - **class_map** (`dict[str | int, int]`): the class map of the task. Keys are original class labels and values are integer class labels for multi-task learning. The mapped class labels of each task should be consecutive integers from 0 to the number of classes minus 1.
        """

    @abstractmethod
    def prepare_data(self) -> None:
        r"""Use this to download and prepare data. It must be implemented by subclasses, as required by `LightningDataModule`."""

    def setup(self, stage: str) -> None:
        r"""Set up the dataset for different stages.

        **Args:**
        - **stage** (`str`): the stage of the experiment; one of:
            - 'fit': training and validation datasets are assigned to `self.dataset_train` and `self.dataset_val`.
            - 'test': test datasets are assigned to `self.dataset_test`.
        """
        if stage == "fit":
            # train and validation must be constructed together because a sanity check on validation is conducted before training
            pylogger.debug("Construct train and validation dataset ...")

            for task_id in self.train_tasks:
                self.dataset_train[task_id], self.dataset_val[task_id] = (
                    self.train_and_val_dataset(task_id)
                )

                pylogger.info(
                    "Train and validation datasets for task %d are ready.", task_id
                )
                pylogger.info(
                    "Train dataset for task %d size: %d",
                    task_id,
                    len(self.dataset_train[task_id]),
                )
                pylogger.info(
                    "Validation dataset for task %d size: %d",
                    task_id,
                    len(self.dataset_val[task_id]),
                )

        elif stage == "test":
            pylogger.debug("Construct test dataset ...")

            for task_id in self.eval_tasks:
                self.dataset_test[task_id] = self.test_dataset(task_id)

                pylogger.info("Test dataset for task %d is ready.", task_id)
                pylogger.info(
                    "Test dataset for task %d size: %d",
                    task_id,
                    len(self.dataset_test[task_id]),
                )

    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
        r"""Set up tasks for the multi-task learning experiment.

        **Args:**
        - **train_tasks** (`list[int]`): the list of task IDs to be trained. Used when constructing the train/val dataloaders.
        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. Used when constructing the test dataloader.
        """
        self.train_tasks = train_tasks
        self.eval_tasks = eval_tasks

    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
        r"""Set up evaluation tasks for the multi-task learning evaluation.

        **Args:**
        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated.
        """
        self.eval_tasks = eval_tasks

    def train_and_val_transforms(self, task_id: int) -> transforms.Compose:
        r"""Transforms for the train and validation datasets of task `task_id`, incorporating the custom transforms with basic transforms like normalization and `ToTensor()`. It can be used in subclasses when constructing the dataset.

        **Args:**
        - **task_id** (`int`): the task ID of the training and validation dataset to get the transforms for.

        **Returns:**
        - **train_and_val_transforms** (`transforms.Compose`): the composed train/val transforms.
        """
        repeat_channels_transform = (
            transforms.Grayscale(num_output_channels=self.repeat_channels[task_id])
            if self.repeat_channels[task_id] is not None
            else None
        )
        to_tensor_transform = transforms.ToTensor() if self.to_tensor[task_id] else None
        resize_transform = (
            transforms.Resize(self.resize[task_id])
            if self.resize[task_id] is not None
            else None
        )
        normalization_transform = transforms.Normalize(
            self.mean[task_id], self.std[task_id]
        )

        return transforms.Compose(
            list(
                filter(
                    None,
                    [
                        repeat_channels_transform,
                        to_tensor_transform,
                        resize_transform,
                        self.custom_transforms[task_id],
                        normalization_transform,
                    ],
                )
            )
        )  # the order of transforms matters

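The method above builds each optional stage (channel repeat, `ToTensor()`, resize, custom, normalization) as either a transform or `None`, then drops the `None`s with `filter` before composing. The same pattern in plain Python, with simple callables standing in for the torchvision transforms (`compose`, `scale`, and `normalize` here are hypothetical stand-ins, not the real API):

```python
def compose(stages):
    """Chain the non-None callables in order, like transforms.Compose over filter(None, ...)."""
    active = [s for s in stages if s is not None]

    def pipeline(x):
        for stage in active:
            x = stage(x)
        return x

    return pipeline

# Optional stages: disabled ones are simply None and get filtered out.
resize = None                       # e.g. no resize configured for this task
scale = lambda x: x / 255.0         # stand-in for ToTensor()'s [0, 1] scaling
normalize = lambda x: (x - 0.5) / 0.5

pipeline = compose([resize, scale, normalize])
```

Because `None` entries are filtered before composition, the remaining stages still run in the declared order, which is why the order of the list matters.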
    def test_transforms(self, task_id: int) -> transforms.Compose:
        r"""Transforms for the test dataset of task `task_id`. Only basic transforms like normalization and `ToTensor()` are included. It can be used in subclasses when constructing the dataset.

        **Args:**
        - **task_id** (`int`): the task ID of the test dataset to get the transforms for.

        **Returns:**
        - **test_transforms** (`transforms.Compose`): the composed test transforms.
        """
        repeat_channels_transform = (
            transforms.Grayscale(num_output_channels=self.repeat_channels[task_id])
            if self.repeat_channels[task_id] is not None
            else None
        )
        to_tensor_transform = transforms.ToTensor() if self.to_tensor[task_id] else None
        resize_transform = (
            transforms.Resize(self.resize[task_id])
            if self.resize[task_id] is not None
            else None
        )
        normalization_transform = transforms.Normalize(
            self.mean[task_id], self.std[task_id]
        )

        return transforms.Compose(
            list(
                filter(
                    None,
                    [
                        repeat_channels_transform,
                        to_tensor_transform,
                        resize_transform,
                        normalization_transform,
                    ],
                )
            )
        )  # the order of transforms matters; no custom transforms for test

    def target_transform(self, task_id: int) -> Callable:
        r"""Target transform for task `task_id`, which maps the original class labels to the integer class labels for multi-task learning. It can be used in subclasses when constructing the dataset.

        **Args:**
        - **task_id** (`int`): the task ID of the dataset to get the target transform for.

        **Returns:**
        - **target_transform** (`Callable`): the target transform function.
        """
        class_map = self.get_mtl_class_map(task_id)

        return ClassMapping(class_map=class_map)

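`get_mtl_class_map` is expected to return a map whose values are consecutive integers starting at 0; the target transform then just looks each original label up. A minimal stand-in illustrating that contract (`ClassMappingSketch` is hypothetical; the real `ClassMapping` wrapper comes from elsewhere in CLArena):

```python
class ClassMappingSketch:
    """Hypothetical stand-in: maps original class labels to contiguous MTL labels."""

    def __init__(self, class_map):
        # the mapped labels must be 0..num_classes-1 with no gaps
        assert sorted(class_map.values()) == list(range(len(class_map)))
        self.class_map = class_map

    def __call__(self, target):
        return self.class_map[target]

# e.g. a task whose original labels are strings
to_mtl = ClassMappingSketch({"cat": 0, "dog": 1, "bird": 2})
```

Passing such a callable as a dataset's `target_transform` remaps every label on the fly, so heads and losses only ever see the contiguous integer labels.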
    @abstractmethod
    def train_and_val_dataset(self, task_id: int) -> tuple[Any, Any]:
        r"""Get the training and validation dataset of task `task_id`. It must be implemented by subclasses.

        **Args:**
        - **task_id** (`int`): the task ID to get the training and validation dataset for.

        **Returns:**
        - **train_and_val_dataset** (`tuple[Any, Any]`): the train and validation dataset of task `task_id`.
        """

    @abstractmethod
    def test_dataset(self, task_id: int) -> Any:
        r"""Get the test dataset of task `task_id`. It must be implemented by subclasses.

        **Args:**
        - **task_id** (`int`): the task ID to get the test dataset for.

        **Returns:**
        - **test_dataset** (`Any`): the test dataset of task `task_id`.
        """

    def train_dataloader(self) -> DataLoader:
        r"""DataLoader generator for the train stage. It is automatically called before training.

        **Returns:**
        - **train_dataloader** (`DataLoader`): the train DataLoader.
        """
        pylogger.debug(
            "Construct train dataloader ... sampling strategy: %s",
            self.sampling_strategy,
        )

        if self.sampling_strategy == "mixed":
            # mixed sampling strategy, which samples from all tasks' datasets
            concatenated_dataset = ConcatDataset(
                [self.dataset_train[task_id] for task_id in self.train_tasks]
            )

            return DataLoader(
                dataset=concatenated_dataset,
                batch_size=self.batch_size,
                shuffle=True,  # shuffle train batches to prevent overfitting
                num_workers=self.num_workers,
                drop_last=True,  # to avoid batchnorm error when the last batch has size 1
            )

        raise ValueError(f"Unknown sampling strategy: {self.sampling_strategy}")

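Under the `'mixed'` strategy, the per-task training sets are concatenated into one dataset and shuffled, so a batch can mix samples from every task (which is why each sample must carry its task ID). A pure-Python sketch of the concatenation step (`ConcatSketch` is a hypothetical stand-in for `torch.utils.data.ConcatDataset`):

```python
class ConcatSketch:
    """Stand-in for ConcatDataset: chains per-task datasets end to end."""

    def __init__(self, datasets):
        self.datasets = datasets

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, i):
        # walk the datasets until the global index falls inside one of them
        for d in self.datasets:
            if i < len(d):
                return d[i]
            i -= len(d)
        raise IndexError(i)

# task-labelled samples: (input, target, task_id)
task1 = [(0.1, 0, 1), (0.2, 1, 1)]
task2 = [(0.3, 0, 2)]
mixed = ConcatSketch([task1, task2])
```

The DataLoader's `shuffle=True` then draws indices uniformly over the combined length, interleaving tasks within each batch.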
    def val_dataloader(self) -> dict[int, DataLoader]:
        r"""DataLoader generator for the validation stage. It is automatically called before validation.

        **Returns:**
        - **val_dataloader** (`dict[int, DataLoader]`): the validation DataLoaders, keyed by task ID.
        """
        pylogger.debug("Construct validation dataloader...")

        return {
            task_id: DataLoader(
                dataset=dataset_val_t,
                batch_size=self.batch_size,
                shuffle=False,  # no need to shuffle val or test batches
                num_workers=self.num_workers,
            )
            for task_id, dataset_val_t in self.dataset_val.items()
        }

    def test_dataloader(self) -> dict[int, DataLoader]:
        r"""DataLoader generator for the test stage. It is automatically called before testing.

        **Returns:**
        - **test_dataloader** (`dict[int, DataLoader]`): the test DataLoaders, keyed by task ID.
        """
        pylogger.debug("Construct test dataloader...")

        return {
            task_id: DataLoader(
                dataset=dataset_test_t,
                batch_size=self.batch_size,
                shuffle=False,  # no need to shuffle val or test batches
                num_workers=self.num_workers,
            )
            for task_id, dataset_test_t in self.dataset_test.items()
        }

The base class of multi-task learning datasets.

MTLDataset( root: str | dict[int, str], num_tasks: int, sampling_strategy: str = 'mixed', batch_size: int = 1, num_workers: int = 0, custom_transforms: Union[Callable, torchvision.transforms.transforms.Compose, NoneType, dict[int, Union[Callable, torchvision.transforms.transforms.Compose, NoneType]]] = None, repeat_channels: int | None | dict[int, int | None] = None, to_tensor: bool | dict[int, bool] = True, resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None)
 35    def __init__(
 36        self,
 37        root: str | dict[int, str],
 38        num_tasks: int,
 39        sampling_strategy: str = "mixed",
 40        batch_size: int = 1,
 41        num_workers: int = 0,
 42        custom_transforms: (
 43            Callable
 44            | transforms.Compose
 45            | None
 46            | dict[int, Callable | transforms.Compose | None]
 47        ) = None,
 48        repeat_channels: int | None | dict[int, int | None] = None,
 49        to_tensor: bool | dict[int, bool] = True,
 50        resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None,
 51    ) -> None:
 52        r"""
 53        **Args:**
 54        - **root** (`str` | `list[str]`): the root directory where the original data files for constructing the MTL dataset physically live. If `list[str]`, it should be a list of strings, each string is the root directory for each task.
 55        - **num_tasks** (`int`): the maximum number of tasks supported by the MTL dataset.
 56        - **sampling_strategy** (`str`): the sampling strategy that construct training batch from each task's dataset; one of:
 57            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
 58        - **batch_size** (`int`): The batch size in train, val, test dataloader.
 59        - **num_workers** (`int`): the number of workers for dataloaders.
 60        - **custom_transforms** (`transform` or `transforms.Compose` or `None` or dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. `ToTensor()`, normalization and so on are not included.
 61        If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is `None`, no custom transforms are applied.
 62        - **repeat_channels** (`int` | `None` | dict of them): the number of channels to repeat for each task. Default is `None`, which means no repeat.
 63        If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an `int`, it is the same number of channels to repeat for all tasks. If it is `None`, no repeat is applied.
 64        - **to_tensor** (`bool` | `dict[int, bool]`): whether to include the `ToTensor()` transform. Default is `True`.
 65        If it is a dict, the keys are task IDs and the values are whether to include the `ToTensor()` transform for each task. If it is a single boolean value, it is applied to all tasks.
 66        - **resize** (`tuple[int, int]` | `None` or dict of them): the size to resize the images to. Default is `None`, which means no resize.
 67        If it is a dict, the keys are task IDs and the values are the sizes to resize for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is `None`, no resize is applied.
 68        """
 69        super().__init__()
 70
 71        self.root: dict[int, str] = (
 72            OmegaConf.to_container(root)
 73            if isinstance(root, DictConfig)
 74            else {t: root for t in range(1, num_tasks + 1)}
 75        )
 76        r"""The dict of root directories of the original data files for each task."""
 77        self.num_tasks: int = num_tasks
 78        r"""The maximum number of tasks supported by the dataset."""
 79        self.sampling_strategy: str = sampling_strategy
 80        r"""The sampling strategy for constructing training batch from each task's dataset."""
 81        self.batch_size: int = batch_size
 82        r"""The batch size for dataloaders."""
 83        self.num_workers: int = num_workers
 84        r"""The number of workers for dataloaders."""
 85
 86        self.custom_transforms: dict[int, Callable | transforms.Compose | None] = (
 87            OmegaConf.to_container(custom_transforms)
 88            if isinstance(custom_transforms, dict)
 89            else {t: custom_transforms for t in range(1, num_tasks + 1)}
 90        )
 91        r"""The dict of custom transforms for each task."""
 92        self.repeat_channels: dict[int, int | None] = (
 93            OmegaConf.to_container(repeat_channels)
 94            if isinstance(repeat_channels, dict)
 95            else {t: repeat_channels for t in range(1, num_tasks + 1)}
 96        )
 97        r"""The dict of number of channels to repeat for each task."""
 98        self.to_tensor: dict[int, bool] = (
 99            OmegaConf.to_container(to_tensor)
100            if isinstance(to_tensor, dict)
101            else {t: to_tensor for t in range(1, num_tasks + 1)}
102        )
103        r"""The dict of to_tensor flag for each task. """
104        self.resize: dict[int, tuple[int, int] | None] = (
105            {t: tuple(rs) if rs else None for t, rs in resize.items()}
106            if isinstance(resize, DictConfig)
107            else {
108                t: (tuple(resize) if resize else None) for t in range(1, num_tasks + 1)
109            }
110        )
111        r"""The dict of sizes to resize to for each task."""
112
113        # dataset containers
114        self.dataset_train: dict[int, Any] = {}
115        r"""The dictionary to store training dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects. 
116        
117        Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use `TaskLabelledDataset` wrapper if necessary."""
118        self.dataset_val: dict[int, Any] = {}
119        r"""The dictionary to store validation dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects.
120        
121        Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use `TaskLabelledDataset` wrapper if necessary."""
122        self.dataset_test: dict[int, Any] = {}
123        r"""The dictionary to store test dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects.
124        
125        Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use `TaskLabelledDataset` wrapper if necessary."""
126
127        self.mean: dict[int, float] = {}
128        r"""Tthe list of mean values for normalization for all tasks. Used when constructing the transforms."""
129        self.std: dict[int, float] = {}
130        r"""The list of standard deviation values for normalization for all tasks. Used when constructing the transforms."""
131
132        # task ID controls
133        self.train_tasks: list[int]
134        r""""The list of task IDs to be trained. It should be a list of integers, each integer is the task ID."""
135        self.eval_tasks: list[int]
136        r"""The list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID."""
137
138        MTLDataset.sanity_check(self)

Args:

  • root (str | list[str]): the root directory where the original data files for constructing the MTL dataset physically live. If list[str], it should be a list of strings, each string is the root directory for each task.
  • num_tasks (int): the maximum number of tasks supported by the MTL dataset.
  • sampling_strategy (str): the sampling strategy that construct training batch from each task's dataset; one of:
    • 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
  • batch_size (int): The batch size in train, val, test dataloader.
  • num_workers (int): the number of workers for dataloaders.
  • custom_transforms (transform or transforms.Compose or None or dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. ToTensor(), normalization and so on are not included. If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is None, no custom transforms are applied.
  • repeat_channels (int | None | dict of them): the number of channels to repeat for each task. Default is None, which means no repeat. If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an int, it is the same number of channels to repeat for all tasks. If it is None, no repeat is applied.
  • to_tensor (bool | dict[int, bool]): whether to include the ToTensor() transform. Default is True. If it is a dict, the keys are task IDs and the values are whether to include the ToTensor() transform for each task. If it is a single boolean value, it is applied to all tasks.
  • resize (tuple[int, int] | None or dict of them): the size to resize the images to. Default is None, which means no resize. If it is a dict, the keys are task IDs and the values are the sizes to resize for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is None, no resize is applied.
root: dict[int, str]

The dict of root directories of the original data files for each task.

num_tasks: int

The maximum number of tasks supported by the dataset.

sampling_strategy: str

The sampling strategy for constructing training batch from each task's dataset.

batch_size: int

The batch size for dataloaders.

num_workers: int

The number of workers for dataloaders.

custom_transforms: dict[int, typing.Union[typing.Callable, torchvision.transforms.transforms.Compose, NoneType]]

The dict of custom transforms for each task.

repeat_channels: dict[int, int | None]

The dict of number of channels to repeat for each task.

to_tensor: dict[int, bool]

The dict of to_tensor flag for each task.

resize: dict[int, tuple[int, int] | None]

The dict of sizes to resize to for each task.

dataset_train: dict[int, typing.Any]

The dictionary to store training dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects.

Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use TaskLabelledDataset wrapper if necessary.

dataset_val: dict[int, typing.Any]

The dictionary to store validation dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects.

Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use TaskLabelledDataset wrapper if necessary.

dataset_test: dict[int, typing.Any]

The dictionary to store test dataset object of each task. Keys are task IDs and values are the dataset objects. Can be PyTorch Dataset objects or any other dataset objects.

Note that they must be task labelled, i.e., the elements in the dataset objects must be tuples of (input, target, task_id). Use TaskLabelledDataset wrapper if necessary.

mean: dict[int, float]

The dict of mean values for normalization for each task. Used when constructing the transforms.

std: dict[int, float]

The dict of standard deviation values for normalization for each task. Used when constructing the transforms.

train_tasks: list[int]

The list of task IDs to be trained. Each element is an integer task ID.

eval_tasks: list[int]

The list of task IDs to be evaluated. Each element is an integer task ID.

def sanity_check(self) -> None:
140    def sanity_check(self) -> None:
141        r"""Sanity check."""
142        for attr in [
143            "root",
144            "custom_transforms",
145            "repeat_channels",
146            "to_tensor",
147            "resize",
148        ]:
149            value = getattr(self, attr)
150            expected_keys = set(range(1, self.num_tasks + 1))
151            if set(value.keys()) != expected_keys:
152                raise ValueError(
153                    f"{attr} dict keys must be consecutive integers from 1 to num_tasks."
154                )

Sanity check.
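The check requires every per-task dict (root, custom_transforms, repeat_channels, to_tensor, resize) to be keyed by consecutive task IDs 1 to num_tasks. A stand-alone sketch of the same validation (the function name is illustrative, not part of the API):

```python
def check_task_keys(per_task_dict: dict, num_tasks: int) -> None:
    """Raise if the dict is not keyed by consecutive task IDs 1..num_tasks."""
    expected_keys = set(range(1, num_tasks + 1))
    if set(per_task_dict.keys()) != expected_keys:
        raise ValueError(
            "dict keys must be consecutive integers from 1 to num_tasks."
        )

check_task_keys({1: "data/t1", 2: "data/t2"}, num_tasks=2)  # passes silently
```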

@abstractmethod
def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
156    @abstractmethod
157    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
158        r"""Get the mapping of classes of task `task_id` to fit multi-task learning. It must be implemented by subclasses.
159
160        **Args:**
161        - **task_id** (`int`): The task ID to query class map.
162
163        **Returns:**
164        - **class_map** (`dict[str | int, int]`): the class map of the task. Keys are original class labels and values are integer class labels for multi-task learning. The mapped class labels of each task should be consecutive integers starting from 0.
165        """

Get the mapping of classes of task task_id to fit multi-task learning. It must be implemented by subclasses.

Args:

  • task_id (int): The task ID to query class map.

Returns:

  • class_map (dict[str | int, int]): the class map of the task. Keys are original class labels and values are integer class labels for multi-task learning. The mapped class labels of each task should be consecutive integers starting from 0.
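For instance, a subclass whose task has string class labels could build the required contiguous mapping like this (a hypothetical sketch; the label list is made up):

```python
def build_class_map(original_labels: list) -> dict:
    """Map each original class label to a contiguous integer label 0..C-1."""
    return {label: i for i, label in enumerate(original_labels)}

build_class_map(["cat", "dog", "bird"])
# {"cat": 0, "dog": 1, "bird": 2}
```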
@abstractmethod
def prepare_data(self) -> None:
167    @abstractmethod
168    def prepare_data(self) -> None:
169        r"""Use this to download and prepare data. It must be implemented by subclasses, as required by `LightningDataModule`."""

Use this to download and prepare data. It must be implemented by subclasses, as required by LightningDataModule.

def setup(self, stage: str) -> None:
171    def setup(self, stage: str) -> None:
172        r"""Set up the dataset for different stages.
173
174        **Args:**
175        - **stage** (`str`): the stage of the experiment; one of:
176            - 'fit': training and validation dataset should be assigned to `self.dataset_train` and `self.dataset_val`.
177            - 'test': test dataset should be assigned to `self.dataset_test`.
178        """
179        if stage == "fit":
180            # these two stages must be done together because a sanity check for validation is conducted before training
181            pylogger.debug("Construct train and validation dataset ...")
182
183            for task_id in self.train_tasks:
184
185                self.dataset_train[task_id], self.dataset_val[task_id] = (
186                    self.train_and_val_dataset(task_id)
187                )
188
189                pylogger.info(
190                    "Train and validation datasets for task %d are ready.", task_id
191                )
192                pylogger.info(
193                    "Train dataset for task %d size: %d",
194                    task_id,
195                    len(self.dataset_train[task_id]),
196                )
197                pylogger.info(
198                    "Validation dataset for task %d size: %d",
199                    task_id,
200                    len(self.dataset_val[task_id]),
201                )
202
203        elif stage == "test":
204
205            pylogger.debug("Construct test dataset ...")
206
207            for task_id in self.eval_tasks:
208
209                self.dataset_test[task_id] = self.test_dataset(task_id)
210
211                pylogger.info("Test dataset for task %d is ready.", task_id)
212                pylogger.info(
213                    "Test dataset for task %d size: %d",
214                    task_id,
215                    len(self.dataset_test[task_id]),
216                )

Set up the dataset for different stages.

Args:

  • stage (str): the stage of the experiment; one of:
    • 'fit': training and validation dataset should be assigned to self.dataset_train and self.dataset_val.
    • 'test': test dataset should be assigned to self.dataset_test.
def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
218    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
219        r"""Set up tasks for the multi-task learning experiment.
220
221        **Args:**
222        - **train_tasks** (`list[int]`): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the train/val dataloader.
223        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the test dataloader.
224        """
225        self.train_tasks = train_tasks
226        self.eval_tasks = eval_tasks

Set up tasks for the multi-task learning experiment.

Args:

  • train_tasks (list[int]): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the train/val dataloader.
  • eval_tasks (list[int]): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the test dataloader.
def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
228    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
229        r"""Set up evaluation tasks for the multi-task learning evaluation.
230
231        **Args:**
232        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated."""
233        self.eval_tasks = eval_tasks

Set up evaluation tasks for the multi-task learning evaluation.

Args:

  • eval_tasks (list[int]): the list of task IDs to be evaluated.
def train_and_val_transforms(self, task_id: int) -> torchvision.transforms.transforms.Compose:
235    def train_and_val_transforms(self, task_id: int) -> transforms.Compose:
236        r"""Transforms for train and validation datasets of task `task_id`, combining the custom transforms with basic transforms such as `normalization` and `ToTensor()`. It can be used in subclasses when constructing the dataset.
237
238        **Args:**
239        - **task_id** (`int`): the task ID of training and validation dataset to get the transforms for.
240
241        **Returns:**
242        - **train_and_val_transforms** (`transforms.Compose`): the composed train/val transforms.
243        """
244        repeat_channels_transform = (
245            transforms.Grayscale(num_output_channels=self.repeat_channels[task_id])
246            if self.repeat_channels[task_id] is not None
247            else None
248        )
249        to_tensor_transform = transforms.ToTensor() if self.to_tensor[task_id] else None
250        resize_transform = (
251            transforms.Resize(self.resize[task_id])
252            if self.resize[task_id] is not None
253            else None
254        )
255        normalization_transform = transforms.Normalize(
256            self.mean[task_id], self.std[task_id]
257        )
258
259        return transforms.Compose(
260            list(
261                filter(
262                    None,
263                    [
264                        repeat_channels_transform,
265                        to_tensor_transform,
266                        resize_transform,
267                        self.custom_transforms[task_id],
268                        normalization_transform,
269                    ],
270                )
271            )
272        )  # the order of transforms matters

Transforms for train and validation datasets of task task_id, combining the custom transforms with basic transforms such as normalization and ToTensor(). It can be used in subclasses when constructing the dataset.

Args:

  • task_id (int): the task ID of training and validation dataset to get the transforms for.

Returns:

  • train_and_val_transforms (transforms.Compose): the composed train/val transforms.
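The method builds each optional transform only when its setting is enabled, drops the `None` entries with `filter`, and composes the survivors in order. The same pattern, sketched without torchvision (the lambdas stand in for the real transforms):

```python
def compose(steps):
    """Apply the given callables left to right, like transforms.Compose."""
    def composed(x):
        for step in steps:
            x = step(x)
        return x
    return composed

maybe_steps = [
    lambda x: x * 2,   # stands in for e.g. a resize or repeat-channels step
    None,              # a disabled optional transform is simply None
    lambda x: x + 1,   # stands in for normalization
]
pipeline = compose(list(filter(None, maybe_steps)))
pipeline(3)  # (3 * 2) + 1 == 7; the order of transforms matters
```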
def test_transforms(self, task_id: int) -> torchvision.transforms.transforms.Compose:
274    def test_transforms(self, task_id: int) -> transforms.Compose:
275        r"""Transforms for test dataset of task `task_id`. Only basic transforms like `normalization` and `ToTensor()` are included. It can be used in subclasses when constructing the dataset.
276
277        **Args:**
278        - **task_id** (`int`): the task ID of test dataset to get the transforms for.
279
280        **Returns:**
281        - **test_transforms** (`transforms.Compose`): the composed test transforms.
282        """
283
284        repeat_channels_transform = (
285            transforms.Grayscale(num_output_channels=self.repeat_channels[task_id])
286            if self.repeat_channels[task_id] is not None
287            else None
288        )
289        to_tensor_transform = transforms.ToTensor() if self.to_tensor[task_id] else None
290        resize_transform = (
291            transforms.Resize(self.resize[task_id])
292            if self.resize[task_id] is not None
293            else None
294        )
295        normalization_transform = transforms.Normalize(
296            self.mean[task_id], self.std[task_id]
297        )
298
299        return transforms.Compose(
300            list(
301                filter(
302                    None,
303                    [
304                        repeat_channels_transform,
305                        to_tensor_transform,
306                        resize_transform,
307                        normalization_transform,
308                    ],
309                )
310            )
311        )  # the order of transforms matters. No custom transforms for test

Transforms for test dataset of task task_id. Only basic transforms like normalization and ToTensor() are included. It can be used in subclasses when constructing the dataset.

Args:

  • task_id (int): the task ID of test dataset to get the transforms for.

Returns:

  • test_transforms (transforms.Compose): the composed test transforms.
def target_transform(self, task_id: int) -> Callable:
313    def target_transform(self, task_id: int) -> Callable:
314        r"""Target transform for task `task_id`, which maps the original class labels to the integer class labels for multi-task learning. It can be used in subclasses when constructing the dataset.
315
316        **Args:**
317        - **task_id** (`int`): the task ID of dataset to get the target transform for.
318
319        **Returns:**
320        - **target_transform** (`Callable`): the target transform function.
321        """
322
323        class_map = self.get_mtl_class_map(task_id)
324
325        target_transform = ClassMapping(class_map=class_map)
326        return target_transform

Target transform for task task_id, which maps the original class labels to the integer class labels for multi-task learning. It can be used in subclasses when constructing the dataset.

Args:

  • task_id (int): the task ID of dataset to get the target transform for.

Returns:

  • target_transform (Callable): the target transform function.
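`ClassMapping` is CLArena's internal wrapper; functionally it behaves like a callable that looks the original label up in the class map. A rough stand-in (the class below is illustrative, not the real implementation):

```python
class ClassMappingSketch:
    """Callable target transform: original label -> MTL integer label."""

    def __init__(self, class_map: dict) -> None:
        self.class_map = class_map

    def __call__(self, target):
        return self.class_map[target]

target_transform = ClassMappingSketch({"cat": 0, "dog": 1})
target_transform("dog")  # -> 1
```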
@abstractmethod
def train_and_val_dataset(self, task_id: int) -> tuple[typing.Any, typing.Any]:
328    @abstractmethod
329    def train_and_val_dataset(self, task_id: int) -> tuple[Any, Any]:
330        r"""Get the training and validation dataset of task `task_id`. It must be implemented by subclasses.
331
332        **Args:**
333        - **task_id** (`int`): the task ID to get the training and validation dataset for.
334
335        **Returns:**
336        - **train_and_val_dataset** (`tuple[Any, Any]`): the train and validation dataset of task `task_id`.
337        """

Get the training and validation dataset of task task_id. It must be implemented by subclasses.

Args:

  • task_id (int): the task ID to get the training and validation dataset for.

Returns:

  • train_and_val_dataset (tuple[Any, Any]): the train and validation dataset of task task_id.
@abstractmethod
def test_dataset(self, task_id: int) -> Any:
339    @abstractmethod
340    def test_dataset(self, task_id: int) -> Any:
341        """Get the test dataset of task `task_id`. It must be implemented by subclasses.
342
343        **Args:**
344        - **task_id** (`int`): the task ID to get the test dataset for.
345
346        **Returns:**
347        - **test_dataset** (`Any`): the test dataset of task `task_id`.
348        """

Get the test dataset of task task_id. It must be implemented by subclasses.

Args:

  • task_id (int): the task ID to get the test dataset for.

Returns:

  • test_dataset (Any): the test dataset of task task_id.
def train_dataloader(self) -> torch.utils.data.dataloader.DataLoader:
350    def train_dataloader(self) -> DataLoader:
351        r"""DataLoader generator for the train stage. It is automatically called before training.
352
353        **Returns:**
354        - **train_dataloader** (`DataLoader`): the train DataLoader over all training tasks.
355        """
356
357        pylogger.debug(
358            "Construct train dataloader ... sampling_strategy method: %s",
359            self.sampling_strategy,
360        )
361
362        if self.sampling_strategy == "mixed":
363            # mixed sampling strategy, which samples from all tasks' datasets
364
365            concatenated_dataset = ConcatDataset(
366                [self.dataset_train[task_id] for task_id in self.train_tasks]
367            )
368
369            return DataLoader(
370                dataset=concatenated_dataset,
371                batch_size=self.batch_size,
372                shuffle=True,  # shuffle train batch to prevent overfitting
373                num_workers=self.num_workers,
374                drop_last=True,  # to avoid batchnorm error (when batch_size is 1)
375            )

DataLoader generator for the train stage. It is automatically called before training.

Returns:

  • train_dataloader (DataLoader): the train DataLoader over all training tasks.
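Under the 'mixed' strategy, the selected tasks' training sets are concatenated into one pool that the DataLoader then shuffles and batches. Stripped of PyTorch, the concatenation step amounts to the following (the toy task-labelled samples are made up):

```python
from itertools import chain

# toy task-labelled datasets: each sample is a (input, target, task_id) tuple
dataset_train = {
    1: [("x1", 0, 1), ("x2", 1, 1)],
    2: [("x3", 0, 2)],
}
train_tasks = [1, 2]

# the ConcatDataset step: one joint pool mixing samples across tasks
concatenated = list(chain.from_iterable(dataset_train[t] for t in train_tasks))
len(concatenated)  # 3 samples drawn from all training tasks
```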
def val_dataloader(self) -> dict[int, torch.utils.data.dataloader.DataLoader]:
377    def val_dataloader(self) -> dict[int, DataLoader]:
378        r"""DataLoader generator for the validation stage. It is automatically called before validation.
379
380        **Returns:**
381        - **val_dataloader** (`dict[int, DataLoader]`): the validation DataLoader.
382        """
383
384        pylogger.debug("Construct validation dataloader...")
385
386        return {
387            task_id: DataLoader(
388                dataset=dataset_val_t,
389                batch_size=self.batch_size,
390                shuffle=False,  # don't have to shuffle val or test batch
391                num_workers=self.num_workers,
392            )
393            for task_id, dataset_val_t in self.dataset_val.items()
394        }

DataLoader generator for the validation stage. It is automatically called before validation.

Returns:

  • val_dataloader (dict[int, DataLoader]): the validation DataLoader.
def test_dataloader(self) -> dict[int, torch.utils.data.dataloader.DataLoader]:
396    def test_dataloader(self) -> dict[int, DataLoader]:
397        r"""DataLoader generator for the test stage. It is automatically called before testing.
398
399        **Returns:**
400        - **test_dataloader** (`dict[int, DataLoader]`): the test DataLoader.
401        """
402
403        pylogger.debug("Construct test dataloader...")
404
405        return {
406            task_id: DataLoader(
407                dataset=dataset_test_t,
408                batch_size=self.batch_size,
409                shuffle=False,  # don't have to shuffle val or test batch
410                num_workers=self.num_workers,
411            )
412            for task_id, dataset_test_t in self.dataset_test.items()
413        }

DataLoader generator for the test stage. It is automatically called before testing.

Returns:

  • test_dataloader (dict[int, DataLoader]): the test DataLoader.
class MTLCombinedDataset(clarena.mtl_datasets.MTLDataset):
416class MTLCombinedDataset(MTLDataset):
417    r"""The base class of multi-task learning datasets constructed as combinations of several single-task datasets (one dataset per task)."""
418
419    def __init__(
420        self,
421        datasets: dict[int, str],
422        root: str | dict[int, str],
423        sampling_strategy: str = "mixed",
424        batch_size: int = 1,
425        num_workers: int = 0,
426        custom_transforms: (
427            Callable
428            | transforms.Compose
429            | None
430            | dict[int, Callable | transforms.Compose | None]
431        ) = None,
432        repeat_channels: int | None | dict[int, int | None] = None,
433        to_tensor: bool | dict[int, bool] = True,
434        resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None,
435    ) -> None:
436        r"""
437        **Args:**
438        - **datasets** (`dict[int, str]`): the dict of dataset class paths for each task. The keys are task IDs and the values are the dataset class paths (as strings) to use for each task.
439        - **root** (`str` | `dict[int, str]`): the root directory where the original data files for constructing the MTL dataset physically live. If `dict[int, str]`, it should be a dict of task IDs and their corresponding root directories.
440        - **sampling_strategy** (`str`): the sampling strategy that constructs training batches from each task's dataset; one of:
441            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
442        - **batch_size** (`int`): The batch size in train, val, test dataloader.
443        - **num_workers** (`int`): the number of workers for dataloaders.
444        - **custom_transforms** (`transform` or `transforms.Compose` or `None` or dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. `ToTensor()`, normalization, and so on are not included.
445        If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is `None`, no custom transforms are applied.
446        - **repeat_channels** (`int` | `None` | dict of them): the number of channels to repeat for each task. Default is `None`, which means no repeat.
447        If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an `int`, it is the same number of channels to repeat for all tasks. If it is `None`, no repeat is applied.
448        - **to_tensor** (`bool` | `dict[int, bool]`): whether to include the `ToTensor()` transform. Default is `True`.
449        If it is a dict, the keys are task IDs and the values are whether to include the `ToTensor()` transform for each task. If it is a single boolean value, it is applied to all tasks.
450        - **resize** (`tuple[int, int]` | `None` or dict of them): the size to resize the images to. Default is `None`, which means no resize. If it is a dict, the keys are task IDs and the values are the sizes to resize for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is `None`, no resize is applied.
451        """
452        super().__init__(
453            root=root,
454            num_tasks=len(
455                datasets
456            ),  # num_tasks is not explicitly provided, but derived from the datasets length
457            sampling_strategy=sampling_strategy,
458            batch_size=batch_size,
459            num_workers=num_workers,
460            custom_transforms=custom_transforms,
461            repeat_channels=repeat_channels,
462            to_tensor=to_tensor,
463            resize=resize,
464        )
465
466        self.original_dataset_python_classes: dict[int, Dataset] = {
467            t: str_to_class(dataset_class_path)
468            for t, dataset_class_path in datasets.items()
469        }
470        r"""The dict of dataset classes for each task."""
471
472    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
473        r"""Get the mapping of classes of task `task_id` to fit multi-task learning.
474
475        **Args:**
476        - **task_id** (`int`): the task ID to query the class map.
477
478        **Returns:**
479        - **class_map** (`dict[str | int, int]`): the class map of the task. Keys are the original class labels and values are the integer class labels for multi-task learning. For multi-task learning, the mapped class labels of a task should be consecutive integers starting from 0.
480        """
481        original_dataset_python_class_t = self.original_dataset_python_classes[task_id]
482        original_dataset_constants_t = DATASET_CONSTANTS_MAPPING[
483            original_dataset_python_class_t
484        ]
485        num_classes_t = original_dataset_constants_t.NUM_CLASSES
486        class_map_t = original_dataset_constants_t.CLASS_MAP
487
488        return {class_map_t[i]: i for i in range(num_classes_t)}
489
490    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
491        r"""Set up tasks for the multi-task learning experiment.
492
493        **Args:**
494        - **train_tasks** (`list[int]`): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
495        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
496        """
497        super().setup_tasks_expr(train_tasks=train_tasks, eval_tasks=eval_tasks)
498
499        for task_id in train_tasks + eval_tasks:
500            original_dataset_python_class_t = self.original_dataset_python_classes[
501                task_id
502            ]
503            original_dataset_constants_t = DATASET_CONSTANTS_MAPPING[
504                original_dataset_python_class_t
505            ]
506            self.mean[task_id] = original_dataset_constants_t.MEAN
507            self.std[task_id] = original_dataset_constants_t.STD
508
509    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
510        r"""Set up evaluation tasks for the multi-task learning evaluation.
511
512        **Args:**
513        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated.
514        """
515        super().setup_tasks_eval(eval_tasks=eval_tasks)

The base class of multi-task learning datasets constructed as combinations of several single-task datasets (one dataset per task).

MTLCombinedDataset( datasets: dict[int, str], root: str | dict[int, str], sampling_strategy: str = 'mixed', batch_size: int = 1, num_workers: int = 0, custom_transforms: Union[Callable, torchvision.transforms.transforms.Compose, NoneType, dict[int, Union[Callable, torchvision.transforms.transforms.Compose, NoneType]]] = None, repeat_channels: int | None | dict[int, int | None] = None, to_tensor: bool | dict[int, bool] = True, resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None)
419    def __init__(
420        self,
421        datasets: dict[int, str],
422        root: str | dict[int, str],
423        sampling_strategy: str = "mixed",
424        batch_size: int = 1,
425        num_workers: int = 0,
426        custom_transforms: (
427            Callable
428            | transforms.Compose
429            | None
430            | dict[int, Callable | transforms.Compose | None]
431        ) = None,
432        repeat_channels: int | None | dict[int, int | None] = None,
433        to_tensor: bool | dict[int, bool] = True,
434        resize: tuple[int, int] | None | dict[int, tuple[int, int] | None] = None,
435    ) -> None:
436        r"""
437        **Args:**
438        - **datasets** (`dict[int, str]`): the dict of dataset class paths for each task. The keys are task IDs and the values are the dataset class paths (as strings) to use for each task.
439        - **root** (`str` | `dict[int, str]`): the root directory where the original data files for constructing the MTL dataset physically live. If `dict[int, str]`, it should be a dict of task IDs and their corresponding root directories.
440        - **sampling_strategy** (`str`): the sampling strategy that constructs training batches from each task's dataset; one of:
441            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
442        - **batch_size** (`int`): The batch size in train, val, test dataloader.
443        - **num_workers** (`int`): the number of workers for dataloaders.
444        - **custom_transforms** (`transform` or `transforms.Compose` or `None` or dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. `ToTensor()`, normalization, and so on are not included.
445        If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is `None`, no custom transforms are applied.
446        - **repeat_channels** (`int` | `None` | dict of them): the number of channels to repeat for each task. Default is `None`, which means no repeat.
447        If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an `int`, it is the same number of channels to repeat for all tasks. If it is `None`, no repeat is applied.
448        - **to_tensor** (`bool` | `dict[int, bool]`): whether to include the `ToTensor()` transform. Default is `True`.
449        If it is a dict, the keys are task IDs and the values are whether to include the `ToTensor()` transform for each task. If it is a single boolean value, it is applied to all tasks.
450        - **resize** (`tuple[int, int]` | `None` or dict of them): the size to resize the images to. Default is `None`, which means no resize. If it is a dict, the keys are task IDs and the values are the sizes to resize for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is `None`, no resize is applied.
451        """
452        super().__init__(
453            root=root,
454            num_tasks=len(
455                datasets
456            ),  # num_tasks is not explicitly provided, but derived from the datasets length
457            sampling_strategy=sampling_strategy,
458            batch_size=batch_size,
459            num_workers=num_workers,
460            custom_transforms=custom_transforms,
461            repeat_channels=repeat_channels,
462            to_tensor=to_tensor,
463            resize=resize,
464        )
465
466        self.original_dataset_python_classes: dict[int, Dataset] = {
467            t: str_to_class(dataset_class_path)
468            for t, dataset_class_path in datasets.items()
469        }
470        r"""The dict of dataset classes for each task."""

Args:

  • datasets (dict[int, str]): the dict of dataset class paths for each task. The keys are task IDs and the values are the dataset class paths (as strings) to use for each task.
  • root (str | dict[int, str]): the root directory where the original data files for constructing the MTL dataset physically live. If dict[int, str], it should be a dict of task IDs and their corresponding root directories.
  • sampling_strategy (str): the sampling strategy that constructs training batches from each task's dataset; one of:
    • 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
  • batch_size (int): The batch size in train, val, test dataloader.
  • num_workers (int): the number of workers for dataloaders.
  • custom_transforms (transform or transforms.Compose or None or dict of them): the custom transforms to apply ONLY to the TRAIN dataset. Can be a single transform, composed transforms, or no transform. ToTensor(), normalization, and so on are not included. If it is a dict, the keys are task IDs and the values are the custom transforms for each task. If it is a single transform or composed transforms, it is applied to all tasks. If it is None, no custom transforms are applied.
  • repeat_channels (int | None | dict of them): the number of channels to repeat for each task. Default is None, which means no repeat. If it is a dict, the keys are task IDs and the values are the number of channels to repeat for each task. If it is an int, it is the same number of channels to repeat for all tasks. If it is None, no repeat is applied.
  • to_tensor (bool | dict[int, bool]): whether to include the ToTensor() transform. Default is True. If it is a dict, the keys are task IDs and the values are whether to include the ToTensor() transform for each task. If it is a single boolean value, it is applied to all tasks.
  • resize (tuple[int, int] | None or dict of them): the size to resize the images to. Default is None, which means no resize. If it is a dict, the keys are task IDs and the values are the sizes to resize for each task. If it is a single tuple of two integers, it is applied to all tasks. If it is None, no resize is applied.
original_dataset_python_classes: dict[int, torch.utils.data.dataset.Dataset]

The dict of dataset classes for each task.

def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
472    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
473        r"""Get the mapping of classes of task `task_id` to fit multi-task learning.
474
475        **Args:**
476        - **task_id** (`int`): the task ID to query the class map.
477
478        **Returns:**
479        - **class_map** (`dict[str | int, int]`): the class map of the task. Keys are the original class labels and values are the integer class labels for multi-task learning. For multi-task learning, the mapped class labels of a task should be consecutive integers starting from 0.
480        """
481        original_dataset_python_class_t = self.original_dataset_python_classes[task_id]
482        original_dataset_constants_t = DATASET_CONSTANTS_MAPPING[
483            original_dataset_python_class_t
484        ]
485        num_classes_t = original_dataset_constants_t.NUM_CLASSES
486        class_map_t = original_dataset_constants_t.CLASS_MAP
487
488        return {class_map_t[i]: i for i in range(num_classes_t)}

Get the mapping of classes of task task_id to fit multi-task learning.

Args:

  • task_id (int): the task ID to query the class map.

Returns:

  • class_map (dict[str | int, int]): the class map of the task. Keys are the original class labels and values are the integer class labels for multi-task learning. For multi-task learning, the mapped class labels of a task should be consecutive integers starting from 0.
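Concretely, the method inverts the dataset's `CLASS_MAP` constant (integer index to original label) into the original-label-to-index direction that multi-task learning needs. With made-up constants in that shape:

```python
# hypothetical constants, in the shape DATASET_CONSTANTS_MAPPING provides
NUM_CLASSES = 3
CLASS_MAP = {0: "plane", 1: "car", 2: "ship"}  # index -> original label

# invert: original label -> consecutive MTL integer label
mtl_class_map = {CLASS_MAP[i]: i for i in range(NUM_CLASSES)}
# {"plane": 0, "car": 1, "ship": 2}
```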
def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
490    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
491        r"""Set up tasks for the multi-task learning experiment.
492
493        **Args:**
494        - **train_tasks** (`list[int]`): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
495        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
496        """
497        super().setup_tasks_expr(train_tasks=train_tasks, eval_tasks=eval_tasks)
498
499        for task_id in train_tasks + eval_tasks:
500            original_dataset_python_class_t = self.original_dataset_python_classes[
501                task_id
502            ]
503            original_dataset_constants_t = DATASET_CONSTANTS_MAPPING[
504                original_dataset_python_class_t
505            ]
506            self.mean[task_id] = original_dataset_constants_t.MEAN
507            self.std[task_id] = original_dataset_constants_t.STD

Set up tasks for the multi-task learning experiment.

Args:

  • train_tasks (list[int]): the list of task IDs to be trained; used when constructing the dataloaders.
  • eval_tasks (list[int]): the list of task IDs to be evaluated; used when constructing the dataloaders.
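
As the source above shows, setup_tasks_expr also looks up each task's normalization statistics (MEAN and STD) from the original dataset's constants. A simplified sketch of that per-task lookup, where the constants table is an invented stand-in for DATASET_CONSTANTS_MAPPING (the MNIST/CIFAR-10 statistics shown are the commonly used values, but which datasets back which tasks here is purely illustrative):

```python
# Invented stand-in for DATASET_CONSTANTS_MAPPING:
# original dataset name -> its normalization constants.
DATASET_CONSTANTS = {
    "MNIST": {"MEAN": (0.1307,), "STD": (0.3081,)},
    "CIFAR10": {"MEAN": (0.4914, 0.4822, 0.4465), "STD": (0.2470, 0.2435, 0.2616)},
}

# Which original dataset backs each task (task IDs start from 1).
original_dataset_classes = {1: "MNIST", 2: "CIFAR10"}

# Populate per-task mean/std dicts, as setup_tasks_expr does.
mean, std = {}, {}
for task_id in [1, 2]:  # train_tasks + eval_tasks
    constants = DATASET_CONSTANTS[original_dataset_classes[task_id]]
    mean[task_id] = constants["MEAN"]
    std[task_id] = constants["STD"]

print(mean[1])  # (0.1307,)
```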
def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
509    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
510        r"""Set up evaluation tasks for the multi-task learning evaluation.
511
512        **Args:**
513        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated.
514        """
515        super().setup_tasks_eval(eval_tasks=eval_tasks)

Set up evaluation tasks for the multi-task learning evaluation.

Args:

  • eval_tasks (list[int]): the list of task IDs to be evaluated.
class MTLDatasetFromCL(clarena.mtl_datasets.MTLDataset):
518class MTLDatasetFromCL(MTLDataset):
519    r"""Multi-task learning datasets constructed from the CL datasets.
520
521    This is usually for constructing the reference joint learning experiment for continual learning.
522    """
523
524    def __init__(
525        self,
526        cl_dataset: CLDataset,
527        sampling_strategy: str = "mixed",
528        batch_size: int = 1,
529        num_workers: int = 0,
530    ) -> None:
531        r"""Initialize the `MTLDatasetFromCL` object.
532
533        **Args:**
534        - **cl_dataset** (`CLDataset`): the CL dataset object to be used for constructing the MTL dataset.
535        - **sampling_strategy** (`str`): the sampling strategy that constructs training batches from each task's dataset; one of:
536            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
537        - **batch_size** (`int`): The batch size in train, val, test dataloader.
538        - **num_workers** (`int`): the number of workers for dataloaders.
539        """
540
541        self.cl_dataset: CLDataset = cl_dataset
542        r"""The CL dataset for constructing the MTL dataset."""
543
544        super().__init__(
545            root=None,
546            num_tasks=cl_dataset.num_tasks,
547            sampling_strategy=sampling_strategy,
548            batch_size=batch_size,
549            num_workers=num_workers,
550            custom_transforms=None,  # already handled in the CL dataset
551            repeat_channels=None,
552            to_tensor=None,
553            resize=None,
554        )
555
556    def prepare_data(self) -> None:
557        r"""Download and prepare data."""
558        self.cl_dataset.prepare_data()  # prepare the CL dataset
559
560    def setup(self, stage: str) -> None:
561        r"""Set up the dataset for different stages.
562
563        **Args:**
564        - **stage** (`str`): the stage of the experiment; one of:
565            - 'fit': training and validation dataset should be assigned to `self.dataset_train` and `self.dataset_val`.
566            - 'test': test dataset should be assigned to `self.dataset_test`.
567        """
568        if stage == "fit":
569            pylogger.debug("Construct train and validation dataset ...")
570
571            # go through each task of continual learning to get the training dataset of each task
572            for task_id in range(1, self.num_tasks + 1):
573                self.cl_dataset.setup_task_id(task_id)
574                self.cl_dataset.setup(stage)
575
576                # label the training dataset with the task ID
577                task_labelled_dataset_train_t = TaskLabelledDataset(
578                    self.cl_dataset.dataset_train_t, task_id
579                )
580                self.dataset_train[task_id] = task_labelled_dataset_train_t
581
582                # label the validation dataset with the task ID
583                task_labelled_dataset_val_t = TaskLabelledDataset(
584                    self.cl_dataset.dataset_val_t, task_id
585                )
586                self.dataset_val[task_id] = task_labelled_dataset_val_t
587
588                pylogger.debug(
589                    "Train and validation dataset for task %d are ready.", task_id
590                )
591                pylogger.info(
592                    "Train dataset for task %d size: %d",
593                    task_id,
594                    len(self.dataset_train[task_id]),
595                )
596                pylogger.info(
597                    "Validation dataset for task %d size: %d",
598                    task_id,
599                    len(self.dataset_val[task_id]),
600                )
601
602        elif stage == "test":
603
604            pylogger.debug("Construct test dataset ...")
605
606            for task_id in self.eval_tasks:
607
608                self.cl_dataset.setup_task_id(task_id)
609                self.cl_dataset.setup(stage)
610
611                task_labelled_dataset_test_t = TaskLabelledDataset(
612                    self.cl_dataset.dataset_test[task_id], task_id
613                )
614
615                self.dataset_test[task_id] = task_labelled_dataset_test_t
616
617        pylogger.debug("Test dataset for task %d is ready.", task_id)
618                pylogger.info(
619                    "Test dataset for task %d size: %d",
620                    task_id,
621                    len(self.dataset_test[task_id]),
622                )
623
624    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
625        r"""Get the mapping of classes of task `task_id` to fit multi-task learning.
626
627        **Args:**
628        - **task_id** (`int`): The task ID to query class map.
629
630        **Returns:**
631        - **class_map**(`dict[str | int, int]`): the class map of the task. Keys are original class labels and values are integer class labels for multi-task learning. The mapped class labels of each task should be continuous integers from 0 to the number of classes.
632        """
633        return self.cl_dataset.get_cl_class_map(
634            task_id
635        )  # directly use the CL dataset's class map (from TIL setting)
636
637    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
638        r"""Set up tasks for the multi-task learning experiment.
639
640        **Args:**
641        - **train_tasks** (`list[int]`): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
642        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
643        """
644        super().setup_tasks_expr(train_tasks=train_tasks, eval_tasks=eval_tasks)
645
646        # MTL requires independent heads
647        self.cl_dataset.set_cl_paradigm(cl_paradigm="TIL")
648        for task_id in train_tasks + eval_tasks:
649            self.cl_dataset.setup_task_id(task_id)
650
651    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
652        r"""Set up evaluation tasks for the multi-task learning evaluation.
653
654        **Args:**
655        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated."""
656        super().setup_tasks_eval(eval_tasks=eval_tasks)
657
658        # MTL requires independent heads
659        self.cl_dataset.set_cl_paradigm(cl_paradigm="TIL")
660        for task_id in eval_tasks:
661            self.cl_dataset.setup_task_id(task_id)

Multi-task learning datasets constructed from the CL datasets.

This is usually for constructing the reference joint learning experiment for continual learning.

MTLDatasetFromCL( cl_dataset: clarena.cl_datasets.CLDataset, sampling_strategy: str = 'mixed', batch_size: int = 1, num_workers: int = 0)
524    def __init__(
525        self,
526        cl_dataset: CLDataset,
527        sampling_strategy: str = "mixed",
528        batch_size: int = 1,
529        num_workers: int = 0,
530    ) -> None:
531        r"""Initialize the `MTLDatasetFromCL` object.
532
533        **Args:**
534        - **cl_dataset** (`CLDataset`): the CL dataset object to be used for constructing the MTL dataset.
535        - **sampling_strategy** (`str`): the sampling strategy that constructs training batches from each task's dataset; one of:
536            - 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
537        - **batch_size** (`int`): The batch size in train, val, test dataloader.
538        - **num_workers** (`int`): the number of workers for dataloaders.
539        """
540
541        self.cl_dataset: CLDataset = cl_dataset
542        r"""The CL dataset for constructing the MTL dataset."""
543
544        super().__init__(
545            root=None,
546            num_tasks=cl_dataset.num_tasks,
547            sampling_strategy=sampling_strategy,
548            batch_size=batch_size,
549            num_workers=num_workers,
550            custom_transforms=None,  # already handled in the CL dataset
551            repeat_channels=None,
552            to_tensor=None,
553            resize=None,
554        )

Initialize the MTLDatasetFromCL object.

Args:

  • cl_dataset (CLDataset): the CL dataset object to be used for constructing the MTL dataset.
  • sampling_strategy (str): the sampling strategy that constructs training batches from each task's dataset; one of:
    • 'mixed': mixed sampling strategy, which samples from all tasks' datasets.
  • batch_size (int): the batch size for the train, val, and test dataloaders.
  • num_workers (int): the number of workers for the dataloaders.
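
The 'mixed' strategy above means a training batch may contain samples from any task. A rough sketch of the idea, using plain Python lists in place of real datasets (this is an illustration of the sampling concept, not the actual dataloader implementation):

```python
import random

# Toy per-task datasets of (sample, task_id) pairs. The real code wraps
# torch Datasets and attaches the task ID via a task-labelled wrapper.
datasets = {
    1: [("a0", 1), ("a1", 1), ("a2", 1)],
    2: [("b0", 2), ("b1", 2)],
}

# 'mixed' sampling: pool every task's samples and shuffle,
# so each batch may mix samples from different tasks.
pool = [item for task_data in datasets.values() for item in task_data]
random.Random(0).shuffle(pool)

batch_size = 2
batches = [pool[i : i + batch_size] for i in range(0, len(pool), batch_size)]
print(len(batches))  # 3 batches from 5 samples
```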

The CL dataset for constructing the MTL dataset.

def prepare_data(self) -> None:
556    def prepare_data(self) -> None:
557        r"""Download and prepare data."""
558        self.cl_dataset.prepare_data()  # prepare the CL dataset

Download and prepare data.

def setup(self, stage: str) -> None:
560    def setup(self, stage: str) -> None:
561        r"""Set up the dataset for different stages.
562
563        **Args:**
564        - **stage** (`str`): the stage of the experiment; one of:
565            - 'fit': training and validation dataset should be assigned to `self.dataset_train` and `self.dataset_val`.
566            - 'test': test dataset should be assigned to `self.dataset_test`.
567        """
568        if stage == "fit":
569            pylogger.debug("Construct train and validation dataset ...")
570
571            # go through each task of continual learning to get the training dataset of each task
572            for task_id in range(1, self.num_tasks + 1):
573                self.cl_dataset.setup_task_id(task_id)
574                self.cl_dataset.setup(stage)
575
576                # label the training dataset with the task ID
577                task_labelled_dataset_train_t = TaskLabelledDataset(
578                    self.cl_dataset.dataset_train_t, task_id
579                )
580                self.dataset_train[task_id] = task_labelled_dataset_train_t
581
582                # label the validation dataset with the task ID
583                task_labelled_dataset_val_t = TaskLabelledDataset(
584                    self.cl_dataset.dataset_val_t, task_id
585                )
586                self.dataset_val[task_id] = task_labelled_dataset_val_t
587
588                pylogger.debug(
589                    "Train and validation dataset for task %d are ready.", task_id
590                )
591                pylogger.info(
592                    "Train dataset for task %d size: %d",
593                    task_id,
594                    len(self.dataset_train[task_id]),
595                )
596                pylogger.info(
597                    "Validation dataset for task %d size: %d",
598                    task_id,
599                    len(self.dataset_val[task_id]),
600                )
601
602        elif stage == "test":
603
604            pylogger.debug("Construct test dataset ...")
605
606            for task_id in self.eval_tasks:
607
608                self.cl_dataset.setup_task_id(task_id)
609                self.cl_dataset.setup(stage)
610
611                task_labelled_dataset_test_t = TaskLabelledDataset(
612                    self.cl_dataset.dataset_test[task_id], task_id
613                )
614
615                self.dataset_test[task_id] = task_labelled_dataset_test_t
616
617        pylogger.debug("Test dataset for task %d is ready.", task_id)
618                pylogger.info(
619                    "Test dataset for task %d size: %d",
620                    task_id,
621                    len(self.dataset_test[task_id]),
622                )

Set up the dataset for different stages.

Args:

  • stage (str): the stage of the experiment; one of:
    • 'fit': training and validation dataset should be assigned to self.dataset_train and self.dataset_val.
    • 'test': test dataset should be assigned to self.dataset_test.
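
During setup, each per-task dataset from the CL dataset is wrapped in a TaskLabelledDataset so that every sample carries its task ID. The wrapper itself is defined elsewhere in CLArena; a minimal sketch of what such a wrapper might look like (this is an assumption about its shape, not the actual implementation):

```python
class TaskLabelledDatasetSketch:
    """Illustrative wrapper: yield each item together with its task ID."""

    def __init__(self, dataset, task_id):
        self.dataset = dataset  # any indexable dataset of (x, y) pairs
        self.task_id = task_id

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        x, y = self.dataset[index]
        return x, y, self.task_id  # attach the task ID to every sample


# Usage with a plain list standing in for a torch Dataset.
wrapped = TaskLabelledDatasetSketch([("img0", 0), ("img1", 1)], task_id=3)
print(wrapped[1])  # ('img1', 1, 3)
```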
def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
624    def get_mtl_class_map(self, task_id: int) -> dict[str | int, int]:
625        r"""Get the mapping of classes of task `task_id` to fit multi-task learning.
626
627        **Args:**
628        - **task_id** (`int`): The task ID to query class map.
629
630        **Returns:**
631        - **class_map**(`dict[str | int, int]`): the class map of the task. Keys are original class labels and values are integer class labels for multi-task learning. The mapped class labels of each task should be continuous integers from 0 to the number of classes.
632        """
633        return self.cl_dataset.get_cl_class_map(
634            task_id
635        )  # directly use the CL dataset's class map (from TIL setting)

Get the mapping of classes of task task_id to fit multi-task learning.

Args:

  • task_id (int): The task ID to query class map.

Returns:

  • class_map (dict[str | int, int]): the class map of the task. Keys are the original class labels and values are the integer class labels used for multi-task learning. The mapped class labels of each task should be consecutive integers from 0 to the number of classes minus one.
def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
637    def setup_tasks_expr(self, train_tasks: list[int], eval_tasks: list[int]) -> None:
638        r"""Set up tasks for the multi-task learning experiment.
639
640        **Args:**
641        - **train_tasks** (`list[int]`): the list of task IDs to be trained. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
642        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated. It should be a list of integers, each integer is the task ID. This is used when constructing the dataloader.
643        """
644        super().setup_tasks_expr(train_tasks=train_tasks, eval_tasks=eval_tasks)
645
646        # MTL requires independent heads
647        self.cl_dataset.set_cl_paradigm(cl_paradigm="TIL")
648        for task_id in train_tasks + eval_tasks:
649            self.cl_dataset.setup_task_id(task_id)

Set up tasks for the multi-task learning experiment.

Args:

  • train_tasks (list[int]): the list of task IDs to be trained; used when constructing the dataloaders.
  • eval_tasks (list[int]): the list of task IDs to be evaluated; used when constructing the dataloaders.
def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
651    def setup_tasks_eval(self, eval_tasks: list[int]) -> None:
652        r"""Set up evaluation tasks for the multi-task learning evaluation.
653
654        **Args:**
655        - **eval_tasks** (`list[int]`): the list of task IDs to be evaluated."""
656        super().setup_tasks_eval(eval_tasks=eval_tasks)
657
658        # MTL requires independent heads
659        self.cl_dataset.set_cl_paradigm(cl_paradigm="TIL")
660        for task_id in eval_tasks:
661            self.cl_dataset.setup_task_id(task_id)

Set up evaluation tasks for the multi-task learning evaluation.

Args:

  • eval_tasks (list[int]): the list of task IDs to be evaluated.