Implement Your CL Dataset Class

Continual learning datasets are implemented as classes built on the base classes provided by this package in clarena/cl_datasets/base.py. These base classes extend the Lightning data module (LightningDataModule) with added features for continual learning. Here are the base classes:

  • CLDataset: the general base class for all kinds of CL datasets, incorporating mechanisms for managing sequential tasks.
  • CLPermutedDataset: the base class for permuted CL datasets. It is based on CLDataset and adds the permutation operation. See my CL beginners’ guide to learn more about permuted CL datasets.
  • CLSplitDataset: the base class for split CL datasets. It is based on CLDataset and adds a mechanism for taking the subset of certain classes from the original dataset. See my CL beginners’ guide to learn more about split CL datasets.

To implement your CL dataset, create a class inheriting from one of the base classes, include the required arguments (shown in the base classes) in the __init__() method, and implement the required methods along with any of your own.

Required Arguments

Below is the list of required arguments for your CL dataset class, with their meanings explained. They are outlined in the __init__() method of the base classes; please refer to the API reference for detailed information.

API Reference (CL Datasets)

  • root: the root directory where the original data files for constructing the CL dataset physically live.
  • num_tasks: the maximum number of tasks supported by the CL dataset.
  • validation_percentage: the percentage of the training data that is randomly split off as validation data. We require that the validation dataset be carved out of the training dataset.
  • batch_size: the batch size for constructing the train, validation and test dataloaders.
  • num_workers: the number of workers for dataloaders.
  • custom_transforms: the custom transforms to apply to the TRAIN dataset ONLY. It excludes functional transforms such as ToTensor(), normalisation and permutation, as they are already included in the system.
  • custom_target_transforms: the custom target transforms to apply to the dataset labels. It excludes the CL class mapping transform, as it is already included in the system.
Warning

The labels in your dataset are expected to be integers starting from 0. If not, you need to implement a target transform to map your original class labels to such integers.
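For instance, a dataset labelled with strings can be remapped to 0-based integers with a small target transform. This is only a sketch: make_zero_based_target_transform is a hypothetical helper, not part of the package.

```python
def make_zero_based_target_transform(original_labels):
    """Build a transform mapping each original label to 0, 1, 2, ...

    Works for any hashable labels (strings, non-contiguous ints, etc.);
    sorting makes the assignment deterministic.
    """
    label_to_int = {label: idx for idx, label in enumerate(sorted(original_labels))}

    def target_transform(label):
        return label_to_int[label]

    return target_transform


# Example: a dataset whose original labels are strings.
transform = make_zero_based_target_transform({"bird", "cat", "dog"})
print(transform("bird"))  # 0
print(transform("cat"))   # 1
print(transform("dog"))   # 2
```

Such a function can then be passed as (or wrapped into) the custom_target_transforms argument described above.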

Exclusive arguments for CLPermutedDataset:

  • permutation_mode: the mode of permutation, which includes:
    1. ‘all’: permute all pixels.
    2. ‘by_channel’: permute each channel separately, with the same permutation order applied to all channels.
    3. ‘first_channel_only’: permute only the first channel.
  • permutation_seeds: the seeds for permutation operations used to construct tasks.
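The three permutation_mode options above can be illustrated with a small NumPy sketch. This is not the package’s actual implementation (which lives in the base classes); permute_image is a hypothetical function for illustration only.

```python
import numpy as np


def permute_image(img, permutation, mode="all"):
    """Apply a fixed pixel permutation to an image of shape (C, H, W).

    Sketch of the three permutation_mode options; for mode="all" the
    permutation must cover C*H*W pixels, otherwise H*W pixels.
    """
    c, h, w = img.shape
    if mode == "all":
        # Flatten everything, channels included, and permute all pixels.
        return img.reshape(-1)[permutation].reshape(c, h, w)
    if mode == "by_channel":
        # Permute each channel separately, same order for every channel.
        flat = img.reshape(c, h * w)
        return flat[:, permutation].reshape(c, h, w)
    if mode == "first_channel_only":
        # Permute only the first channel; leave the others untouched.
        out = img.copy()
        out[0] = img[0].reshape(-1)[permutation].reshape(h, w)
        return out
    raise ValueError(f"unknown permutation_mode: {mode}")


rng = np.random.default_rng(seed=42)  # one permutation seed per task in practice
img = rng.random((3, 4, 4))
perm = rng.permutation(4 * 4)
out = permute_image(img, perm, mode="by_channel")
```

The permutation_seeds list plays the role of the seed above: each task draws its own permutation from its own seed, so the tasks are distinct but reproducible.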

Implementing Instructions

These methods must be implemented:

  • cl_class_map(): in this package, class labels are uniformly transformed to integers starting from 0, for the convenience of applying cross-entropy loss. You need to tell the module this mapping explicitly.
  • mean(), std(): get the mean and standard deviation values used to construct the normalisation transform.
  • prepare_data(): get your original dataset from whatever source it comes from, e.g. by downloading. This is required by the parent Lightning datamodule.
  • train_and_val_dataset(), test_dataset(): construct the training, validation and test datasets for the current task, indicated by the self-updated self.task_id.
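A minimal sketch of what implementing these methods might look like. CLDataset here is a simplified stand-in for the real base class in clarena/cl_datasets/base.py, and MySplitDataset, its argument values and the two-classes-per-task split are all hypothetical; check the API reference for the actual signatures.

```python
class CLDataset:
    """Simplified stand-in for the real CLDataset base class (sketch only)."""

    def __init__(self, root, num_tasks, validation_percentage, batch_size, num_workers):
        self.root = root
        self.num_tasks = num_tasks
        self.validation_percentage = validation_percentage
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.task_id = 1  # updated by the framework as tasks arrive


class MySplitDataset(CLDataset):
    """Hypothetical split CL dataset: a 10-class dataset cut into 5 tasks."""

    def cl_class_map(self, task_id):
        # Map this task's original class labels to integers starting from 0,
        # e.g. task 1 -> classes {0, 1}, task 2 -> classes {2, 3}, ...
        classes = [2 * (task_id - 1), 2 * (task_id - 1) + 1]
        return {c: i for i, c in enumerate(classes)}

    def mean(self, task_id):
        return 0.1307  # e.g. MNIST mean, used for the normalisation transform

    def std(self, task_id):
        return 0.3081  # e.g. MNIST std

    def prepare_data(self):
        # Download or otherwise materialise the original data under self.root.
        pass

    def train_and_val_dataset(self):
        # Build train/val datasets for the task in self.task_id, splitting
        # validation_percentage of the training data off for validation.
        ...

    def test_dataset(self):
        # Build the test dataset for the task in self.task_id.
        ...


dataset = MySplitDataset("data/", num_tasks=5, validation_percentage=0.1,
                         batch_size=32, num_workers=2)
print(dataset.cl_class_map(3))  # {4: 0, 5: 1}
```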

For permuted CL datasets:

  • Specify the class property img_size to construct the permutation operation.
  • Specify the class properties mean_original and std_original instead of implementing mean() and std().
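For the permuted case, the class properties might look like the following sketch. MyPermutedDataset and the MNIST-like statistics are hypothetical, and a real subclass would inherit from CLPermutedDataset rather than stand alone.

```python
class MyPermutedDataset:  # in practice, inherit from CLPermutedDataset
    """Hypothetical permuted CL dataset over an MNIST-like source (sketch)."""

    img_size = (1, 28, 28)     # (C, H, W), used to build the permutation
    mean_original = (0.1307,)  # per-channel mean of the ORIGINAL dataset
    std_original = (0.3081,)   # per-channel std; the base class derives
                               # mean()/std() from these, so you skip them
```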

Please refer to the API reference for detailed information about implementation, and remember you can always take the implemented CL datasets in the package source code clarena/cl_datasets/ as examples, such as PermutedMNIST!

©️ 2025 Pengxiang Wang. All rights reserved.