Shawn’s Blog

Configure CL Dataset (CL Main)

Modified

August 16, 2025

The continual learning dataset is a sequence of datasets corresponding to continual learning tasks, each of which has its own training and test data. If you are not familiar with continual learning datasets, see the section about CL datasets in my continual learning beginners’ guide.

The CL dataset is a sub-config under the experiment index config (CL Main). To configure a custom CL dataset, you need to create a YAML file in the cl_dataset/ folder. Below is an example of the CL dataset config.

Example

configs
├── __init__.py
├── entrance.yaml
├── experiment
│   ├── example_clmain_train.yaml
│   └── ...
├── cl_dataset
│   └── permuted_mnist.yaml
...
configs/experiment/example_clmain_train.yaml
defaults:
  ...
    - /cl_dataset: permuted_mnist.yaml
  ...
configs/cl_dataset/permuted_mnist.yaml
_target_: clarena.cl_datasets.PermutedMNIST
root: data/MNIST
num_tasks: 10
validation_percentage: 0.1
batch_size: 128
permutation_mode: first_channel_only
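
The config above can be read as a constructor call: _target_ names the class, and every other field becomes a keyword argument. The sketch below imitates that mechanism with the standard library only. It is not CLArena's or any config framework's actual code; the helper instantiate_from_config and the fractions.Fraction stand-in target are assumptions chosen so the snippet runs without CLArena installed.

```python
import importlib


def instantiate_from_config(config: dict):
    """Build an object from a dict with a `_target_` key (simplified sketch)."""
    kwargs = dict(config)  # copy so the caller's dict is left untouched
    target = kwargs.pop("_target_")
    # Split "package.module.ClassName" into module path and class name.
    module_path, class_name = target.rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    # Every remaining field is passed as a keyword argument.
    return cls(**kwargs)


# Stand-in target from the standard library (an assumption for illustration).
# With CLArena installed, the same pattern would apply to
# `clarena.cl_datasets.PermutedMNIST` and the fields in permuted_mnist.yaml.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate_from_config(config)
print(obj)
```

Under this reading, a misspelled or missing field in the YAML file would surface as an ordinary Python TypeError raised by the class constructor.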

Supported CL Datasets & Required Config Fields

In CLArena, we have implemented many CL datasets as Python classes in the clarena.cl_datasets module that you can use for your experiments.

To choose a CL dataset, set the _target_ field to the full path of the CL dataset class. For example, to use the Permuted MNIST dataset, set the _target_ field to clarena.cl_datasets.PermutedMNIST. Each CL dataset has its own hyperparameters and configurations, which means it has its own required fields. The required fields are the same as the arguments of the class specified by _target_. The arguments for each CL dataset class can be found in the API documentation.
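
Because the required fields mirror the class arguments, a config can be checked against a class signature before running an experiment. The following is a minimal sketch of that idea using inspect from the standard library. ToyPermutedDataset and check_config_fields are hypothetical names invented for this illustration; the toy signature is not the real signature of any CLArena class, so consult the API documentation for the actual arguments.

```python
import inspect


class ToyPermutedDataset:
    """Hypothetical stand-in for a CL dataset class (not a CLArena signature)."""

    def __init__(self, root: str, num_tasks: int, batch_size: int = 1):
        self.root = root
        self.num_tasks = num_tasks
        self.batch_size = batch_size


def check_config_fields(cls, config: dict):
    """Return (missing, unexpected) field names for a config aimed at `cls`.

    Simplification: classes taking *args or **kwargs are not handled.
    """
    params = inspect.signature(cls.__init__).parameters
    accepted = [name for name in params if name != "self"]
    # Arguments without a default value are the required config fields.
    required = [
        name for name in accepted
        if params[name].default is inspect.Parameter.empty
    ]
    fields = [key for key in config if key != "_target_"]
    missing = [name for name in required if name not in fields]
    unexpected = [name for name in fields if name not in accepted]
    return missing, unexpected


# A config with one required field left out and one field misspelled.
config = {"_target_": "ToyPermutedDataset", "root": "data/MNIST", "batch_sise": 128}
missing, unexpected = check_config_fields(ToyPermutedDataset, config)
print(missing, unexpected)
```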

API Reference (CL Datasets) Source Code (CL Datasets)

Below is the full list of supported CL datasets. We only support image classification datasets. A CL dataset can be constructed from regular datasets in three main ways (permute, split, and combine), so we divide them into three categories: Permuted, Split, and Combined. Please refer to my continual learning beginners’ guide for more about the three types of datasets. Note that the entries in the first column of each table (“Permuted CL Dataset”, “Split CL Dataset”, and “Combined CL Dataset”) are exactly the class names to use in the _target_ field, prefixed with clarena.cl_datasets.

For more information about the original datasets that these CL datasets are constructed from, please refer to my article: A Summary of Vision Datasets for Image Classification.

Permuted CL Datasets

Permuted CL Dataset Description Required Config Fields
PermutedArabicHandwrittenDigits Permuted Arabic Handwritten Digits dataset. The Arabic Handwritten Digits Dataset (AHDD) is a collection of handwritten Arabic digits (0-9). It consists of 60,000 training and 10,000 test images of handwritten Arabic digits (10 classes), each 28x28 grayscale image (similar to MNIST). Same as PermutedArabicHandwrittenDigits class arguments
PermutedCaltech101 Permuted Caltech 101 dataset. The Caltech 101 dataset is a collection of pictures of objects. It consists of 9,146 images of 101 classes, each color image. Same as PermutedCaltech101 class arguments
PermutedCaltech256 Permuted Caltech 256 dataset. The Caltech 256 dataset is a collection of pictures of objects. It consists of 30,607 images of 256 classes, each color image. Same as PermutedCaltech256 class arguments
PermutedCelebA Permuted CelebA dataset. The CelebFaces Attributes Dataset (CelebA) is a large-scale celebrity faces dataset. It consists of 202,599 face images of 10,177 celebrity identities (classes), each 178x218 color image. Note that the original CelebA dataset is not a classification dataset but an attributes dataset. We only use the identity of each face as the class label for classification. Same as PermutedCelebA class arguments
PermutedCIFAR10 Permuted CIFAR-10 dataset. The CIFAR-10 dataset is a subset of the 80 million tiny images dataset. It consists of 50,000 training and 10,000 test images of 10 classes, each 32x32 color image. Same as PermutedCIFAR10 class arguments
PermutedCIFAR100 Permuted CIFAR-100 dataset. The CIFAR-100 dataset is a subset of the 80 million tiny images dataset. It consists of 50,000 training and 10,000 test images of 100 classes, each 32x32 color image. Same as PermutedCIFAR100 class arguments
PermutedCountry211 Permuted Country211 dataset. The Country211 dataset is a collection of geolocation pictures of different countries. It consists of 62,200 images of 211 countries (classes), each 256x256 color image. Same as PermutedCountry211 class arguments
PermutedCUB2002011 Permuted CUB-200-2011 dataset. The CUB (Caltech-UCSD Birds)-200-2011 dataset is a bird image dataset. It consists of 11,788 images of 200 bird species (classes), each 64x64 color image. Same as PermutedCUB2002011 class arguments
PermutedDTD Permuted DTD dataset. The Describable Textures Dataset (DTD) is a collection of describable texture pictures. It consists of 5,640 images of 47 kinds of textures (classes), each a color image between 300x300 and 640x640 pixels. Same as PermutedDTD class arguments
PermutedEMNIST Permuted EMNIST dataset. The EMNIST dataset is a collection of handwritten letters and digits (including A-Z, a-z, 0-9). It consists of 814,255 images in 62 classes, each 28x28 grayscale image. EMNIST has 6 different splits: byclass, bymerge, balanced, letters, digits, and mnist, each containing a different subset of the original collection. We support all of them in Permuted EMNIST. Same as PermutedEMNIST class arguments
PermutedEuroSAT Permuted EuroSAT dataset. The EuroSAT dataset is a collection of satellite images of lands. It consists of 27,000 images of 10 classes, each 64x64 color image. Same as PermutedEuroSAT class arguments
PermutedFaceScrub Permuted FaceScrub dataset. The original FaceScrub dataset is a collection of human face images. It consists of 106,863 images of 530 people (classes), each a high-resolution color image. To keep it simple, this version uses a subset of the official Megaface FaceScrub challenge, cropped and resized to 32x32. We have FaceScrub-10, FaceScrub-20, FaceScrub-50, and FaceScrub-100 datasets, where the number of classes is 10, 20, 50, and 100 respectively. Same as PermutedFaceScrub class arguments
PermutedFashionMNIST Permuted Fashion-MNIST dataset. The Fashion-MNIST dataset is a collection of fashion images. It consists of 60,000 training and 10,000 test images of 10 types of clothing (classes), each 28x28 grayscale image (similar to MNIST). Same as PermutedFashionMNIST class arguments
PermutedFER2013 Permuted FER2013 dataset. The FER2013 dataset is a collection of facial expression images. It consists of 35,887 images of 7 facial expressions (classes), each 48x48 grayscale image. Same as PermutedFER2013 class arguments
PermutedFGVCAircraft Permuted FGVC-Aircraft dataset. The FGVC-Aircraft dataset is a collection of aircraft images. It consists of 10,200 images, each a color image. FGVC-Aircraft provides 3 different class labelings, by variant, family, and manufacturer, which have 102, 70, and 41 classes respectively. We support all of them in Permuted FGVC-Aircraft. Same as PermutedFGVCAircraft class arguments
PermutedFlowers102 Permuted Oxford 102 Flower dataset. The Oxford 102 Flower dataset is a collection of flower pictures. It consists of 8,189 images of 102 kinds of flowers (classes), each color image. Same as PermutedFlowers102 class arguments
PermutedFood101 Permuted Food-101 dataset. The Food-101 dataset is a collection of food images. It consists of 101,000 images of 101 classes, each color image. Same as PermutedFood101 class arguments
PermutedGTSRB Permuted GTSRB dataset. The GTSRB dataset is a collection of traffic sign images. It consists of 51,839 images of 43 different traffic signs (classes), each color image. Same as PermutedGTSRB class arguments
PermutedImagenette Permuted Imagenette dataset. The Imagenette dataset is a subset of 10 easily classified classes from Imagenet, available in several image sizes. We support all of them in Permuted Imagenette. Same as PermutedImagenette class arguments
PermutedKannadaMNIST Permuted Kannada-MNIST dataset. The Kannada-MNIST dataset is a collection of handwritten Kannada digits (0-9). It consists of 60,000 training and 10,000 test images of handwritten Kannada digits (10 classes), each 28x28 grayscale image (similar to MNIST). Same as PermutedKannadaMNIST class arguments
PermutedKMNIST Permuted Kuzushiji-MNIST dataset. The Kuzushiji-MNIST dataset is a collection of Japanese Kuzushiji character images. It consists of 60,000 training and 10,000 test images of Kuzushiji characters (10 classes), each 28x28 grayscale image (similar to MNIST). Same as PermutedKMNIST class arguments
PermutedLinnaeus5 Permuted Linnaeus 5 dataset. The Linnaeus 5 dataset is a collection of images in 5 classes (berry, bird, dog, flower, and other). It consists of 8,000 images. It provides 256x256, 128x128, 64x64, and 32x32 color images. We support all of them in Permuted Linnaeus 5. Same as PermutedLinnaeus5 class arguments
PermutedMNIST Permuted MNIST dataset. The MNIST dataset is a collection of handwritten digits. It consists of 60,000 training and 10,000 test handwritten digit images (10 classes), each 28x28 grayscale image. Same as PermutedMNIST class arguments
PermutedNotMNIST Permuted NotMNIST dataset. The NotMNIST dataset is a collection of letters (A-J). This version uses the smaller set, which consists of about 19,000 images of 10 classes, each 28x28 grayscale image. Same as PermutedNotMNIST class arguments
PermutedOxfordIIITPet Permuted Oxford-IIIT Pet dataset. The Oxford-IIIT Pet dataset is a collection of cat and dog pictures. It consists of 7,349 images of 37 breeds (classes), each color image. It also provides a binary classification version with 2 classes (cat or dog). We support both versions in Permuted Oxford-IIIT Pet. Same as PermutedOxfordIIITPet class arguments
PermutedPCAM Permuted PCAM dataset. The PCAM dataset is a collection of medical images of breast cancer. It consists of 327,680 images in 2 classes (benign and malignant), each 96x96 color image. Same as PermutedPCAM class arguments
PermutedRenderedSST2 Permuted Rendered SST2 dataset. The Rendered SST2 dataset is a collection of optical character recognition images. It consists of 9,613 images in 2 classes (positive and negative sentiment), each 448x448 color image. Same as PermutedRenderedSST2 class arguments
PermutedSEMEION Permuted SEMEION dataset. The SEMEION dataset is a collection of handwritten digits. It consists of 1,593 handwritten digit images (10 classes), each 16x16 grayscale image. Same as PermutedSEMEION class arguments
PermutedSignLanguageMNIST Permuted Sign Language MNIST dataset. The Sign Language MNIST dataset is a collection of hand gesture images representing ASL letters (A-Y, excluding J). It consists of 34,627 images of 24 classes, each 28x28 grayscale image. Same as PermutedSignLanguageMNIST class arguments

PermutedStanfordCars (download link expired) Permuted Stanford Cars dataset. The Stanford Cars dataset is a collection of car images. It consists of 16,185 images in 196 classes, each color image. Same as PermutedStanfordCars class arguments
PermutedSUN397 Permuted SUN397 dataset. The SUN397 dataset is a collection of scene images. It consists of 108,754 images of 397 classes, each color image. Same as PermutedSUN397 class arguments
PermutedSVHN Permuted SVHN dataset. The SVHN dataset is a collection of street view house number images. It consists of 73,257 training and 26,032 test images of 10 classes, each 32x32 color image. Same as PermutedSVHN class arguments
PermutedTinyImageNet Permuted TinyImageNet dataset. The TinyImageNet dataset is a smaller, more manageable version of the Imagenet dataset. It consists of 100,000 training, 10,000 validation, and 10,000 test images of 200 classes, each 64x64 color image. Same as PermutedTinyImageNet class arguments
PermutedUSPS Permuted USPS dataset. The USPS dataset is a collection of handwritten digits. It consists of 9,298 handwritten digit images (10 classes), each 16x16 grayscale image. Same as PermutedUSPS class arguments
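
To make the "permute" construction concrete, here is a minimal sketch in plain Python: each task draws one fixed pixel permutation and applies it to every image of the base dataset, while labels stay unchanged. The function names, the seed handling, and the toy six-pixel image are assumptions for illustration; CLArena's own implementation (for example, how permutation_mode treats color channels) may differ.

```python
import random


def make_task_permutations(num_pixels: int, num_tasks: int, seed: int = 0):
    """Draw one fixed pixel permutation per task (reproducible via the seed)."""
    rng = random.Random(seed)
    permutations = []
    for _ in range(num_tasks):
        order = list(range(num_pixels))
        rng.shuffle(order)
        permutations.append(order)
    return permutations


def permute_image(pixels, permutation):
    """Reorder a flattened image according to a task's permutation."""
    return [pixels[i] for i in permutation]


image = [0, 10, 20, 30, 40, 50]  # a toy flattened 2x3 "image"
perms = make_task_permutations(num_pixels=len(image), num_tasks=3)
for task_id, perm in enumerate(perms, start=1):
    # Same pixel values in every task, only their positions differ.
    print(task_id, permute_image(image, perm))
```

Since the permutation is fixed per task, every task keeps the same classes and the same amount of data as the base dataset, which is why any image classification dataset in the table above can be turned into a permuted CL dataset.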

Split CL Datasets

Split CL Dataset Description Required Config Fields
SplitCIFAR10 Split CIFAR-10 dataset. The CIFAR-10 dataset is a subset of the 80 million tiny images dataset. It consists of 50,000 training and 10,000 test images of 10 classes, each 32x32 color image. Same as SplitCIFAR10 class arguments
SplitCIFAR100 Split CIFAR-100 dataset. The CIFAR-100 dataset is a subset of the 80 million tiny images dataset. It consists of 50,000 training and 10,000 test images of 100 classes, each 32x32 color image. Same as SplitCIFAR100 class arguments
SplitCUB2002011 Split CUB-200-2011 dataset. The CUB (Caltech-UCSD Birds)-200-2011 dataset is a bird image dataset. It consists of 11,788 images of 200 bird species (classes), each 64x64 color image. Same as SplitCUB2002011 class arguments
SplitMNIST Split MNIST dataset. The MNIST dataset is a collection of handwritten digits. It consists of 60,000 training and 10,000 test handwritten digit images (10 classes), each 28x28 grayscale image. Same as SplitMNIST class arguments
SplitTinyImageNet Split TinyImageNet dataset. The TinyImageNet dataset is a smaller, more manageable version of the Imagenet dataset. It consists of 100,000 training, 10,000 validation, and 10,000 test images of 200 classes, each 64x64 color image. Same as SplitTinyImageNet class arguments
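
The "split" construction can be sketched in a few lines: the classes of one base dataset are partitioned into disjoint groups, and each task sees only the samples of its group. The helper names, the classes_per_task parameter, and the toy samples below are hypothetical; the actual argument names of the Split classes are listed in the API documentation.

```python
def split_classes_into_tasks(class_ids, classes_per_task):
    """Partition class IDs into consecutive, disjoint groups, one per task."""
    return [
        class_ids[start:start + classes_per_task]
        for start in range(0, len(class_ids), classes_per_task)
    ]


def task_subset(samples, task_classes):
    """Keep only the (input, label) pairs whose label belongs to this task."""
    allowed = set(task_classes)
    return [(x, y) for x, y in samples if y in allowed]


# Ten classes split into five tasks of two classes each.
class_split = split_classes_into_tasks(list(range(10)), classes_per_task=2)
samples = [(f"img{i}", i % 10) for i in range(20)]  # toy (input, label) pairs
task_1 = task_subset(samples, class_split[0])
print(class_split)
print(task_1)
```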

Combined CL Datasets

Combined CL Dataset Description Required Config Fields
Combined Combined CL dataset. We currently support: CIFAR-10, CIFAR-100, MNIST, SVHN, Fashion-MNIST, TrafficSigns, FaceScrub, NotMNIST, EMNIST Digits, EMNIST Letters, Arabic Handwritten Digits, Kannada-MNIST, Sign Language MNIST, Kuzushiji-MNIST, Food-101, Linnaeus 5, Caltech 101, EuroSAT, DTD, Country 211 Same as Combined class arguments
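
The "combine" construction treats each whole base dataset as one task. The sketch below shows the idea on toy data; combine_into_tasks, the dictionary layout, and the 1-based task IDs are assumptions for illustration rather than the interface of the Combined class.

```python
def combine_into_tasks(datasets):
    """Treat each whole dataset as one task, in the order given."""
    tasks = []
    for task_id, (name, samples) in enumerate(datasets, start=1):
        # Each task keeps its own label space, so count classes per dataset.
        num_classes = len({label for _, label in samples})
        tasks.append({
            "task_id": task_id,
            "name": name,
            "num_classes": num_classes,
            "samples": samples,
        })
    return tasks


# Toy stand-ins for two base datasets, as (input, label) pairs.
toy_mnist = [("digit_a", 0), ("digit_b", 1), ("digit_c", 1)]
toy_cifar = [("photo_a", 0), ("photo_b", 1), ("photo_c", 2)]
tasks = combine_into_tasks([("MNIST", toy_mnist), ("CIFAR-10", toy_cifar)])
for task in tasks:
    print(task["task_id"], task["name"], task["num_classes"])
```

Unlike the permuted and split constructions, tasks here can differ in number of classes, image size, and number of channels, so the choice and order of the base datasets shape the whole experiment.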
©️ 2025 Pengxiang Wang. All rights reserved.