A Summary of Vision Datasets for Continual Learning Classification

I am currently researching continual learning, a paradigm of machine learning. A continual learning dataset consists of a sequence of tasks, where each task involves training and testing on its own dataset. Many existing datasets can be used to construct such a sequence. This post explores datasets that are suitable for constructing continual learning tasks and summarises the ones most commonly used in the field.

There are three ways to construct a sequence of tasks for continual learning (please refer to my continual learning beginner's guide for more details):

Combine: each task uses a different dataset from different sources.
Permute: permute the pixels of a dataset using different permutation seeds to create different tasks.
Split: split a dataset by class into different tasks.
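
As a rough illustration of the Permute and Split constructions above, here is a minimal NumPy sketch. The function names (`make_permuted_tasks`, `make_split_tasks`) and parameters (`base_seed`, `classes_per_task`) are made up for this example, and a small random array stands in for a real dataset such as MNIST. Combine needs no transformation (each dataset is simply one task), so it is left out.

```python
import numpy as np


def make_permuted_tasks(x, y, num_tasks, base_seed=0):
    """Permute: each task applies its own fixed pixel permutation to every image."""
    n = x.shape[0]
    flat = x.reshape(n, -1)  # (N, C*H*W)
    tasks = []
    for t in range(num_tasks):
        # One seed per task, so the same permutation can be recreated at test time.
        rng = np.random.default_rng(base_seed + t)
        perm = rng.permutation(flat.shape[1])
        tasks.append((flat[:, perm].reshape(x.shape), y))
    return tasks


def make_split_tasks(x, y, classes_per_task):
    """Split: partition the classes into disjoint groups, one group per task."""
    classes = np.unique(y)
    tasks = []
    for start in range(0, len(classes), classes_per_task):
        group = classes[start:start + classes_per_task]
        mask = np.isin(y, group)  # keep only samples whose label is in this group
        tasks.append((x[mask], y[mask]))
    return tasks


# Synthetic stand-in for a 10-class, 1x28x28 dataset (assumption, not real data).
x = np.random.default_rng(42).random((200, 1, 28, 28))
y = np.arange(200) % 10

permuted = make_permuted_tasks(x, y, num_tasks=3)
split = make_split_tasks(x, y, classes_per_task=2)
print(len(permuted), "permuted tasks;", len(split), "split tasks")
```

Permuted tasks keep all classes and all samples but scramble the spatial layout, whereas split tasks keep the images intact and differ in which classes they contain.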

This field focuses on the paradigm itself rather than on application scenarios, so the simplest scenario, supervised image classification, is taken as the default playground. We are going to look at datasets designed for image classification, or that carry class labels that can be used for image classification.

Dataset	Number of samples	Number of classes	Avg. per class	Image size	Permute	Split
MNIST	70,000	10	7,000	1x28x28	y	MLP, 98
Fashion-MNIST	70,000	10	7,000	1x28x28	y	MLP, 86
Kuzushiji-MNIST	70,000	10	7,000	1x28x28	y	MLP, 87
EMNIST ByClass	814,255	62	~13,000	1x28x28	y	MLP, 84
EMNIST ByMerge	814,255	47	~17,300	1x28x28	y	MLP, 87
EMNIST Balanced	131,600	47	2,800	1x28x28	y	MLP, 82
EMNIST Letters	145,600	26	5,600	1x28x28	y	MLP, 90
EMNIST Digits	280,000	10	28,000	1x28x28	y	MLP, 98
QMNIST	120,000	10	12,000	1x28x28
notMNIST	Small ~19,000, Large ~500,000	10	~1,900, ~50,000	1x28x28	y	MLP, 95
Sign Language MNIST	34,627	24	~1,443	1x28x28	y	MLP, 60
Arabic Handwritten Digits	70,000	10	7,000	1x28x28	y	MLP, 97
Kannada-MNIST	70,000	10	7,000	1x28x28	y	MLP, 93
CIFAR-10	60,000	10	6,000	3x32x32	y	ResNet18, 50
CIFAR-100	60,000	100	600	3x32x32	y	ResNet18, 47
GTSRB	51,839	43	1,205	Coloured, not aligned	y	ResNet18, 80
SVHN	99,289 (without extra)	10	9,929	3x32x32	y	ResNet18, 90
Linnaeus 5	8,000	5	1,600	3x256x256 / 3x128x128 / 3x64x64 / 3x32x32	y	ResNet18, 54
TinyImageNet	120,000	200	600	3x64x64	y	ResNet18, 70 (30 epochs)
MedMNIST2D PathMNIST	107,180	9	~11,909	1x28x28
MedMNIST2D ChestMNIST	112,120	2	56,060	1x28x28
MedMNIST2D DermaMNIST	10,015	7	1,430	1x28x28
MedMNIST2D OCTMNIST	109,309	4	27,327	1x28x28
MedMNIST2D PneumoniaMNIST	5,856	2	2,928	1x28x28
MedMNIST2D BreastMNIST	780	2	390	1x28x28
MedMNIST2D BloodMNIST	17,092	8	2,136	1x28x28
MedMNIST2D TissueMNIST	236,386	8	29,548	1x28x28
MedMNIST2D OrganAMNIST	58,830	11	5,348	1x28x28
MedMNIST2D OrganCMNIST	23,583	11	2,144	1x28x28
MedMNIST2D OrganSMNIST	25,211	11	2,283	1x28x28
Omniglot	32,460	1,623	20	1x105x105
FaceScrub	106,863	530	202	Coloured, not aligned	?