Configure CL Algorithm (CL Main)
The continual learning (CL) algorithm is the core part of a continual learning experiment: it determines how sequential tasks are learned and manages the interactions between previous and new tasks. If you are not familiar with continual learning algorithms, feel free to start with my continual learning beginners’ guide, which covers the baseline algorithms and CL methodology.
The CL algorithm is a sub-config under the experiment index config (CL Main). To configure a custom CL algorithm, create a YAML file in the cl_algorithm/ folder. Below is an example of the CL algorithm config.
Example
```
configs
├── __init__.py
├── entrance.yaml
├── experiment
│   ├── example_clmain_train.yaml
│   └── ...
├── cl_algorithm
│   └── finetuning.yaml
...
```
configs/experiment/example_clmain_train.yaml
```yaml
defaults:
  ...
  - /cl_algorithm: finetuning.yaml
  ...
```
configs/cl_algorithm/finetuning.yaml
```yaml
_target_: clarena.cl_algorithms.Finetuning
```
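For an algorithm that takes hyperparameters, the extra fields sit alongside `_target_` and are passed to the class constructor. Below is a hedged sketch for the EWC algorithm — the `_target_` class exists, but the hyperparameter field name here is a hypothetical placeholder; the real required fields are the constructor arguments of the `EWC` class in the API documentation:

```yaml
# configs/cl_algorithm/ewc.yaml (sketch)
_target_: clarena.cl_algorithms.EWC
# Hypothetical field name for illustration only. The actual required
# fields are the EWC class constructor arguments (excluding `backbone`
# and `heads`) -- check the API documentation for the real names.
parameter_reg_factor: 1.0
```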
Supported CL Algorithms & Required Config Fields
In CLArena, we have implemented many CL algorithms as Python classes in the `clarena.cl_algorithms` module that you can use for your experiments.

To choose a CL algorithm, assign the `_target_` field to the class name of the CL algorithm. For example, to use the `Finetuning` algorithm, set the `_target_` field to `clarena.cl_algorithms.Finetuning`. Each CL algorithm has its own hyperparameters and configuration, which means it has its own required fields. The required fields are the same as the arguments of the class specified by `_target_` (excluding `backbone` and `heads`). The arguments of each CL algorithm class can be found in the API documentation.
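This follows the standard Hydra `_target_` convention: the dotted path is imported, and the class is called with the remaining config fields as keyword arguments. A minimal self-contained sketch of that mechanism (using a stdlib class as the target, so it does not depend on clarena itself — the actual instantiation inside CLArena/Hydra is more involved):

```python
import importlib


def instantiate_from_config(cfg: dict):
    """Sketch of Hydra-style `_target_` resolution: import the dotted
    module path, look up the class, and call it with every other config
    field as a keyword argument."""
    module_path, class_name = cfg["_target_"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return cls(**kwargs)


# Demo with a stdlib target so the sketch runs anywhere:
counter = instantiate_from_config({"_target_": "collections.Counter", "a": 2, "b": 1})
# counter is equivalent to collections.Counter(a=2, b=1)
```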
Below is the full list of supported CL algorithms. Note that “CL Algorithm” is exactly the class name that the `_target_` field is assigned to.
| CL Algorithm | Description | Required Config Fields |
|---|---|---|
| `Finetuning` | The most naive approach to task-incremental learning. It simply initializes the backbone from the last task when training a new task. (Please refer to my continual learning beginners’ guide.) | Same as `Finetuning` class arguments (excluding `backbone` and `heads`) |
| `Fix` | Another naive approach to task-incremental learning, aside from Finetuning. It simply fixes the backbone forever after training the first task. It serves as a kind of toy algorithm when discussing the stability–plasticity dilemma in continual learning. (Please refer to my continual learning beginners’ guide.) | Same as `Fix` class arguments (excluding `backbone` and `heads`) |
| `Independent` | Another naive approach to task-incremental learning, aside from Finetuning. It assigns a new independent model to each task. This is a simple way to avoid catastrophic forgetting, at the extreme cost of memory. It achieves the theoretical upper bound of performance in continual learning. (Please refer to my continual learning beginners’ guide.) | Same as `Independent` class arguments (excluding `backbone` and `heads`) |
| `Random` | Skips the training step and simply uses the randomly initialized model to predict the test data. It serves as a reference model for computing the forgetting rate. See chapter 4 of the HAT (Hard Attention to the Task) paper. | Same as `Random` class arguments (excluding `backbone` and `heads`) |
| `LwF` | A regularization-based continual learning approach that constrains the feature output of the model to be similar to that of previous tasks. From the perspective of knowledge distillation, its regularization term distills the previous tasks’ models into the training process for the new task. It is a simple yet effective method for continual learning. | Same as `LwF` class arguments (excluding `backbone` and `heads`) |
| `EWC` | A regularization-based approach that calculates the Fisher information as parameter importance for the previous tasks and penalizes the current task loss with the importance of the parameters. | Same as `EWC` class arguments (excluding `backbone` and `heads`) |
| `HAT` | An architecture-based approach that uses learnable hard attention masks to select task-specific parameters. | Same as `HAT` class arguments (excluding `backbone` and `heads`) |
| `AdaHAT` | An architecture-based approach that improves HAT by introducing adaptive soft gradient clipping based on parameter importance and network sparsity. (This is my work; please go to Paper: AdaHAT for details.) | Same as `AdaHAT` class arguments (excluding `backbone` and `heads`) |
| `FGAdaHAT` | An architecture-based approach that improves AdaHAT with fine-grained, neuron-wise importance measures that guide AdaHAT’s adaptive adjustment mechanism. (This is my work; please go to Paper: FG-AdaHAT for details.) | Same as `FGAdaHAT` class arguments (excluding `backbone` and `heads`) |
| `CBP` [@dohare2024loss] (bug exists) | A continual learning approach that reinitializes a small number of units during training, using a utility measure to determine which units to reinitialize. It aims to address the loss-of-plasticity problem when learning new tasks, but does not solve the catastrophic forgetting problem in continual learning well. | Same as `CBP` class arguments (excluding `backbone` and `heads`) |
| `WSN` | An architecture-based approach that trains a learnable parameter-wise score and selects the top-scoring $c\%$ of the network parameters to be used for each task. | Same as `WSN` class arguments (excluding `backbone` and `heads`) |
| `NISPA` (bug exists) | An architecture-based approach that selects neurons and weights through manual rules. | Same as `NISPA` class arguments (excluding `backbone` and `heads`) |
| `AmnesiacHAT` (bug exists) | A variant of HAT that equips it with unlearning ability, based on the AdaHAT algorithm. | Same as `AmnesiacHAT` class arguments (excluding `backbone` and `heads`) |
Make sure that the algorithm is compatible with the CL dataset, backbone, and paradigm. For example, HAT, AdaHAT, and FG-AdaHAT work only on HAT mask backbones.