Configure CL Algorithm
The continual learning algorithm is the core of a continual learning experiment: it determines how sequential tasks are learned and manages the interaction between previous and new tasks. If you are not familiar with continual learning algorithms, feel free to learn about the baseline algorithms and CL methodology from my CL beginner's guide.
To configure the continual learning algorithm for your experiment, link the /cl_algorithm field in the experiment index config to a YAML file in the cl_algorithm/ subfolder of your configs. That YAML file should use the _target_ field to point to a CL algorithm class (as shown in the source code: clarena/cl_algorithms/) and specify that class's arguments in the following fields (except backbone and heads). Here is an example:
./clarena/example_configs
├── __init__.py
├── entrance.yaml
├── experiment
│   ├── example.yaml
│   └── ...
├── cl_algorithm
│   └── finetuning.yaml
...
example_configs/experiment/example.yaml
defaults:
  ...
  - /cl_algorithm: finetuning.yaml
  ...
example_configs/cl_algorithm/finetuning.yaml
_target_: clarena.cl_algorithms.Finetuning
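The finetuning.yaml above specifies nothing beyond _target_. If the algorithm class you choose takes additional arguments, list them below _target_ in the same file. The sketch below is hypothetical: the file name and the argument name are illustrative placeholders, not clarena's actual EWC signature; check the API reference of clarena.cl_algorithms.EWC for the real argument names.
example_configs/cl_algorithm/ewc.yaml (hypothetical sketch)
_target_: clarena.cl_algorithms.EWC
# hypothetical argument name: weight of the EWC regularisation penalty
reg_strength: 1.0
To run with such a config, point the /cl_algorithm entry in the experiment index config to ewc.yaml instead of finetuning.yaml.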
Supported CL Algorithm List
In this package, we have implemented many CL algorithm classes in the clarena.cl_algorithms module that you can use in your experiments. Below is the full list. Please refer to the API reference of each class to learn its required arguments.
| CL Algorithm | Description |
|---|---|
| Finetuning (SGD) | The most naive approach to task-incremental learning. It simply initialises the backbone from the previous task's backbone when training a new task. Check out my CL beginners' guide for details. |
| Fix | Another naive approach to task-incremental learning besides Finetuning. It serves as a kind of toy algorithm when discussing the stability-plasticity dilemma in continual learning: it simply fixes the backbone forever after training the first task. Check out my CL beginners' guide for details. |
| LwF [paper] | LwF (Learning without Forgetting, 2017) is a regularisation-based continual learning approach that constrains the feature outputs of the model to stay similar to those of the previous tasks. From the perspective of knowledge distillation, it distills the models of previous tasks into the training process of the new task through the regularisation term. It is a simple yet effective method for continual learning. |
| EWC [paper] | EWC (Elastic Weight Consolidation, 2017) is a regularisation-based continual learning approach that calculates the importance of parameters for previous tasks and penalises the current task's loss according to that importance (a sketch of the penalty term follows the table). |
| HAT [paper] [code] | HAT (Hard Attention to the Task, 2018) is an architecture-based continual learning approach that uses learnable hard attention masks to select task-specific parameters. |
| AdaHAT [paper] [code] | AdaHAT (Adaptive Hard Attention to the Task, 2024) is an architecture-based continual learning approach that improves HAT by introducing adaptive soft gradient clipping based on parameter importance and network sparsity. This is my work; check out Paper: AdaHAT for details. |
| CBP [paper] [code] | CBP (Continual Backpropagation, 2024) is a continual learning approach that reinitialises a small number of units during training, using a utility measure to determine which units to reinitialise. It aims to address the loss-of-plasticity problem when learning new tasks, but does not solve the catastrophic forgetting problem in continual learning very well. |
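To make the regularisation-based idea concrete, here is the EWC penalty referenced in the table, in the standard formulation from the 2017 paper (general notation, not tied to clarena's implementation):

$$\mathcal{L}(\theta) = \mathcal{L}_{t}(\theta) + \sum_{i} \frac{\lambda}{2} F_i \left(\theta_i - \theta^{*}_{t-1,i}\right)^2$$

where $\mathcal{L}_{t}$ is the loss of the current task $t$, $\theta^{*}_{t-1}$ are the parameters learned after the previous task, $F_i$ estimates the importance of parameter $i$ (the diagonal of the Fisher information in the original paper), and $\lambda$ controls the stability-plasticity trade-off.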
Make sure the algorithm is compatible with the CL dataset, backbone and paradigm.