Configure CL Algorithm

The continual learning algorithm is the core of continual learning. It determines how sequential tasks are learned and manages the interaction between previous and new tasks. If you are not familiar with continual learning algorithms, you can learn more from my beginners’ guides: baseline algorithms and CL methodology.

The CL algorithm is a sub-config under the index config of:

  • Continual learning main experiment and evaluation
  • Continual learning full experiment and the reference experiments
  • Continual unlearning main experiment and evaluation
  • Continual unlearning full experiment and the reference experiments

To configure a custom CL algorithm, create a YAML file in the cl_algorithm/ folder. Below is an example of the CL algorithm config.

Example

configs
├── __init__.py
├── entrance.yaml
├── index
│   ├── example_cl_main_expr.yaml
│   └── ...
├── cl_algorithm
│   └── finetuning.yaml
...
example_configs/index/example_cl_main_expr.yaml
defaults:
  ...
  - /cl_algorithm: finetuning.yaml
  ...
example_configs/cl_algorithm/finetuning.yaml
_target_: clarena.cl_algorithms.Finetuning
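Under the hood, the _target_ convention (as in Hydra's hydra.utils.instantiate) resolves the dotted path to a class and calls it with the remaining config fields as keyword arguments. The mechanism can be sketched in plain Python; this is a simplified stand-in, not CLArena's actual code, and uses a standard-library class instead of clarena.cl_algorithms.Finetuning for illustration:

```python
import importlib


def instantiate(config: dict):
    """Minimal stand-in for Hydra's instantiate: resolve the dotted
    _target_ path to a class, then call it with the remaining config
    fields as keyword arguments."""
    module_path, class_name = config["_target_"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return cls(**kwargs)


# Illustration with a standard-library class in place of a CL algorithm:
obj = instantiate({"_target_": "collections.Counter", "a": 2})
print(obj)  # Counter({'a': 2})
```

This is why the YAML file only needs the _target_ field plus any constructor arguments: the framework turns the config directly into a class instance.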

Supported CL Algorithms & Required Config Fields

In CLArena, we have implemented many CL algorithms as Python classes in the clarena.cl_algorithms module that you can use for your experiments.

To choose a CL algorithm, assign the _target_ field to the class name of the algorithm. For example, to use Finetuning, set the _target_ field to clarena.cl_algorithms.Finetuning. Each CL algorithm has its own hyperparameters and configurations, so it has its own required fields. The required fields are the same as the arguments of the class specified by _target_ (excluding backbone, heads and non_algorithmic_params). The arguments for each CL algorithm class can be found in the API documentation.
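Because the required config fields mirror the constructor arguments, you can list them programmatically with Python's inspect module. Below is a sketch using a hypothetical stand-in class (ExampleAlgorithm and its argument names are invented for illustration; in practice you would pass a real class such as clarena.cl_algorithms.Finetuning):

```python
import inspect


def required_fields(cls, exclude=("backbone", "heads", "non_algorithmic_params")):
    """Return the constructor arguments of `cls` that should appear in
    its YAML config, skipping `self` and the excluded arguments."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name in params
        if name not in ("self", "args", "kwargs") and name not in exclude
    ]


class ExampleAlgorithm:
    """Hypothetical CL algorithm class, for illustration only."""

    def __init__(self, backbone, heads, distillation_temperature, regularisation_factor):
        pass


print(required_fields(ExampleAlgorithm))
# ['distillation_temperature', 'regularisation_factor']
```

The same idea is what the API documentation reflects: everything in the class signature except backbone, heads and non_algorithmic_params belongs in the config.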

  • API Reference (CL Algorithms)
  • Source Code (CL Algorithms)

Below is the full list of supported CL algorithms. Note that the names in the “CL Algorithm” column are the exact class names that you should assign to _target_.

For every algorithm below, the required config fields are the same as the corresponding class arguments (excluding backbone, heads and non_algorithmic_params).

  • Finetuning: The most naive approach to task-incremental learning. It simply initializes the backbone from the last task when training a new task. See my continual learning beginners’ guide.
  • Fix: Another naive approach to task-incremental learning, aside from Finetuning. It simply fixes the backbone forever after training the first task, serving as a kind of toy algorithm when discussing the stability-plasticity dilemma in continual learning. See my continual learning beginners’ guide.
  • Independent: Another naive approach to task-incremental learning, aside from Finetuning. It assigns a new independent model to each task. This is a simple way to avoid catastrophic forgetting at the extreme cost of memory, and it achieves the theoretical upper bound of performance in continual learning. See my continual learning beginners’ guide.
  • Random: Skips the training step and simply uses the randomly initialized model to predict the test data. It serves as a reference model for computing the forgetting rate. See chapter 4 of the HAT (Hard Attention to the Task) paper.
  • LwF [paper] (Li and Hoiem 2017): A regularization-based approach that constrains the feature output of the model to be similar to that of previous tasks. From the perspective of knowledge distillation, it distills the previous tasks’ models into the training process of the new task through a regularization term. A simple yet effective method for continual learning.
  • EWC [paper] (Kirkpatrick et al. 2017): A regularization-based approach that computes the Fisher information as parameter importance for previous tasks and penalizes the current task loss with the importance of the parameters.
  • HAT [paper] (Serra et al. 2018): An architecture-based approach that uses learnable hard attention masks to select task-specific parameters.
  • AdaHAT [paper] (Wang et al. 2024): An architecture-based approach that improves HAT by introducing adaptive soft gradient clipping based on parameter importance and network sparsity. (This is my work; see Paper: AdaHAT for details.)
  • FGAdaHAT [code]: An architecture-based approach that improves AdaHAT with fine-grained neuron-wise importance measures guiding AdaHAT’s adaptive adjustment mechanism. (This is my work; see Paper: FG-AdaHAT for details.)
  • CBP [paper] [code] (Dohare et al. 2024) (bug exists): A continual learning approach that reinitializes a small number of units during training, using a utility measure to determine which units to reinitialize. It aims to address the loss-of-plasticity problem when learning new tasks, but does not solve the catastrophic forgetting problem in continual learning very well.
  • WSN [paper] [code] (Kang et al. 2022): An architecture-based approach that trains learnable parameter-wise scores and selects the top-scoring $c\%$ of the network parameters to be used for each task.
  • NISPA [paper] [code] (Gurbuz and Dovrolis 2022) (bug exists): An architecture-based approach that selects neurons and weights through manual rules.
  • AmnesiacHAT (bug exists): A variant of HAT that equips HAT with unlearning ability, based on the AdaHAT algorithm.
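For algorithms with hyperparameters, the config pattern extends with extra fields matching the class arguments. A sketch for EWC follows; the field name below is a placeholder invented for illustration, so check the EWC class arguments in the API reference for the actual names:

```yaml
# cl_algorithm/ewc.yaml (illustrative sketch)
_target_: clarena.cl_algorithms.EWC
# Any hyperparameter fields must match the EWC class __init__ arguments;
# "reg_strength" is a placeholder, not the real argument name.
reg_strength: 1.0
```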
Warning

Make sure that the algorithm is compatible with the CL dataset, backbone and paradigm. For example, HAT, AdaHAT and FG-AdaHAT work only on HAT mask backbones.


References

Dohare, Shibhansh, J Fernando Hernandez-Garcia, Qingfeng Lan, Parash Rahman, A Rupam Mahmood, and Richard S Sutton. 2024. “Loss of Plasticity in Deep Continual Learning.” Nature 632 (8026): 768–74.
Gurbuz, Mustafa Burak, and Constantine Dovrolis. 2022. “NISPA: Neuro-Inspired Stability-Plasticity Adaptation for Continual Learning in Sparse Networks.” arXiv Preprint arXiv:2206.09117.
Kang, Haeyong, Rusty John Lloyd Mina, Sultan Rizky Hikmawan Madjid, Jaehong Yoon, Mark Hasegawa-Johnson, Sung Ju Hwang, and Chang D Yoo. 2022. “Forget-Free Continual Learning with Winning Subnetworks.” In International Conference on Machine Learning, 10734–50. PMLR.
Kirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, et al. 2017. “Overcoming Catastrophic Forgetting in Neural Networks.” Proceedings of the National Academy of Sciences 114 (13): 3521–26.
Li, Zhizhong, and Derek Hoiem. 2017. “Learning Without Forgetting.” IEEE Transactions on Pattern Analysis and Machine Intelligence 40 (12): 2935–47.
Serra, Joan, Didac Suris, Marius Miron, and Alexandros Karatzoglou. 2018. “Overcoming Catastrophic Forgetting with Hard Attention to the Task.” In International Conference on Machine Learning, 4548–57. PMLR.
Wang, Pengxiang, Hongbo Bo, Jun Hong, Weiru Liu, and Kedian Mu. 2024. “AdaHAT: Adaptive Hard Attention to the Task in Task-Incremental Learning.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 143–60. Springer.
©️ 2026 Pengxiang Wang. All rights reserved.