Configure Optimizer(s) (CL Main)

Modified

October 9, 2025

Optimizer is a component that manages the learning process by updating model parameters based on the computed gradients.

Optimizer is a sub-config under the index config of:

Continual learning main experiment
Continual learning full experiment and the reference experiments
Continual unlearning main experiment
Continual unlearning full experiment and the reference experiments
Multi-task learning experiment
Single-task learning experiment

To configure a custom optimizer, create a YAML file in the optimizer/ folder. As continual learning involves multiple tasks, each task can be assigned an optimizer for training. We support a uniform optimizer for all tasks and distinct optimizers for each task. For multi-task and single-task learning, there is only one uniform scheduler in the experiment. Below are examples of both configurations.

Example

configs
├── __init__.py
├── entrance.yaml
├── index
│   ├── example_cl_main_expr.yaml
│   └── ...
├── optimizer
│   ├── adam_10_tasks.yaml
│   └── adam.yaml
...

Uniform Optimizer

example_configs/index/example_cl_main_expr.yaml

defaults:
  ...
  - /optimizer: adam.yaml
  ...

example_configs/optimizer/adam.yaml

_target_: torch.optim.Adam
_partial_: true # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
lr: 0.01
weight_decay: 0.0

Distinct Optimizer for Each Task (Continual Learning)

Distinct optimizers are specified as a dictionary. The key and length of the dictionary must match the train_tasks field in the index config of continual learning. Below is an example of distinct optimizers for 10 tasks.

defaults:
  ...
  - /optimizer: adam_10_tasks.yaml
  ...

example_configs/optimizer/sgd_10_tasks.yaml

1:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
2:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
3:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
4:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
5:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
6:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
7:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
8:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
9:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0
10:
  _target_: torch.optim.Adam
  _partial_: True # partially instantiate optimizer without 'params' argument. Make sure this is included in any case!
  lr: 0.01
  weight_decay: 0.0

Supported Optimizers & Required Config Fields

In CLArena, we do not implement custom optimizers. We use the built-in optimizers from PyTorch in the torch.optim module.

Note

Optimization-based approaches in continual learning methodology focus on designing mechanisms from the perspective of manipulating optimization step. Typically, these approaches involve using different optimizers for various tasks. However, the evolution of optimization can be directly integrated into the CL algorithm so we don’t particularly design our own CL optimizers.

To choose an optimizer, assign the _target_ field to the class name of the optimizer. For example, to use Adam, set the _target_ field to torch.optim.Adam. Include _partial_: true as well (see below). Each optimizer has its own hyperparameters and configurations, which means it has its own required fields. The required fields are the same as the arguments of the class specified by _target_. The arguments for each optimizer class can be found in the PyTorch documentation.

PyTorch Documentation (Built-In Optimizers)

Warning

Make sure to include the field _partial_: true to enable partial instantiation. PyTorch optimizers need the model parameters as an argument to be fully instantiated, but during configuration we do not have that argument yet, so the optimizer can only be partially instantiated.