Configure Optimizer
We use PyTorch Optimizer objects to train models within the PyTorch and Lightning framework.
As continual learning involves multiple tasks, each task is supposed to be given an optimizer for training. We can either use a uniform optimizer across all tasks or assign a distinct optimizer to each task.
Configure Uniform Optimizer For All Tasks
To configure a uniform optimizer for all tasks in your experiment, link the /optimizer
field in the experiment index config to a YAML file in the optimizer/ subfolder of your configs. That YAML file should use the _target_
field to point to a PyTorch optimizer class and specify its arguments in the following fields. Here is an example:
./clarena/example_configs
├── __init__.py
├── entrance.yaml
├── experiment
│   ├── example.yaml
│   └── ...
├── optimizer
│   ├── sgd_10_tasks.yaml
│   └── sgd.yaml
...
example_configs/experiment/example.yaml
defaults:
  ...
  - /optimizer: sgd.yaml
  ...
example_configs/optimizer/sgd.yaml
_target_: torch.optim.SGD
_partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
lr: 0.01
weight_decay: 0.0
Make sure to include the field _partial_: True
to enable partial instantiation. A PyTorch optimizer needs the model parameters as an argument to be fully instantiated, but at configuration time that argument is not yet available, so the optimizer can only be partially instantiated.
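For intuition, below is a minimal Python sketch (not the framework's actual training code; the model, config path, and keyword usage are placeholders) of how a partially instantiated optimizer is later completed with the model parameters:

import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the optimizer config shown above (path is illustrative).
optimizer_cfg = OmegaConf.load("example_configs/optimizer/sgd.yaml")

# Because of `_partial_: True`, instantiate() returns a functools.partial
# of torch.optim.SGD with lr and weight_decay already bound.
optimizer_factory = instantiate(optimizer_cfg)

# Later, once the model exists, the partial is completed with its parameters.
model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = optimizer_factory(params=model.parameters())  # torch.optim.SGD with lr=0.01, weight_decay=0.0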
Configure Distinct Optimizer For Each Task
To configure a distinct optimizer for each task in your experiment, the YAML file linked in the optimizer/ subfolder should be a list of PyTorch optimizer configs, one per task (assigned in task order). The length of the list must equal the num_tasks field
in the experiment index config.
example_configs/optimizer/sgd_10_tasks.yaml
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
- _target_: torch.optim.SGD
  _partial_: True # partially instantiate the optimizer without the 'params' argument. Always include this field!
  lr: 0.01
  weight_decay: 0.0
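For intuition only, here is a hedged Python sketch (not the framework's actual code; the 1-based task numbering, helper name, and model are assumptions) of how such a list could be resolved into the optimizer for the current task:

import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Load the list of per-task optimizer configs (path is illustrative).
optimizer_cfgs = OmegaConf.load("example_configs/optimizer/sgd_10_tasks.yaml")
assert len(optimizer_cfgs) == 10  # must equal num_tasks in the experiment index config

def optimizer_for_task(task_id: int, model: torch.nn.Module) -> torch.optim.Optimizer:
    """Instantiate the optimizer assigned to task `task_id` (assumed 1-based)."""
    factory = instantiate(optimizer_cfgs[task_id - 1])  # functools.partial, thanks to _partial_
    return factory(params=model.parameters())

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = optimizer_for_task(3, model)  # optimizer assigned to the third task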
Supported Optimizers
All built-in optimizers defined in PyTorch (the torch.optim module) are fully supported. Please refer to the PyTorch documentation for the full list, and to the documentation of each optimizer class for its required arguments.
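For example, switching to Adam (a hypothetical adam.yaml, not a config shipped with the examples) only requires changing _target_ and supplying Adam's own arguments; the Python dict below is equivalent to such a YAML file and instantiates the same way:

import torch
from hydra.utils import instantiate

# Equivalent of a hypothetical example_configs/optimizer/adam.yaml:
#   _target_: torch.optim.Adam
#   _partial_: True
#   lr: 0.001
#   betas: [0.9, 0.999]
adam_cfg = {
    "_target_": "torch.optim.Adam",
    "_partial_": True,
    "lr": 0.001,
    "betas": [0.9, 0.999],
}

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = instantiate(adam_cfg)(params=model.parameters())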
Optimisation-based approaches in continual learning methodology focus on designing mechanisms that manipulate the optimisation step. Typically, these approaches use different optimizers for different tasks. However, such optimisation logic can be integrated directly into the CL algorithm itself, so we do not design our own CL optimizers.