Save and Evaluate Model (MTL)
CLArena supports saving the model after training and evaluating it separately for the multi-task learning experiment.
1 Save Model
To save the model after training, enable the callback clarena.callbacks.SaveModels. Please refer to the Configure Callbacks section.
Checkpointing is not used for saving models for later evaluation in CLArena. This is because the model class is needed to load a checkpoint, while we want to evaluate a saved model regardless of its type and settings. clarena.callbacks.SaveModels uses torch.save() so that the later evaluation can use torch.load() to load the model without specifying the model class.
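For illustration, below is a minimal sketch of the save/load round trip this relies on. The placeholder model and file name are hypothetical; in practice clarena.callbacks.SaveModels performs the saving step for you.
import torch
from torch import nn

# Placeholder standing in for the trained MTL model (hypothetical).
model = nn.Linear(784, 10)

# What SaveModels does under the hood: torch.save() pickles the whole model object.
torch.save(model, "mtl_model.pth")

# Later evaluation: torch.load() restores the full object without needing the model class.
# On recent PyTorch versions, weights_only=False may be required to unpickle a full model.
loaded = torch.load("mtl_model.pth", weights_only=False)
loaded.eval()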
2 Evaluate Model
The multi-task learning evaluation pipeline evaluates a saved model trained in a multi-task learning experiment. Its output results are summarized in Output Results (MTL).
Running
To run a multi-task learning evaluation, specify the MTL_EVAL pipeline in the command:
clarena pipeline=MTL_EVAL index=<index-config-name>
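For example, with the index config example_mtl_eval.yaml shown in the Configuration section below, the command would look like:
clarena pipeline=MTL_EVAL index=example_mtl_eval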
Configuration
To run a custom multi-task learning evaluation, create a YAML file in the index/ folder as the index config. Below is an example.
Example
example_configs/index/example_mtl_eval.yaml
# @package _global_
# make sure to include the above commented global setting!
# pipeline info
pipeline: MTL_EVAL
eval_tasks: 5
global_seed: 1
# evaluation target
model_path: outputs/example_mtl_expr/2023-10-01_12-00-00/saved_models/mtl_model.pth
# components
defaults:
  - /mtl_dataset: from_cl_split_mnist.yaml
  - /trainer: cpu_eval.yaml
  - /metrics: mtl_default.yaml
  - /lightning_loggers: default.yaml
  - /callbacks: eval_default.yaml
  - /hydra: default.yaml
  - /misc: default.yaml
# outputs
output_dir: outputs/example_mtl_expr/2023-10-01_12-00-00/eval # output to the same folder as the experiment
Required Config Fields
Below is the list of required config fields for the index config of multi-task learning evaluation.
Field | Description | Allowed Values |
---|---|---|
pipeline | The pipeline that clarena uses this config to run | |
eval_tasks | The list of task IDs¹ to evaluate | |
global_seed | The global seed for the entire evaluation | |
model_path | The file path of the model to evaluate | |
/mtl_dataset | The multi-task learning dataset that the model is evaluated on | |
/trainer | The PyTorch Lightning Trainer object which contains all configs for the testing process | |
/metrics | The metrics to be monitored, logged, or visualized | |
/callbacks | The callbacks applied to this evaluation experiment (other than metric callbacks). Callbacks are additional actions integrated at different points during the evaluation | |
/hydra | Configuration for Hydra | |
/misc | Miscellaneous configs that are less related to the experiment | |
output_dir | The folder storing the evaluation results | |
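Since the configs are composed by Hydra, these fields can also be overridden from the command line instead of editing the index config. A sketch assuming the standard Hydra override syntax (the override values are illustrative):
clarena pipeline=MTL_EVAL index=example_mtl_eval global_seed=42 model_path=outputs/example_mtl_expr/2023-10-01_12-00-00/saved_models/mtl_model.pth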
The multi-task learning evaluation is managed by the MTLEvaluation class. To learn how these fields work, please refer to its source code.
Footnotes
¹ The task IDs are integers starting from 1 and ending with the number of tasks in the MTL dataset. Each corresponds to a task-specific dataset in the MTL dataset.