
Configure Metrics

Modified: October 9, 2025

Metrics are used to monitor the training and validation process and to evaluate the model and algorithm during testing. If you are not familiar with continual learning metrics, feel free to learn more from my article: A Summary of Continual Learning Metrics.

Under the PyTorch Lightning framework, callbacks add additional actions at different points in the experiment, including before, during, or after training, validation, or testing. The metrics in CLArena are implemented as metric callbacks, which can:

  • Calculate metrics and save their data to files.
  • Visualize metrics as plots from the saved data.
  • Log additional metrics during the training process. (Note that most training metrics are handled by Lightning loggers; see Configure Lightning Loggers.)

The details of these actions are configured through the metric callbacks. Each group of metrics is organized as one metric callback; for example, CLAccuracy and CLLoss correspond to the accuracy and loss metrics for continual learning. Multiple metric callbacks can be applied at the same time.

Metrics are a sub-config under the index config of:

  • Continual learning main experiment and evaluation
  • Continual learning full experiment and the reference experiments
  • Continual unlearning main experiment and evaluation
  • Continual unlearning full experiment, the reference experiments and evaluation
  • Multi-task learning experiment and evaluation
  • Single-task learning experiment and evaluation

To configure custom metrics, create a YAML file in the metrics/ folder. Below is an example of the metrics config.

Example

configs
├── __init__.py
├── entrance.yaml
├── index
│   ├── example_cl_main_expr.yaml
│   └── ...
├── metrics
│   ├── cl_main_expr_default.yaml
...
example_configs/index/example_cl_main_expr.yaml
defaults:
  ...
  - /metrics: cl_main_expr_default.yaml
  ...

The metrics config is a list of metric callback objects:

example_configs/metrics/cl_main_expr_default.yaml
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
  test_acc_matrix_plot_name: acc_matrix.png
  test_ave_acc_plot_name: ave_acc.png
- _target_: clarena.metrics.CLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv
  test_loss_cls_matrix_plot_name: loss_cls_matrix.png
  test_ave_loss_cls_plot_name: ave_loss_cls.png

Supported Metrics & Required Config Fields

In CLArena, we have implemented many metric callbacks as Python classes in the clarena.metrics module that you can use for your experiments and evaluations.

To choose a metric callback, assign the _target_ field to the corresponding class name, such as clarena.metrics.CLAccuracy for CLAccuracy. Each metric callback has its own hyperparameters and configuration, so it has its own required fields: these are exactly the arguments of the class specified by _target_. The arguments for each metric callback class can be found in the API documentation.
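For instance, to select the CLAccuracy callback, set _target_ to its class path and supply that class's constructor arguments as the remaining fields (the fields below are copied from the example above):

- _target_: clarena.metrics.CLAccuracy  # class to instantiate for this callback
  save_dir: ${output_dir}/results/      # constructor argument of CLAccuracy
  test_acc_csv_name: acc.csv            # constructor argument of CLAccuracy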

API Reference (Metrics) Source Code (Metrics)

Below is the full list of supported metric callbacks, grouped by the learning paradigm they apply to. Note that the callback names listed below are the exact class names that you should assign to _target_.

Continual Learning Metrics

These metrics can be used in continual learning.

CLAccuracy

Provides all actions related to the CL accuracy metric, which include:

  • Defining, initializing, and recording the accuracy metric.
  • Logging the training and validation accuracy metric to Lightning loggers in real time.
  • Saving the test accuracy metric to files.
  • Visualizing the test accuracy metric as plots.

The callback can produce the following outputs:

  • CSV files for the test accuracy (lower triangular) matrix and average accuracy. See here for details.
  • A coloured plot of the test accuracy (lower triangular) matrix. See here for details.
  • Curve plots of test average accuracy over different training tasks. See here for details.

Required config fields: same as the CLAccuracy class arguments.

CLLoss

Provides all actions related to the CL loss metrics, which include:

  • Defining, initializing, and recording the loss metrics.
  • Logging the training and validation loss metrics to Lightning loggers in real time.
  • Saving the test loss metrics to files.
  • Visualizing the test loss metrics as plots.

The callback can produce the following outputs:

  • CSV files for the test classification loss (lower triangular) matrix and average classification loss. See here for details.
  • A coloured plot of the test classification loss (lower triangular) matrix. See here for details.
  • Curve plots of test average classification loss over different training tasks. See here for details.

Required config fields: same as the CLLoss class arguments.

Each CL algorithm may have its own metrics and variables to log, so we have implemented specialized metric callbacks for different CL algorithms.

HAT

These metrics should be used with the CL algorithm HAT and its extensions AdaHAT and FGAdaHAT; please refer to the Configure CL Algorithm section.

HATMasks

Provides all actions related to the masks of the HAT (Hard Attention to the Task) algorithm and its extensions, which include:

  • Visualizing masks and cumulative masks as figures during training and testing.

The callback can produce the following outputs:

  • Figures of masks and cumulative masks, for both training and testing.

Required config fields: same as the HATMasks class arguments.

HATAdjustmentRate

Provides all actions related to the adjustment rate of the HAT (Hard Attention to the Task) algorithm and its extensions, which include:

  • Visualizing the adjustment rate as figures during training.

The callback can produce the following outputs:

  • Figures of the training adjustment rate.

Required config fields: same as the HATAdjustmentRate class arguments.

HATNetworkCapacity

Provides all actions related to the network capacity of the HAT (Hard Attention to the Task) algorithm and its extensions, which include:

  • Logging the network capacity during training. See the “Evaluation Metrics” section in chapter 4.1 of the AdaHAT paper for more details about network capacity.

Required config fields: same as the HATNetworkCapacity class arguments.
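As a sketch, a metrics config for a HAT experiment might append these callbacks to the regular CL metrics. The file name is hypothetical and the save_dir fields are assumptions modeled on the CLAccuracy example above; check each class's arguments in the API reference for the actual required fields.

example_configs/metrics/hat_expr.yaml (hypothetical)
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  ...
- _target_: clarena.metrics.HATMasks
  save_dir: ${output_dir}/results/  # assumed field: where mask figures are saved
- _target_: clarena.metrics.HATAdjustmentRate
  save_dir: ${output_dir}/results/  # assumed field: where adjustment rate figures are saved
- _target_: clarena.metrics.HATNetworkCapacity  # logs capacity only; may need no file-output fields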

Continual Unlearning Metrics

Continual unlearning is an experiment on top of continual learning with unlearning capabilities; it therefore shares the same metrics as continual learning for measuring regular CL performance. The following metrics measure unlearning performance and must be used in a continual unlearning full experiment or full evaluation.

CULDistributionDistance

Provides all actions related to the CUL distribution distance (DD) metric, which include:

  • Defining, initializing, and recording the DD metric.
  • Saving the DD metric to files.
  • Visualizing the DD metric as plots.

The callback can produce the following outputs:

  • CSV files for DD in each task.
  • A coloured plot of DD in each task.

Required config fields: same as the CULDistributionDistance class arguments.

CULAccuracyDifference

Provides all actions related to the CUL accuracy difference (AD) metric, which include:

  • Defining, initializing, and recording the AD metric.
  • Saving the AD metric to files.
  • Visualizing the AD metric as plots.

The callback can produce the following outputs:

  • CSV files for AD in each task.
  • A coloured plot of AD in each task.

Required config fields: same as the CULAccuracyDifference class arguments.
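For a continual unlearning full experiment, the metrics list could combine the regular CL metrics with the unlearning metrics above. A minimal sketch, with a hypothetical file name and field names assumed by analogy with the CL example (verify them against the class arguments):

example_configs/metrics/cul_full_expr.yaml (hypothetical)
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  ...
- _target_: clarena.metrics.CULDistributionDistance
  save_dir: ${output_dir}/results/  # assumed field name
- _target_: clarena.metrics.CULAccuracyDifference
  save_dir: ${output_dir}/results/  # assumed field name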

Multi-Task Learning Metrics

These metrics can be used in multi-task learning.

MTLAccuracy

Provides all actions related to the MTL accuracy metric, which include:

  • Defining, initializing, and recording the accuracy metric.
  • Logging the training and validation accuracy metric to Lightning loggers in real time.
  • Saving the test accuracy metric to files.
  • Visualizing the test accuracy metric as plots.

The callback can produce the following outputs:

  • CSV files for the test accuracy of all tasks and the average accuracy.
  • Bar charts of the test accuracy of all tasks.

Required config fields: same as the MTLAccuracy class arguments.

MTLLoss

Provides all actions related to the MTL loss metrics, which include:

  • Defining, initializing, and recording the loss metrics.
  • Logging the training and validation loss metrics to Lightning loggers in real time.
  • Saving the test loss metrics to files.
  • Visualizing the test loss metrics as plots.

The callback can produce the following outputs:

  • CSV files for the test classification loss of all tasks and the average classification loss.
  • Bar charts of the test classification loss of all tasks.

Required config fields: same as the MTLLoss class arguments.
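A minimal MTL metrics config might look like the sketch below. The file name is hypothetical and the CSV file-name fields are assumptions mirroring the CL example; confirm them against the MTLAccuracy and MTLLoss class arguments.

example_configs/metrics/mtl_expr.yaml (hypothetical)
- _target_: clarena.metrics.MTLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv  # assumed, by analogy with CLAccuracy
- _target_: clarena.metrics.MTLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv  # assumed, by analogy with CLLoss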

Single-Task Learning Metrics

These metrics can be used in single-task learning.

STLAccuracy

Provides all actions related to the STL accuracy metric, which include:

  • Defining, initializing, and recording the accuracy metric.
  • Logging the training and validation accuracy metric to Lightning loggers in real time.
  • Saving the test accuracy metric to files.

The callback can produce the following outputs:

  • CSV files for the test accuracy.

Required config fields: same as the STLAccuracy class arguments.

STLLoss

Provides all actions related to the STL loss metrics, which include:

  • Defining, initializing, and recording the loss metrics.
  • Logging the training and validation loss metrics to Lightning loggers in real time.
  • Saving the test loss metrics to files.

The callback can produce the following outputs:

  • CSV files for the test classification loss.

Required config fields: same as the STLLoss class arguments.
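Likewise, a single-task learning metrics config could be as small as the sketch below; the file name is hypothetical and the fields are assumptions by analogy with the CL example, so check the STLAccuracy and STLLoss class arguments.

example_configs/metrics/stl_expr.yaml (hypothetical)
- _target_: clarena.metrics.STLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv  # assumed field name
- _target_: clarena.metrics.STLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv  # assumed field name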