
Configure Metrics (STL)

Modified: August 26, 2025

Metrics are used to monitor the training and validation process and to evaluate the model and algorithm during testing. If you are not familiar with continual learning metrics, feel free to learn more from my article: A Summary of Continual Learning Metrics.

Under the PyTorch Lightning framework, callbacks add additional actions at different points in the experiment, including before, during, or after training, validation, or testing. The metrics in CLArena are implemented as metric callbacks, which can:

  • Calculate metrics and save their data to files.
  • Visualize metrics as plots from the saved data.
  • Log additional metrics during the training process. (Note that the majority of training metrics are handled by Lightning loggers; see the Configure Lightning Loggers (CL Main) section.)

The details of these actions are configured by the metric callbacks. Each group of metrics is organized as one metric callback. For example, CLAccuracy and CLLoss correspond to accuracy and loss metrics for continual learning. We can apply multiple metrics at the same time.

Metrics are a sub-config under the experiment index config (CL Main). To configure custom metrics, create a YAML file in the metrics/ folder. At the moment, we only support a uniform metrics setting across all tasks. Below is an example of the metrics config.

Example

configs
β”œβ”€β”€ __init__.py
β”œβ”€β”€ entrance.yaml
β”œβ”€β”€ experiment
β”‚   β”œβ”€β”€ example_clmain_train.yaml
β”‚   └── ...
β”œβ”€β”€ metrics
β”‚   β”œβ”€β”€ cl_default.yaml
...
configs/experiment/example_clmain_train.yaml
defaults:
  ...
  - /metrics: cl_default.yaml
  ...

The metrics config is a list of metric callback objects:

configs/metrics/cl_default.yaml
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
  test_acc_matrix_plot_name: acc_matrix.png
  test_ave_acc_plot_name: ave_acc.png
- _target_: clarena.metrics.CLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv
  test_loss_cls_matrix_plot_name: loss_cls_matrix.png
  test_ave_loss_cls_plot_name: ave_loss_cls.png

Supported Metrics & Required Config Fields

In CLArena, we have implemented many metric callbacks as Python classes in the clarena.metrics module that you can use for your experiments.

To choose a metric callback, assign the _target_ field to the corresponding class name, such as clarena.metrics.CLAccuracy for CLAccuracy. Each metric callback has its own hyperparameters and configurations, which means it has its own required fields. The required fields are the same as the arguments of the class specified by _target_. The arguments for each metric callback class can be found in the API documentation.
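Conceptually, each entry in the metrics list is resolved to a class named by _target_ and instantiated with the remaining fields as keyword arguments, in the style of Hydra's instantiate utility. Below is a minimal sketch of that idea only, not CLArena's actual code; CLArena delegates the real instantiation to Hydra:

```python
import importlib


def instantiate(cfg: dict):
    """Minimal sketch of Hydra-style ``_target_`` instantiation.

    Imports the module, looks up the class named by ``_target_``,
    and passes the remaining fields as keyword arguments.
    """
    cfg = dict(cfg)  # avoid mutating the caller's config
    module_path, class_name = cfg.pop("_target_").rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**cfg)


# Demonstrated with a standard-library class; the principle is the same
# for entries like {"_target_": "clarena.metrics.CLAccuracy", ...}.
frac = instantiate({"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4})
```

This is why the required config fields of each metric callback are exactly the constructor arguments of the class that _target_ names.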

API Reference (Metrics) Source Code (Metrics)

Below is the full list of supported metric callbacks. These callbacks can only be applied to CL Main experiments. Note that the metric callback names listed below are the exact class names to assign to _target_.

General

These metrics can be generally used unless noted otherwise.

CLAccuracy

Provides all actions related to the CL accuracy metric, including:

  • Defining, initializing, and recording the accuracy metric.
  • Logging training and validation accuracy to Lightning loggers in real time.
  • Saving test accuracy to files.
  • Visualizing test accuracy as plots.

The callback can produce the following outputs:

  • CSV files for the test accuracy (lower triangular) matrix and the average accuracy. See here for details.
  • A coloured plot of the test accuracy (lower triangular) matrix. See here for details.
  • Curve plots of the test average accuracy over different training tasks. See here for details.

Required config fields: same as the CLAccuracy class arguments.

CLLoss

Provides all actions related to the CL loss metrics, including:

  • Defining, initializing, and recording the loss metrics.
  • Logging training and validation losses to Lightning loggers in real time.
  • Saving test losses to files.
  • Visualizing test losses as plots.

The callback can produce the following outputs:

  • CSV files for the test classification loss (lower triangular) matrix and the average classification loss. See here for details.
  • A coloured plot of the test classification loss (lower triangular) matrix. See here for details.
  • Curve plots of the test average classification loss over different training tasks. See here for details.

Required config fields: same as the CLLoss class arguments.

Each CL algorithm may have its own metrics and variables to log. We have implemented specialized metrics for different CL algorithms.

HAT

These metrics should be used with the CL algorithm HAT and its extensions AdaHAT and FGAdaHAT. Please refer to the Configure CL Algorithm (CL Main) section.

HATMasks

Provides all actions related to the masks of the HAT (Hard Attention to the Task) algorithm and its extensions, including:

  • Visualizing masks and cumulative masks as figures during training and testing.

The callback can produce the following outputs:

  • Figures of training and test masks and cumulative masks.

Required config fields: same as the HATMasks class arguments.

HATAdjustmentRate

Provides all actions related to the adjustment rate of the HAT (Hard Attention to the Task) algorithm and its extensions, including:

  • Visualizing the adjustment rate as figures during training.

The callback can produce the following outputs:

  • Figures of the training adjustment rate.

Required config fields: same as the HATAdjustmentRate class arguments.

HATNetworkCapacity

Provides all actions related to the network capacity of the HAT (Hard Attention to the Task) algorithm and its extensions, including:

  • Logging the network capacity during training. See the "Evaluation Metrics" section in Chapter 4.1 of the AdaHAT paper for more details about network capacity.

Required config fields: same as the HATNetworkCapacity class arguments.
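As a concrete sketch, a metrics config for a HAT-family experiment could append these callbacks to the default CL metrics. The file name and the save_dir field below are assumptions for illustration only; the exact required fields are the class arguments listed in the API reference:

```yaml
# configs/metrics/hat_default.yaml (illustrative name)
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
  test_acc_matrix_plot_name: acc_matrix.png
  test_ave_acc_plot_name: ave_acc.png
- _target_: clarena.metrics.HATMasks
  save_dir: ${output_dir}/results/  # assumed field; see HATMasks class arguments
- _target_: clarena.metrics.HATNetworkCapacity
```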

Metrics are used to monitor the training and validation process and to evaluate the model and algorithm during testing.

Under the PyTorch Lightning framework, callbacks add additional actions at different points in the experiment, including before, during, or after training, validation, or testing. The metrics in CLArena are implemented as metric callbacks, which can:

  • Calculate metrics and save their data to files.
  • Visualize metrics as plots from the saved data.

The details of these actions are configured by the metric callbacks. Each group of metrics is organized as one metric callback. For example, CULDistributionDistance and CULAccuracyDifference correspond to the distribution distance (DD) and accuracy difference (AD) metrics of continual unlearning. We can apply multiple metrics at the same time.

Continual unlearning is an experiment built on top of continual learning with unlearning capabilities. It therefore shares the same metrics as CL for measuring regular CL performance; please refer to the Configure Metrics (CL Main) section. The metrics that measure unlearning performance must be used in the CUL full evaluation experiment; please refer to CUL Full Evaluation.

Metrics are a sub-config under the experiment index config (CUL Main), as well as the experiment index config (CUL full evaluation). To configure custom metrics, create a YAML file in the metrics/ folder. At the moment, we only support a uniform metrics setting across all tasks. Below are examples of the metrics config.

Example

configs
β”œβ”€β”€ __init__.py
β”œβ”€β”€ entrance.yaml
β”œβ”€β”€ experiment
β”‚   β”œβ”€β”€ example_culmain_train.yaml
β”‚   └── ...
β”œβ”€β”€ metrics
β”‚   β”œβ”€β”€ cl_default.yaml
...
configs/experiment/example_culmain_train.yaml
defaults:
  ...
  - /metrics: cl_default.yaml
  ...

The metrics config is a list of metric callback objects:

configs/metrics/cl_default.yaml
- _target_: clarena.metrics.CLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
  test_acc_matrix_plot_name: acc_matrix.png
  test_ave_acc_plot_name: ave_acc.png
- _target_: clarena.metrics.CLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv
  test_loss_cls_matrix_plot_name: loss_cls_matrix.png
  test_ave_loss_cls_plot_name: ave_loss_cls.png

Supported Metrics & Required Config Fields

In CLArena, we have implemented many metric callbacks in the clarena.metrics module that you can use for CUL main and full evaluation experiments.

The _target_ field of each callback must be assigned the corresponding class name, such as clarena.metrics.CLAccuracy for CLAccuracy. Each metric callback has its own required fields, which are the same as the arguments of the class specified by _target_. The arguments of each metric callback class can be found in the API documentation.

API Reference (Metrics) Source Code (Metrics)

Below is the full list of supported metric callbacks. Note that the metric callback names listed below are the exact class names to assign to _target_.

CL Metrics

These metric callbacks can be applied to the CUL main experiment because they apply to the CL main experiment.

Please refer to Supported Metrics & Required Config Fields in Configure Metrics (CL Main) for more details.

Unlearning Metrics

These metric callbacks can only be applied to the CUL full evaluation experiment.

CULDistributionDistance

Provides all actions related to the CUL distribution distance (DD) metric, including:

  • Defining, initializing, and recording the DD metric.
  • Saving the DD metric to files.
  • Visualizing the DD metric as plots.

The callback can produce the following outputs:

  • CSV files for the DD of each task.
  • A coloured plot of the DD of each task.

Required config fields: same as the CULDistributionDistance class arguments.

CULAccuracyDifference

Provides all actions related to the CUL accuracy difference (AD) metric, including:

  • Defining, initializing, and recording the AD metric.
  • Saving the AD metric to files.
  • Visualizing the AD metric as plots.

The callback can produce the following outputs:

  • CSV files for the AD of each task.
  • A coloured plot of the AD of each task.

Required config fields: same as the CULAccuracyDifference class arguments.
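For the CUL full evaluation experiment, the unlearning metrics could be configured in the same list style as the other metrics configs. The file name and the save_dir field below are assumptions for illustration only; the exact required fields are the class arguments listed in the API reference:

```yaml
# configs/metrics/cul_full_eval.yaml (illustrative name)
- _target_: clarena.metrics.CULDistributionDistance
  save_dir: ${output_dir}/results/  # assumed field; see CULDistributionDistance class arguments
- _target_: clarena.metrics.CULAccuracyDifference
  save_dir: ${output_dir}/results/  # assumed field; see CULAccuracyDifference class arguments
```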

Under the PyTorch Lightning framework, we use callbacks to add additional actions at different points in the experiment, including before, during, or after training, validation, or testing.

Each type of callback is designed for one set of actions, such as saving sample images or generating logging information. We can apply multiple callbacks to enable different sets of actions at the same time.

This section only covers configuring callbacks other than metric callbacks. To configure metric callbacks, please refer to the Configure Metrics (CL Main) section.

Callbacks are a sub-config under the experiment index config (CUL Main) and the experiment index config (CUL full evaluation). To configure custom callbacks, create a YAML file in the callbacks/ folder. At the moment, we only support a uniform callbacks setting across all tasks. Below is an example of the callbacks config.

Example

configs
β”œβ”€β”€ __init__.py
β”œβ”€β”€ entrance.yaml
β”œβ”€β”€ experiment
β”‚   β”œβ”€β”€ example_clmain_train.yaml
β”‚   └── ...
β”œβ”€β”€ callbacks
β”‚   β”œβ”€β”€ cul_default.yaml
...
configs/experiment/example_culmain_train.yaml
defaults:
  ...
  - /callbacks: cul_default.yaml
  ...

The callbacks config is a list of callback objects:

configs/callbacks/cul_default.yaml
- _target_: clarena.callbacks.CULPylogger
- _target_: clarena.callbacks.CLRichProgressBar
- _target_: clarena.callbacks.SaveFirstBatchImages
  save_dir: ${output_dir}/samples/
- _target_: clarena.callbacks.SaveModels
  save_dir: ${output_dir}/saved_models/

Supported Callbacks & Required Config Fields

All Lightning built-in callbacks are supported. In CLArena, we have also implemented many callbacks in the clarena.callbacks module that you can use for your CUL main experiment.

The _target_ field of each callback must be assigned the corresponding class name, such as lightning.pytorch.callbacks.EarlyStopping for EarlyStopping. Each callback has its own required fields, which are the same as the arguments of the class specified by _target_. The arguments of each callback class can be found in the PyTorch Lightning documentation and the CLArena API documentation.

PyTorch Lightning Documentation (Built-In Callbacks) API Reference (Callbacks) Source Code (Callbacks)

Below is the full list of supported callbacks that can be applied to the CUL main experiment. Note that the callback names listed below are the exact class names to assign to _target_.

SaveModels: Saves the model at the end of each training task. Please refer to the Save and Evaluate Model (CL Main) section. Required config fields: same as the SaveModels class arguments.

CULPylogger: Provides additional logging messages during the continual unlearning process. For example, at the start and end of each training and testing task, it logs the task ID and other relevant information. Required config fields: same as the CULPylogger class arguments.

SaveFirstBatchImages: Saves the images and labels of the first batch of training data to files. Applies to all tasks. Required config fields: same as the SaveFirstBatchImages class arguments.

CLRichProgressBar: A customised RichProgressBar for continual learning. Required config fields: same as the CLRichProgressBar class arguments.
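Because Lightning built-in callbacks are also supported, they can be added to the same list. For example, early stopping could be enabled as below; EarlyStopping and its monitor, patience, and mode arguments come from PyTorch Lightning, but the monitored metric name here is an assumption and should match whatever your experiment actually logs:

```yaml
# appended to configs/callbacks/cul_default.yaml
- _target_: lightning.pytorch.callbacks.EarlyStopping
  monitor: val_loss  # assumed metric name; use the name your experiment logs
  patience: 5
  mode: min
```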

The callbacks that can be applied to the CUL full evaluation experiment are CULPylogger and CLRichProgressBar.

Metrics are used to monitor the training and validation process and to evaluate the model and algorithm during testing.

Under the PyTorch Lightning framework, callbacks add additional actions at different points in the experiment, including before, during, or after training, validation, or testing. The metrics in CLArena are implemented as metric callbacks, which can:

  • Calculate metrics and save their data to files.
  • Visualize metrics as plots from the saved data.
  • Log additional metrics during the training process. (Note that the majority of training metrics are handled by Lightning loggers; see the Configure Lightning Loggers section.)

The details of these actions are configured by the metric callbacks. Each group of metrics is organized as one metric callback. For example, MTLAccuracy and MTLLoss correspond to the accuracy and loss metrics of multi-task learning. We can apply multiple metrics at the same time.

Metrics are a sub-config under the experiment index config (MTL). To configure custom metrics, create a YAML file in the metrics/ folder. At the moment, we only support a uniform metrics setting across all tasks. Below is an example of the metrics config.

Example

configs
β”œβ”€β”€ __init__.py
β”œβ”€β”€ entrance.yaml
β”œβ”€β”€ experiment
β”‚   β”œβ”€β”€ example_mtl_train.yaml
β”‚   └── ...
β”œβ”€β”€ metrics
β”‚   β”œβ”€β”€ mtl_default.yaml
...
configs/experiment/example_mtl_train.yaml
defaults:
  ...
  - /metrics: mtl_default.yaml
  ...

The metrics config is a list of metric callback objects:

configs/metrics/mtl_default.yaml
- _target_: clarena.metrics.MTLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
  test_ave_acc_plot_name: ave_acc.png
- _target_: clarena.metrics.MTLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv
  test_ave_loss_cls_plot_name: ave_loss_cls.png

Supported Metrics & Required Config Fields

In CLArena, we have implemented many metric callbacks in the clarena.metrics module that you can use for the MTL experiment.

The _target_ field of each callback must be assigned the corresponding class name, such as clarena.metrics.MTLAccuracy for MTLAccuracy. Each metric callback has its own required fields, which are the same as the arguments of the class specified by _target_. The arguments of each metric callback class can be found in the API documentation.

API Reference (Metrics) Source Code (Metrics)

Below is the full list of supported metric callbacks. These callbacks can only be applied to the MTL experiment. Note that the metric callback names listed below are the exact class names to assign to _target_.

General

These metrics can be generally used unless noted otherwise.

MTLAccuracy

Provides all actions related to the MTL accuracy metric, including:

  • Defining, initializing, and recording the accuracy metric.
  • Logging training and validation accuracy to Lightning loggers in real time.
  • Saving test accuracy to files.
  • Visualizing test accuracy as plots.

The callback can produce the following outputs:

  • CSV files for the test accuracy of all tasks and the average accuracy.
  • Bar charts of the test accuracy of all tasks.

Required config fields: same as the MTLAccuracy class arguments.

MTLLoss

Provides all actions related to the MTL loss metrics, including:

  • Defining, initializing, and recording the loss metrics.
  • Logging training and validation losses to Lightning loggers in real time.
  • Saving test losses to files.
  • Visualizing test losses as plots.

The callback can produce the following outputs:

  • CSV files for the test classification loss of all tasks and the average classification loss.
  • Bar charts of the test classification loss of all tasks.

Required config fields: same as the MTLLoss class arguments.

Metrics are used to monitor the training and validation process and to evaluate the model and algorithm during testing.

Under the PyTorch Lightning framework, callbacks add additional actions at different points in the experiment, including before, during, or after training, validation, or testing. The metrics in CLArena are implemented as metric callbacks, which can:

  • Calculate metrics and save their data to files.
  • Visualize metrics as plots from the saved data.
  • Log additional metrics during the training process. (Note that the majority of training metrics are handled by Lightning loggers; see the Configure Lightning Loggers section.)

The details of these actions are configured by the metric callbacks. Each group of metrics is organized as one metric callback. For example, STLAccuracy and STLLoss correspond to the accuracy and loss metrics of single-task learning. We can apply multiple metrics at the same time.

Metrics are a sub-config under the experiment index config (STL). To configure custom metrics, create a YAML file in the metrics/ folder. At the moment, we only support a uniform metrics setting. Below is an example of the metrics config.

Example

configs
β”œβ”€β”€ __init__.py
β”œβ”€β”€ entrance.yaml
β”œβ”€β”€ experiment
β”‚   β”œβ”€β”€ example_stl_train.yaml
β”‚   └── ...
β”œβ”€β”€ metrics
β”‚   β”œβ”€β”€ stl_default.yaml
...
configs/experiment/example_stl_train.yaml
defaults:
  ...
  - /metrics: stl_default.yaml
  ...

The metrics config is a list of metric callback objects:

configs/metrics/stl_default.yaml
- _target_: clarena.metrics.STLAccuracy
  save_dir: ${output_dir}/results/
  test_acc_csv_name: acc.csv
- _target_: clarena.metrics.STLLoss
  save_dir: ${output_dir}/results/
  test_loss_cls_csv_name: loss_cls.csv

Supported Metrics & Required Config Fields

In CLArena, we have implemented many metric callbacks in the clarena.metrics module that you can use for the STL experiment.

The _target_ field of each callback must be assigned the corresponding class name, such as clarena.metrics.STLAccuracy for STLAccuracy. Each metric callback has its own required fields, which are the same as the arguments of the class specified by _target_. The arguments of each metric callback class can be found in the API documentation.

API Reference (Metrics) Source Code (Metrics)

Below is the full list of supported metric callbacks. These callbacks can only be applied to the STL experiment. Note that the metric callback names listed below are the exact class names to assign to _target_.

General

These metrics can be generally used unless noted otherwise.

STLAccuracy

Provides all actions related to the STL accuracy metric, including:

  • Defining, initializing, and recording the accuracy metric.
  • Logging training and validation accuracy to Lightning loggers in real time.
  • Saving test accuracy to files.

The callback can produce the following outputs:

  • CSV files for the test accuracy.

Required config fields: same as the STLAccuracy class arguments.

STLLoss

Provides all actions related to the STL loss metrics, including:

  • Defining, initializing, and recording the loss metrics.
  • Logging training and validation losses to Lightning loggers in real time.
  • Saving test losses to files.

The callback can produce the following outputs:

  • CSV files for the test classification loss.

Required config fields: same as the STLLoss class arguments.
©️ 2025 Pengxiang Wang. All rights reserved.