Implement Custom MTL Dataset

Modified: September 5, 2025

This section guides you through implementing custom multi-task learning datasets for use in CLArena.

A multi-task learning (MTL) dataset can be constructed by combining several single-task datasets, each treated as a separate task.

Base Classes

In CLArena, multi-task learning datasets are implemented as subclasses of the base classes defined in clarena/mtl_datasets/base.py. The base classes inherit from the Lightning data module and add features for multi-task learning:

  • clarena.mtl_datasets.MTLDataset: The base class for all multi-task learning datasets.
    • clarena.mtl_datasets.MTLCombinedDataset: The base class for combined multi-task learning datasets. A child class of MTLDataset.
    • clarena.mtl_datasets.MTLDatasetFromCL: The base class for constructing multi-task learning datasets from continual learning datasets. A child class of MTLDataset.

Implement Combined MTL Dataset

The combined MTL dataset is already implemented as Combined in clarena/mtl_datasets/combined.py. To make more single-task datasets available for constructing a combined MTL dataset, add them to AVAILABLE_DATASETS and handle them in the prepare_data(), train_and_val_dataset(), and test_dataset() methods.

Warning

An MTL dataset must be task-labelled: each sample in a batch carries not only the input data and target label (x, y), but also a task label t indicating which task the sample belongs to, giving (x, y, t). Multi-task learning models use the task label to select the appropriate output head for each task.

To turn a single-task dataset into a task-labelled dataset, you can use the TaskLabelledDataset wrapper in clarena/stl_datasets/base.py.
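The idea of task labelling can be illustrated with a minimal, framework-free sketch. It is not CLArena's TaskLabelledDataset or Combined implementation: the class names TaskLabelled and CombinedTasks, the toy data, the task IDs, and the heads dictionary are all hypothetical, and plain Python sequences stand in for real datasets. It only shows how (x, y) samples become (x, y, t) samples and how t can pick an output head.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class TaskLabelled:
    """Hypothetical wrapper: turns (x, y) samples into (x, y, t) samples."""

    def __init__(self, dataset: Sequence[Tuple[Any, Any]], task_id: int) -> None:
        self.dataset = dataset
        self.task_id = task_id

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[Any, Any, int]:
        x, y = self.dataset[index]
        # Append the task label to every sample.
        return x, y, self.task_id


class CombinedTasks:
    """Hypothetical concatenation of several task-labelled datasets."""

    def __init__(self, datasets: Sequence[TaskLabelled]) -> None:
        self.datasets = list(datasets)

    def __len__(self) -> int:
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, index: int) -> Tuple[Any, Any, int]:
        if index < 0 or index >= len(self):
            raise IndexError(index)
        # Walk through the datasets until the index falls inside one of them.
        for d in self.datasets:
            if index < len(d):
                return d[index]
            index -= len(d)
        raise IndexError(index)


# Two toy single-task datasets of (x, y) pairs (made-up values).
digits = [([0.1, 0.2], 3), ([0.5, 0.4], 7)]
animals = [([0.9, 0.8], 0)]

# Each single-task dataset becomes one task of the combined dataset.
combined = CombinedTasks(
    [TaskLabelled(digits, task_id=1), TaskLabelled(animals, task_id=2)]
)
samples: List[Tuple[Any, Any, int]] = [combined[i] for i in range(len(combined))]

# One stand-in "output head" per task; the task label t selects the head.
heads: Dict[int, Callable[[List[float]], float]] = {
    1: lambda x: sum(x),
    2: lambda x: max(x),
}
outputs = [heads[t](x) for x, y, t in samples]
```

In a real setup the wrapped objects would be dataset objects rather than lists, and the heads would be network layers; the routing by t stays the same.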


For more details, please refer to the API Reference and the source code. The MTL datasets already implemented in CLArena can serve as examples. Feel free to contribute by submitting pull requests on GitHub!



©️ 2025 Pengxiang Wang. All rights reserved.