Implement Custom STL Dataset

Modified: September 6, 2025

This section guides you through implementing custom single-task learning datasets for use in CLArena.

Single-task learning can be constructed from single-task datasets by combining different datasets, each as a separate task.

Base Classes

In CLArena, single-task learning datasets are implemented as subclasses of the base classes defined in clarena/stl_datasets/base.py. These base classes inherit from the Lightning data module and add features for single-task learning:

- clarena.stl_datasets.STLDataset: The base class for all single-task learning datasets.
- clarena.stl_datasets.STLDatasetFromRaw: The base class for constructing single-task learning datasets from raw datasets. A child class of STLDataset.

Implement STL Dataset From Raw Datasets

To implement STL datasets from raw datasets:

1. Inherit from STLDatasetFromRaw.
2. Define the class property original_dataset_python_class, which is the raw Python class of the original dataset that the STL dataset is constructed from. If there’s no such class, implement one under clarena/stl_datasets/raw/ (preferably as a PyTorch Dataset).
3. Define the constants of the original dataset in a subclass of DatasetConstants in clarena/stl_datasets/raw/constants.py. Link the constants class to the original_dataset_python_class in the DATASET_CONSTANTS_MAPPING dictionary.
4. Implement prepare_data(), train_and_val_dataset(), and test_dataset(). You may call the APIs provided by the original_dataset_python_class. Make sure to use self.train_and_val_transforms(), self.test_transforms(), and self.target_transform() to assign transforms.
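The steps above can be sketched roughly as follows. This is a minimal illustration, not CLArena's actual implementation. So that it runs without CLArena installed, it defines simplified stand-ins for STLDatasetFromRaw and DatasetConstants. In a real project you would import the real classes instead, and their constructor arguments and return types may differ. ToyRaw, ToyRawConstants, ToySTL, the NUM_CLASSES field, and the (train, val) return shape are all hypothetical names and assumptions made for illustration.

```python
# --- Stand-ins (assumptions) so the sketch runs without CLArena installed. ---
# In real code, use the classes from clarena.stl_datasets and
# clarena/stl_datasets/raw/constants.py; their signatures may differ.
class DatasetConstants:
    """Stand-in for the constants base class."""


class STLDatasetFromRaw:
    """Stand-in base class; the real one builds on a Lightning data module."""

    original_dataset_python_class = None

    def __init__(self, root):
        self.root = root

    def train_and_val_transforms(self):
        return lambda x: x  # identity placeholder

    def test_transforms(self):
        return lambda x: x  # identity placeholder

    def target_transform(self):
        return lambda y: y  # identity placeholder


# --- Hypothetical raw dataset class (normally a PyTorch Dataset). ---
class ToyRaw:
    def __init__(self, root, train, transform=None, target_transform=None):
        n = 8 if train else 4
        self.samples = [(float(i), i % 2) for i in range(n)]
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x, y = self.samples[idx]
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y


# --- Constants for the raw dataset, linked through the mapping dictionary. ---
class ToyRawConstants(DatasetConstants):
    NUM_CLASSES = 2  # hypothetical field name


DATASET_CONSTANTS_MAPPING = {ToyRaw: ToyRawConstants}


# --- The STL dataset itself. ---
class ToySTL(STLDatasetFromRaw):
    original_dataset_python_class = ToyRaw

    def prepare_data(self):
        # Download or verify files here; the toy data lives in memory.
        pass

    def train_and_val_dataset(self):
        full = ToyRaw(
            self.root,
            train=True,
            transform=self.train_and_val_transforms(),
            target_transform=self.target_transform(),
        )
        n_val = len(full) // 4  # hold out the first quarter for validation
        val = [full[i] for i in range(n_val)]
        train = [full[i] for i in range(n_val, len(full))]
        return train, val  # assumed return shape

    def test_dataset(self):
        return ToyRaw(
            self.root,
            train=False,
            transform=self.test_transforms(),
            target_transform=self.target_transform(),
        )
```

In this sketch the transforms are obtained from the base-class helpers rather than hard-coded, which mirrors the recommendation to assign transforms through self.train_and_val_transforms(), self.test_transforms(), and self.target_transform().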

For more details, refer to the API Reference and the source code. The STL datasets already implemented in CLArena can serve as examples. Feel free to contribute by submitting a pull request on GitHub!

- API Reference (STL Datasets)
- Source Code (STL Datasets)
- GitHub Pull Request