Implement Custom STL Dataset
This section guides you through implementing custom single-task learning datasets for use in CLArena.
Single-task learning can be constructed from single-task datasets by combining different datasets, each as a separate task.
Base Classes
In CLArena, single-task learning datasets are implemented as subclasses of the base classes defined in clarena/stl_datasets/base.py. These base classes inherit from the Lightning data module and add features for single-task learning:
- clarena.stl_datasets.STLDataset: The base class for all single-task learning datasets.
- clarena.stl_datasets.STLDatasetFromRaw: The base class for constructing single-task learning datasets from raw datasets. A child class of STLDataset.
Implement STL Dataset From Raw Datasets
To implement STL datasets from raw datasets:
- Inherit STLDatasetFromRaw.
- Define the class property original_dataset_python_class, which is the raw Python class of the original dataset that the STL dataset is constructed from. If there is no such class, implement one under clarena/stl_datasets/raw/ (preferably a PyTorch Dataset).
- Define the constants of the original dataset in a subclass of DatasetConstants in clarena/stl_datasets/raw/constants.py. Link the constants class to the original_dataset_python_class in the DATASET_CONSTANTS_MAPPING dictionary.
- Write prepare_data(), train_and_val_dataset() and test_dataset(). You may call the APIs provided by the original_dataset_python_class. Make sure to use self.train_and_val_transforms(), self.test_transforms() and self.target_transform() to assign transforms.
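The steps above can be sketched roughly as follows. This is a minimal, self-contained illustration and not CLArena's actual code: DatasetConstants, STLDatasetFromRaw and DATASET_CONSTANTS_MAPPING are replaced by simplified local stand-ins so the example runs without CLArena installed. ToyRawDataset, ToyConstants, ToySTLDataset, the NUM_CLASSES field, the constructor arguments and the (train, val) return value are all assumptions made for illustration; check the API Reference for the real signatures.

```python
# Sketch only: the stand-in definitions below imitate CLArena names so that
# this file runs on its own. In a real project they would be imported from
# CLArena instead of being redefined here.


class DatasetConstants:  # stand-in, not the real CLArena class
    NUM_CLASSES: int  # hypothetical field name


class STLDatasetFromRaw:  # stand-in; the real base class is a Lightning data module
    original_dataset_python_class: type

    def __init__(self, root: str, validation_count: int = 2):
        self.root = root
        self.validation_count = validation_count  # hypothetical argument

    # Identity transforms stand in for whatever the real base class provides.
    def train_and_val_transforms(self):
        return lambda x: x

    def test_transforms(self):
        return lambda x: x

    def target_transform(self):
        return lambda y: y


class ToyRawDataset:
    """Hypothetical raw dataset; in practice preferably a PyTorch Dataset."""

    def __init__(self, root, train, transform=None, target_transform=None):
        size = 8 if train else 2
        self.samples = [(float(i), i % 2) for i in range(size)]
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        x, y = self.samples[index]
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y


class ToyConstants(DatasetConstants):  # step 3: constants of the raw dataset
    NUM_CLASSES = 2


# Step 3 (continued): link the constants class to the raw dataset class.
DATASET_CONSTANTS_MAPPING = {ToyRawDataset: ToyConstants}


class ToySTLDataset(STLDatasetFromRaw):  # step 1: inherit the base class
    original_dataset_python_class = ToyRawDataset  # step 2

    # Step 4: the three data hooks.
    def prepare_data(self):
        # A real dataset would download or unpack files under self.root here.
        pass

    def train_and_val_dataset(self):
        full = ToyRawDataset(
            self.root,
            train=True,
            transform=self.train_and_val_transforms(),
            target_transform=self.target_transform(),
        )
        # Simple deterministic split; a real dataset might split randomly.
        items = [full[i] for i in range(len(full))]
        split = len(items) - self.validation_count
        return items[:split], items[split:]

    def test_dataset(self):
        return ToyRawDataset(
            self.root,
            train=False,
            transform=self.test_transforms(),
            target_transform=self.target_transform(),
        )


dataset = ToySTLDataset(root="data")
dataset.prepare_data()
train, val = dataset.train_and_val_dataset()
test = dataset.test_dataset()
print(len(train), len(val), len(test))  # prints: 6 2 2
```

The point of the sketch is the division of labour: the raw class knows how to read the data, the constants class describes it, and the STL dataset only wires the two together and passes the transforms supplied by the base class.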
For more details, please refer to the API Reference and the source code. The STL datasets already implemented in CLArena can serve as examples. Feel free to contribute by submitting pull requests on GitHub!