clarena.backbones

Backbone Networks for Continual Learning

This submodule provides the backbone neural network architectures for continual learning.

Please note that this is API documentation. Please refer to the main documentation pages for more information about the backbone networks and how to configure and implement them:

  • Configure Backbone Network: https://pengxiang-wang.com/projects/continual-learning-arena/docs/configure-your-experiment/backbone-network
  • Implement Your CL Backbone Class: https://pengxiang-wang.com/projects/continual-learning-arena/docs/implement-your-CL-modules/backbone-network

The backbones are implemented as subclasses of CLBackbone, the base class for all continual learning backbones in CLArena.

 1r"""
 2
 3# Backbone Networks for Continual Learning
 4
 5This submodule provides the **backbone neural network architectures for continual learning**. 
 6
 7Please note that this is API documentation. Please refer to the main documentation pages for more information about the backbone networks and how to 
 8configure and implement them:
 9
10- [**Configure Backbone Network**](https://pengxiang-wang.com/projects/continual-learning-arena/docs/configure-your-experiment/backbone-network)
11- [**Implement Your CL Backbone Class**](https://pengxiang-wang.com/projects/continual-learning-arena/docs/implement-your-CL-modules/backbone-network)
12
13
14
 15The backbones are implemented as subclasses of the `CLBackbone` class, which is the base class for all continual learning backbones in CLArena.
16
17- `CLBackbone`: The base class for continual learning backbones.
18- `HATMaskBackbone`: The base class for backbones used in [HAT (Hard Attention to the Task) algorithm](http://proceedings.mlr.press/v80/serra18a). A child class of `CLBackbone`.
19
20
21"""
22
23from .base import CLBackbone, HATMaskBackbone
24from .mlp import MLP, HATMaskMLP
25from .resnet import (
26    HATMaskResNet18,
27    HATMaskResNet34,
28    HATMaskResNet50,
29    HATMaskResNet101,
30    HATMaskResNet152,
31    ResNet18,
32    ResNet34,
33    ResNet50,
34    ResNet101,
35    ResNet152,
36)
37
38__all__ = ["CLBackbone", "HATMaskBackbone", "mlp", "resnet"]
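For orientation, the relationships promised above can be checked directly from these exports. The following is a minimal sketch, not part of the CLArena source; it only assumes that the listed classes are importable from `clarena.backbones` and related as described in this page.

from clarena.backbones import (
    CLBackbone,
    HATMaskBackbone,
    MLP,
    HATMaskMLP,
    ResNet18,
)

# Every backbone is expected to be a subclass of CLBackbone; the HAT variants
# additionally inherit from HATMaskBackbone (itself a child of CLBackbone).
assert issubclass(HATMaskBackbone, CLBackbone)
assert issubclass(MLP, CLBackbone)
assert issubclass(ResNet18, CLBackbone)
assert issubclass(HATMaskMLP, HATMaskBackbone)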
class CLBackbone(torch.nn.modules.module.Module):
 18class CLBackbone(nn.Module):
 19    r"""The base class of continual learning backbone networks, inherited from `nn.Module`."""
 20
 21    def __init__(self, output_dim: int | None) -> None:
 22        r"""Initialise the CL backbone network.
 23
 24        **Args:**
 25        - **output_dim** (`int` | `None`): The output dimension which connects to the CL output heads. The `input_dim` of the output heads is expected to be the same as this `output_dim`. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be `None`.
 26        """
 27        nn.Module.__init__(self)
 28
 29        self.output_dim: int = output_dim
 30        r"""Store the output dimension of the backbone network."""
 31
 32        self.weighted_layer_names: list[str] = []
 33        r"""Maintain a list of the weighted layer names. A weighted layer has weights connecting to other weighted layers; such layers are the main part of a neural network. **It must be provided in subclasses.**
 34        
 35        The names follow the `nn.Module` internal naming mechanism. For example, if a layer is assigned to `self.conv1`, the name becomes `conv1`. If `nn.Sequential` is used, the name becomes the index of the layer in the sequence, such as `0`, `1`, etc. If a hierarchical structure is used, for example, an `nn.Module` assigned to `self.block` which contains `self.conv1`, the name becomes `block/conv1`. Note that it would be `block.conv1` according to the `nn.Module` internal mechanism, but we use '/' instead of '.' to avoid the error caused by using '.' in `ModuleDict` keys.
 36        
 37        In the HAT architecture, this is also the ordered list of layer names whose units are masked by task embeddings during the forward pass; HAT attaches a task embedding to every weighted layer.
 38        """
 39
 40        self.task_id: int
 41        r"""Task ID counter indicating which task is being processed. It is updated automatically during the task loop."""
 42
 43    def setup_task_id(self, task_id: int) -> None:
 44        r"""Set up which task's dataset the CL experiment is working on. This must be done before the `forward()` method is called.
 45
 46        **Args:**
 47        - **task_id** (`int`): the target task ID.
 48        """
 49        self.task_id = task_id
 50
 51    def get_layer_by_name(self, layer_name: str) -> nn.Module:
 52        r"""Get the layer by its name.
 53
 54        **Args:**
 55        - **layer_name** (`str`): the name of the layer. Note that the name uses '/' in place of '.', e.g. `block/conv1` rather than `block.conv1`.
 56
 57        **Returns:**
 58        - **layer** (`nn.Module`): the layer.
 59        """
 60        for name, layer in self.named_modules():
 61            if name == layer_name.replace("/", "."):
 62                return layer
 63
 64    def preceding_layer_name(self, layer_name: str) -> str | None:
 65        r"""Get the name of the preceding layer of the given layer from the stored `self.weighted_layer_names`. If the given layer is the first layer, return `None`.
 66
 67        **Args:**
 68        - **layer_name** (`str`): the name of the layer.
 69
 70        **Returns:**
 71        - **preceding_layer_name** (`str` | `None`): the name of the preceding layer, or `None` if the given layer is the first layer.
 72
 73        **Raises:**
 74        - **ValueError**: if `layer_name` is not in the weighted layer order.
 75        """
 76
 77        if layer_name not in self.weighted_layer_names:
 78            raise ValueError(f"The layer name {layer_name} doesn't exist.")
 79
 80        weighted_layer_idx = self.weighted_layer_names.index(layer_name)
 81        if weighted_layer_idx == 0:
 82            return None
 83        return self.weighted_layer_names[weighted_layer_idx - 1]
 84
 85    @override  # since `nn.Module` uses it
 86    def forward(
 87        self,
 88        input: Tensor,
 89        stage: str,
 90        task_id: int | None = None,
 91    ) -> tuple[Tensor, dict[str, Tensor]]:
 92        r"""The forward pass for data from task `task_id`. In some backbones, the forward pass might be different for different tasks. **It must be implemented by subclasses.**
 93
 94        **Args:**
 95        - **input** (`Tensor`): The input tensor from data.
 96        - **stage** (`str`): the stage of the forward pass, should be one of the following:
 97            1. 'train': training stage.
 98            2. 'validation': validation stage.
 99            3. 'test': testing stage.
100        - **task_id** (`int` | `None`): the task ID that the data come from. If stage is 'train' or 'validation', it is usually the current task `self.task_id`. If stage is 'test', it could be any seen task. In TIL, the task IDs of test data are provided, so this argument can be used. In CIL, they are not provided, so it is only a placeholder for API consistency and is never used; best practice is to leave it at its default value.
101
102        **Returns:**
103        - **output_feature** (`Tensor`): the output feature tensor to be passed into heads. This is the main target of backpropagation.
104        - **hidden_features** (`dict[str, Tensor]`): the hidden features (after activation) in each weighted layer. Key (`str`) is the weighted layer name, value (`Tensor`) is the hidden feature tensor. This is used for the continual learning algorithms that need to use the hidden features for various purposes.
105        """

The base class of continual learning backbone networks, inherited from nn.Module.

CLBackbone(output_dim: int | None)
21    def __init__(self, output_dim: int | None) -> None:
22        r"""Initialise the CL backbone network.
23
24        **Args:**
25        - **output_dim** (`int` | `None`): The output dimension which connects to the CL output heads. The `input_dim` of the output heads is expected to be the same as this `output_dim`. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be `None`.
26        """
27        nn.Module.__init__(self)
28
29        self.output_dim: int = output_dim
30        r"""Store the output dimension of the backbone network."""
31
32        self.weighted_layer_names: list[str] = []
33        r"""Maintain a list of the weighted layer names. A weighted layer has weights connecting to other weighted layers; such layers are the main part of a neural network. **It must be provided in subclasses.**
34        
35        The names follow the `nn.Module` internal naming mechanism. For example, if a layer is assigned to `self.conv1`, the name becomes `conv1`. If `nn.Sequential` is used, the name becomes the index of the layer in the sequence, such as `0`, `1`, etc. If a hierarchical structure is used, for example, an `nn.Module` assigned to `self.block` which contains `self.conv1`, the name becomes `block/conv1`. Note that it would be `block.conv1` according to the `nn.Module` internal mechanism, but we use '/' instead of '.' to avoid the error caused by using '.' in `ModuleDict` keys.
36        
37        In the HAT architecture, this is also the ordered list of layer names whose units are masked by task embeddings during the forward pass; HAT attaches a task embedding to every weighted layer.
38        """
39
40        self.task_id: int
41        r"""Task ID counter indicating which task is being processed. It is updated automatically during the task loop."""

Initialise the CL backbone network.

Args:

  • output_dim (int | None): The output dimension which connects to the CL output heads. The input_dim of the output heads is expected to be the same as this output_dim. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be None.
output_dim: int

Store the output dimension of the backbone network.

weighted_layer_names: list[str]

Maintain a list of the weighted layer names. A weighted layer has weights connecting to other weighted layers; such layers are the main part of a neural network. It must be provided in subclasses.

The names follow the nn.Module internal naming mechanism. For example, if a layer is assigned to self.conv1, the name becomes conv1. If nn.Sequential is used, the name becomes the index of the layer in the sequence, such as 0, 1, etc. If a hierarchical structure is used, for example, an nn.Module assigned to self.block which contains self.conv1, the name becomes block/conv1. Note that it would be block.conv1 according to the nn.Module internal mechanism, but we use '/' instead of '.' to avoid the error caused by using '.' in ModuleDict keys.

In the HAT architecture, this is also the ordered list of layer names whose units are masked by task embeddings during the forward pass; HAT attaches a task embedding to every weighted layer.

task_id: int

Task ID counter indicating which task is being processed. It is updated automatically during the task loop.
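To make the naming convention above concrete, here is a small standalone sketch; the layer names and sizes are made up for illustration and are not taken from CLArena.

import torch.nn as nn

class Block(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3)

class TinyBody(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.block = Block()

body = TinyBody()

# nn.Module names these layers 'fc1' and 'block.conv1' internally:
print([name for name, _ in body.named_modules() if name])  # ['fc1', 'block', 'block.conv1']

# Following the '/'-for-'.' convention described above, the weighted layer
# names a subclass would record are:
weighted_layer_names = ["fc1", "block/conv1"]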

def setup_task_id(self, task_id: int) -> None:
43    def setup_task_id(self, task_id: int) -> None:
44        r"""Set up which task's dataset the CL experiment is working on. This must be done before the `forward()` method is called.
45
46        **Args:**
47        - **task_id** (`int`): the target task ID.
48        """
49        self.task_id = task_id

Set up which task's dataset the CL experiment is working on. This must be done before the forward() method is called.

Args:

  • task_id (int): the target task ID.
def get_layer_by_name(self, layer_name: str) -> torch.nn.modules.module.Module:
51    def get_layer_by_name(self, layer_name: str) -> nn.Module:
52        r"""Get the layer by its name.
53
54        **Args:**
55        - **layer_name** (`str`): the name of the layer. Note that the name uses '/' in place of '.', e.g. `block/conv1` rather than `block.conv1`.
56
57        **Returns:**
58        - **layer** (`nn.Module`): the layer.
59        """
60        for name, layer in self.named_modules():
61            if name == layer_name.replace("/", "."):
62                return layer

Get the layer by its name.

Args:

  • layer_name (str): the name of the layer. Note that the name uses '/' in place of '.', e.g. block/conv1 rather than block.conv1.

Returns:

  • layer (nn.Module): the layer.
def preceding_layer_name(self, layer_name: str) -> str | None:
64    def preceding_layer_name(self, layer_name: str) -> str | None:
65        r"""Get the name of the preceding layer of the given layer from the stored `self.weighted_layer_names`. If the given layer is the first layer, return `None`.
66
67        **Args:**
68        - **layer_name** (`str`): the name of the layer.
69
70        **Returns:**
71        - **preceding_layer_name** (`str` | `None`): the name of the preceding layer, or `None` if the given layer is the first layer.
72
73        **Raises:**
74        - **ValueError**: if `layer_name` is not in the weighted layer order.
75        """
76
77        if layer_name not in self.weighted_layer_names:
78            raise ValueError(f"The layer name {layer_name} doesn't exist.")
79
80        weighted_layer_idx = self.weighted_layer_names.index(layer_name)
81        if weighted_layer_idx == 0:
82            return None
83        return self.weighted_layer_names[weighted_layer_idx - 1]

Get the name of the preceding layer of the given layer from the stored self.weighted_layer_names. If the given layer is the first layer, return None.

Args:

  • layer_name (str): the name of the layer.

Returns:

  • preceding_layer_name (str | None): the name of the preceding layer, or None if the given layer is the first layer.

Raises:

  • ValueError: if layer_name is not in the weighted layer order.
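The lookup above is simply an index into the ordered weighted_layer_names list. A self-contained sketch of the same logic, with a hypothetical three-layer ordering:

weighted_layer_names = ["fc1", "fc2", "fc3"]

def preceding_layer_name(layer_name: str) -> str | None:
    # Same rule as CLBackbone.preceding_layer_name: return the previous entry
    # in the ordered list of weighted layer names, or None for the first layer.
    if layer_name not in weighted_layer_names:
        raise ValueError(f"The layer name {layer_name} doesn't exist.")
    idx = weighted_layer_names.index(layer_name)
    return None if idx == 0 else weighted_layer_names[idx - 1]

print(preceding_layer_name("fc2"))  # fc1
print(preceding_layer_name("fc1"))  # None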
@override
def forward( self, input: torch.Tensor, stage: str, task_id: int | None = None) -> tuple[torch.Tensor, dict[str, torch.Tensor]]:
 85    @override  # since `nn.Module` uses it
 86    def forward(
 87        self,
 88        input: Tensor,
 89        stage: str,
 90        task_id: int | None = None,
 91    ) -> tuple[Tensor, dict[str, Tensor]]:
 92        r"""The forward pass for data from task `task_id`. In some backbones, the forward pass might be different for different tasks. **It must be implemented by subclasses.**
 93
 94        **Args:**
 95        - **input** (`Tensor`): The input tensor from data.
 96        - **stage** (`str`): the stage of the forward pass, should be one of the following:
 97            1. 'train': training stage.
 98            2. 'validation': validation stage.
 99            3. 'test': testing stage.
100        - **task_id** (`int` | `None`): the task ID that the data come from. If stage is 'train' or 'validation', it is usually the current task `self.task_id`. If stage is 'test', it could be any seen task. In TIL, the task IDs of test data are provided, so this argument can be used. In CIL, they are not provided, so it is only a placeholder for API consistency and is never used; best practice is to leave it at its default value.
101
102        **Returns:**
103        - **output_feature** (`Tensor`): the output feature tensor to be passed into heads. This is the main target of backpropagation.
104        - **hidden_features** (`dict[str, Tensor]`): the hidden features (after activation) in each weighted layer. Key (`str`) is the weighted layer name, value (`Tensor`) is the hidden feature tensor. This is used for the continual learning algorithms that need to use the hidden features for various purposes.
105        """

The forward pass for data from task task_id. In some backbones, the forward pass might be different for different tasks. It must be implemented by subclasses.

Args:

  • input (Tensor): The input tensor from data.
  • stage (str): the stage of the forward pass, should be one of the following:
    1. 'train': training stage.
    2. 'validation': validation stage.
    3. 'test': testing stage.
  • task_id (int | None): the task ID that the data come from. If stage is 'train' or 'validation', it is usually the current task self.task_id. If stage is 'test', it could be any seen task. In TIL, the task IDs of test data are provided, so this argument can be used. In CIL, they are not provided, so it is only a placeholder for API consistency and is never used; best practice is to leave it at its default value.

Returns:

  • output_feature (Tensor): the output feature tensor to be passed into heads. This is the main target of backpropagation.
  • hidden_features (dict[str, Tensor]): the hidden features (after activation) in each weighted layer. Key (str) is the weighted layer name, value (Tensor) is the hidden feature tensor. This is used for the continual learning algorithms that need to use the hidden features for various purposes.
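For reference, a minimal subclass sketch that satisfies this contract. The layer names and sizes are invented for the example; the actual MLP and ResNet backbones in this submodule are more elaborate.

import torch
from torch import Tensor, nn
from clarena.backbones import CLBackbone

class TwoLayerBackbone(CLBackbone):
    """An illustrative CL backbone with two fully connected layers."""

    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int) -> None:
        super().__init__(output_dim=output_dim)
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        self.activation = nn.ReLU()
        self.weighted_layer_names = ["fc1", "fc2"]  # required by the base class

    def forward(
        self, input: Tensor, stage: str, task_id: int | None = None
    ) -> tuple[Tensor, dict[str, Tensor]]:
        hidden_features: dict[str, Tensor] = {}
        feature = self.activation(self.fc1(input))
        hidden_features["fc1"] = feature
        feature = self.activation(self.fc2(feature))
        hidden_features["fc2"] = feature
        return feature, hidden_features

backbone = TwoLayerBackbone(input_dim=784, hidden_dim=256, output_dim=64)
output_feature, hidden_features = backbone(torch.randn(8, 784), stage="train")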
class HATMaskBackbone(clarena.backbones.CLBackbone):
108class HATMaskBackbone(CLBackbone):
109    r"""The backbone network for HAT-based algorithms with learnable hard attention masks.
110
111    HAT-based algorithms:
112
113    - [**HAT (Hard Attention to the Task, 2018)**](http://proceedings.mlr.press/v80/serra18a) is an architecture-based continual learning approach that uses learnable hard attention masks to select the task-specific parameters.
114    - [**Adaptive HAT (Adaptive Hard Attention to the Task, 2024)**](https://link.springer.com/chapter/10.1007/978-3-031-70352-2_9) is an architecture-based continual learning approach that improves [HAT (Hard Attention to the Task, 2018)](http://proceedings.mlr.press/v80/serra18a) by introducing new adaptive soft gradient clipping based on parameter importance and network sparsity.
115    - **CBPHAT** is my work in progress, which combines the HAT (Hard Attention to the Task) algorithm with Continual Backpropagation (CBP) by leveraging contribution utility as the parameter importance, as in the AdaHAT (Adaptive Hard Attention to the Task) algorithm.
116    """
117
118    def __init__(self, output_dim: int | None, gate: str) -> None:
119        r"""Initialise the HAT mask backbone network with task embeddings and masks.
120
121        **Args:**
122        - **output_dim** (`int` | `None`): The output dimension which connects to the CL output heads. The `input_dim` of the output heads is expected to be the same as this `output_dim`. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be `None`.
123        - **gate** (`str`): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
124            - `sigmoid`: the sigmoid function.
125        """
126        CLBackbone.__init__(self, output_dim=output_dim)
127
128        self.register_hat_mask_module_explicitly(
129            gate=gate
130        )  # we moved the registration of the modules to a separate method to solve a problem of multiple inheritance in terms of `nn.Module`
131
132        HATMaskBackbone.sanity_check(self)
133
134    def register_hat_mask_module_explicitly(self, gate: str) -> None:
135        r"""Register all `nn.Module`s explicitly in this method. For `HATMaskBackbone`, they are task embedding for the current task and the masks.
136
137        **Args:**
138        - **gate** (`str`): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
139            - `sigmoid`: the sigmoid function.
140        """
141        self.gate: str = gate
142        r"""Store the type of gate function."""
143        if gate == "sigmoid":
144            self.gate_fn: nn.Module = nn.Sigmoid()
145            r"""The gate function turning the real value task embeddings into attention masks."""
146
147        self.task_embedding_t: nn.ModuleDict = nn.ModuleDict()
148        r"""Store the task embedding for the current task. Keys are the layer names and values are the task embedding `nn.Embedding` for the layer. Each task embedding has size (1, number of units).
149        
150        We use `ModuleDict` rather than `dict` to make sure `LightningModule` can properly register these model parameters, for purposes such as automatically transferring them to the device and recording them in model summaries.
151        
152        We use `nn.Embedding` rather than `nn.Parameter` to store the task embedding for each layer, because `nn.Embedding` is a type of `nn.Module` and can be accepted by `nn.ModuleDict` (`nn.Parameter` cannot).
153        
154        **This must be defined to cover each weighted layer (just as `self.weighted_layer_names` listed) in the backbone network.** Otherwise, the uncovered parts will keep updating for all tasks and become a source of catastrophic forgetting. """
155
156    def initialise_task_embedding(self, mode: str) -> None:
157        r"""Initialise the task embedding for the current task.
158
159        **Args:**
160        - **mode** (`str`): the initialisation mode for task embeddings, should be one of the following:
161            1. 'N01' (default): standard normal distribution $N(0, 1)$.
162            2. 'U-11': uniform distribution $U(-1, 1)$.
163            3. 'U01': uniform distribution $U(0, 1)$.
164            4. 'U-10': uniform distribution $U(-1, 0)$.
165            5. 'last': inherit task embedding from last task.
166        """
167        for te in self.task_embedding_t.values():
168            if mode == "N01":
169                nn.init.normal_(te.weight, 0, 1)
170            elif mode == "U-11":
171                nn.init.uniform_(te.weight, -1, 1)
172            elif mode == "U01":
173                nn.init.uniform_(te.weight, 0, 1)
174            elif mode == "U-10":
175                nn.init.uniform_(te.weight, -1, 0)
176            elif mode == "last":
177                pass
178
179    def sanity_check(self) -> None:
180        r"""Check the sanity of the arguments.
181
182        **Raises:**
183        - **ValueError**: when the `gate` is not one of the valid options.
184        """
185
186        if self.gate not in ["sigmoid"]:
187            raise ValueError("The gate should be one of 'sigmoid'.")
188
189    def get_mask(
190        self,
191        stage: str,
192        s_max: float | None = None,
193        batch_idx: int | None = None,
194        num_batches: int | None = None,
195        test_mask: dict[str, Tensor] | None = None,
196    ) -> dict[str, Tensor]:
197        r"""Get the hard attention mask used in `forward()` method for different stages.
198
199        **Args:**
200        - **stage** (`str`): the stage when applying the conversion, should be one of the following:
201            1. 'train': training stage. If stage is 'train', get the mask from task embedding of current task through the gate function, which is scaled by an annealed scalar. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
202            2. 'validation': validation stage. If stage is 'validation', get the mask from the task embedding of the current task through the gate function, scaled by `s_max`. (Note that at this stage, the binary mask hasn't been stored yet, as training is not over.)
203            3. 'test': testing stage. If stage is 'test', use the provided binary `test_mask` for the tested task. (It was obtained from the gate function scaled by `s_max`; the large scaling makes the mask nearly binary.)
204        - **s_max** (`float`): the maximum scaling factor in the gate function. Doesn't apply to testing stage. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
205        - **batch_idx** (`int` | `None`): the current batch index. Applies only to training stage. For other stages, it is default `None`.
206        - **num_batches** (`int` | `None`): the total number of batches. Applies only to training stage. For other stages, it is default `None`.
207        - **test_mask** (`dict[str, Tensor]` | `None`): the binary mask used for test. Applies only to testing stage. For other stages, it is default `None`.
208
209        **Returns:**
210        - **mask** (`dict[str, Tensor]`): the hard attention (whose values are 0 or 1) mask. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
211
212        **Raises:**
213        - **ValueError**: if `s_max`, `batch_idx` or `num_batches` is not provided in the 'train' stage; if `s_max` is not provided in the 'validation' stage; if `test_mask` is not provided in the 'test' stage.
214        """
215
216        # sanity check
217        if stage == "train" and (
218            s_max is None or batch_idx is None or num_batches is None
219        ):
220            raise ValueError(
221                "The `s_max`, `batch_idx` and `num_batches` should be provided at training stage, instead of the default value `None`."
222            )
223        if stage == "validation" and (s_max is None):
224            raise ValueError(
225                "The `s_max` should be provided at validation stage, instead of the default value `None`."
226            )
227        if stage == "test" and (test_mask is None):
228            raise ValueError(
229                "The `test_mask` should be provided at testing stage, instead of the default value `None`."
230            )
231
232        mask = {}
233        if stage == "train":
234            for layer_name in self.weighted_layer_names:
235                anneal_scalar = 1 / s_max + (s_max - 1 / s_max) * (batch_idx - 1) / (
236                    num_batches - 1
237                )  # see equation (3) in chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
238                mask[layer_name] = self.gate_fn(
239                    self.task_embedding_t[layer_name].weight * anneal_scalar
240                ).squeeze()
241        elif stage == "validation":
242            for layer_name in self.weighted_layer_names:
243                mask[layer_name] = self.gate_fn(
244                    self.task_embedding_t[layer_name].weight * s_max
245                ).squeeze()
246        elif stage == "test":
247            mask = test_mask
248
249        return mask
250
251    def get_cumulative_mask(self) -> dict[str, Tensor]:
252        r"""Get the cumulative mask till current task.
253
254        **Returns:**
255        - **cumulative_mask** (`dict[str, Tensor]`): the cumulative mask. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
256        """
257        return self.cumulative_mask_for_previous_tasks
258
259    def get_summative_mask(self) -> dict[str, Tensor]:
260        r"""Get the summative mask till current task.
261
262        **Returns:**
263        - **summative_mask** (`dict[str, Tensor]`): the summative mask tensor. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
264        """
265        return self.summative_mask_for_previous_tasks
266
267    def get_layer_measure_parameter_wise(
268        self,
269        unit_wise_measure: dict[str, Tensor],
270        layer_name: str,
271        aggregation: str,
272    ) -> Tensor:
273        r"""Get the parameter-wise measure on the parameters right before the given layer.
274
275        It is calculated from the given unit-wise measure. It aggregates two feature-sized vectors (corresponding to the given layer and its preceding layer) into a weight-wise matrix (corresponding to the weights in between) and a bias-wise vector (corresponding to the bias of the given layer), using the given aggregation method. For example, given two feature-sized measures $m_{l,i}$ and $m_{l-1,j}$ and 'min' aggregation, the parameter-wise measure is $\min \left(m_{l,i}, m_{l-1,j}\right)$, a matrix with respect to $i, j$.
276
277        Note that if the given layer is the first layer, with no preceding layer, the parameter-wise measure is broadcast directly from the unit-wise measure of the given layer.
278
279        This method is used in the calculation of parameter-wise measure in various HAT-based algorithms:
280
281        - **HAT**: the parameter-wise measure is the binary mask for previous tasks from the unit-wise cumulative mask of previous tasks `self.cumulative_mask_for_previous_tasks`, which is $\min \left(a_{l,i}^{<t}, a_{l-1,j}^{<t}\right)$ in equation (2) in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
282        - **AdaHAT**: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise summative mask of previous tasks `self.summative_mask_for_previous_tasks`, which is $\min \left(m_{l,i}^{<t,\text{sum}}, m_{l-1,j}^{<t,\text{sum}}\right)$ in equation (9) in [AdaHAT paper](https://link.springer.com/chapter/10.1007/978-3-031-70352-2_9).
283        - **CBPHAT**: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise importance of previous tasks `self.unit_importance_for_previous_tasks` based on contribution utility, which is $\min \left(I_{l,i}^{(t-1)}, I_{l-1,j}^{(t-1)}\right)$ in the adjustment rate formula in the paper draft.
284
285        **Args:**
286        - **unit_wise_measure** (`dict[str, Tensor]`): the unit-wise measure. Key is layer name, value is the unit-wise measure tensor. The measure tensor has size (number of units).
287        - **layer_name** (`str`): the name of given layer.
288        - **aggregation** (`str`): the aggregation method turning two feature-wise measures into weight-wise matrix, should be one of the following:
289            - 'min': takes minimum of the two connected unit measures.
290            - 'max': takes maximum of the two connected unit measures.
291
292        **Returns:**
293        - **weight_measure** (`Tensor`): the weight measure matrix, same size as the corresponding weights.
294        - **bias_measure** (`Tensor`): the bias measure vector, same size as the corresponding bias.
295
296
297        """
298
299        # initialise the aggregation function
300        if aggregation == "min":
301            aggregation_func = torch.min
302        elif aggregation == "max":
303            aggregation_func = torch.max
304        else:
305            raise ValueError(f"The aggregation method {aggregation} is not supported.")
306
307        # get the preceding layer name
308        preceding_layer_name = self.preceding_layer_name(layer_name)
309
310        # get weight size for expanding the measures
311        layer = self.get_layer_by_name(layer_name)
312        weight_size = layer.weight.size()
313
314        # construct the weight-wise measure
315        layer_measure = unit_wise_measure[layer_name]
316        layer_measure_broadcast_size = (-1, 1) + tuple(
317            1 for _ in range(len(weight_size) - 2)
318        )  # since the size of mask tensor is (number of units), we extend it to (number of units, 1) and expand it to the weight size. The weight size has 2 dimensions in fully connected layers and 4 dimensions in convolutional layers
319
320        layer_measure_broadcasted = layer_measure.view(
321            *layer_measure_broadcast_size
322        ).expand(
323            weight_size,
324        )  # expand the given layer mask to the weight size and broadcast
325
326        if (
327            preceding_layer_name
328        ):  # if the layer is not the first layer, where the preceding layer exists
329
330            preceding_layer_measure_broadcast_size = (1, -1) + tuple(
331                1 for _ in range(len(weight_size) - 2)
332            )  # since the size of mask tensor is (number of units), we extend it to (1, number of units) and expand it to the weight size. The weight size has 2 dimensions in fully connected layers and 4 dimensions in convolutional layers
333            preceding_layer_measure = unit_wise_measure[preceding_layer_name]
334            preceding_layer_measure_broadcasted = preceding_layer_measure.view(
335                *preceding_layer_measure_broadcast_size
336            ).expand(
337                weight_size
338            )  # expand the preceding layer mask to the weight size and broadcast
339            weight_measure = aggregation_func(
340                layer_measure_broadcasted, preceding_layer_measure_broadcasted
341            )  # aggregate the two broadcast measures element-wise (min or max)
342        else:  # if the layer is the first layer
343            weight_measure = layer_measure_broadcasted
344
345        # construct the bias-wise measure
346        bias_measure = layer_measure
347
348        return weight_measure, bias_measure
349
350    @override
351    def forward(
352        self,
353        input: Tensor,
354        stage: str,
355        s_max: float | None = None,
356        batch_idx: int | None = None,
357        num_batches: int | None = None,
358        test_mask: dict[str, Tensor] | None = None,
359    ) -> tuple[Tensor, dict[str, Tensor], dict[str, Tensor]]:
360        r"""The forward pass for data from task `task_id`. The task-specific mask for `task_id` is applied to the units in each layer.
361
362        **Args:**
363        - **input** (`Tensor`): The input tensor from data.
364        - **stage** (`str`): the stage of the forward pass, should be one of the following:
365            1. 'train': training stage.
366            2. 'validation': validation stage.
367            3. 'test': testing stage.
368        - **s_max** (`float`): the maximum scaling factor in the gate function. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
369        - **batch_idx** (`int` | `None`): the current batch index. Applies only to training stage. For other stages, it is default `None`.
370        - **num_batches** (`int` | `None`): the total number of batches. Applies only to training stage. For other stages, it is default `None`.
371        - **test_mask** (`dict[str, Tensor]` | `None`): the binary mask used for test. Applies only to testing stage. For other stages, it is default `None`.
372
373        **Returns:**
374        - **output_feature** (`Tensor`): the output feature tensor to be passed into heads. This is the main target of backpropagation.
375        - **mask** (`dict[str, Tensor]`): the mask for the current task. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
376        - **hidden_features** (`dict[str, Tensor]`): the hidden features (after activation) in each weighted layer. Key (`str`) is the weighted layer name, value (`Tensor`) is the hidden feature tensor. This is used by continual learning algorithms that need the hidden features for various purposes. Although the HAT algorithm itself does not need it, it is still provided for API consistency with other HAT-based algorithms that inherit this `forward()` method.
377
378        """
379        # this should be copied to all subclasses. Make sure it is called to get the mask for the current task from the task embedding in this stage
380        mask = self.get_mask(
381            stage,
382            s_max=s_max,
383            batch_idx=batch_idx,
384            num_batches=num_batches,
385            test_mask=test_mask,
386        )

The backbone network for HAT-based algorithms with learnable hard attention masks.

HAT-based algorithms:

  • HAT (Hard Attention to the Task, 2018) is an architecture-based continual learning approach that uses learnable hard attention masks to select the task-specific parameters.
  • Adaptive HAT (Adaptive Hard Attention to the Task, 2024) is an architecture-based continual learning approach that improves HAT (Hard Attention to the Task, 2018) by introducing new adaptive soft gradient clipping based on parameter importance and network sparsity.
  • CBPHAT is my work in progress, which combines the HAT (Hard Attention to the Task) algorithm with Continual Backpropagation (CBP) by leveraging contribution utility as the parameter importance, as in the AdaHAT (Adaptive Hard Attention to the Task) algorithm.
HATMaskBackbone(output_dim: int | None, gate: str)
118    def __init__(self, output_dim: int | None, gate: str) -> None:
119        r"""Initialise the HAT mask backbone network with task embeddings and masks.
120
121        **Args:**
122        - **output_dim** (`int` | `None`): The output dimension which connects to the CL output heads. The `input_dim` of the output heads is expected to be the same as this `output_dim`. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be `None`.
123        - **gate** (`str`): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
124            - `sigmoid`: the sigmoid function.
125        """
126        CLBackbone.__init__(self, output_dim=output_dim)
127
128        self.register_hat_mask_module_explicitly(
129            gate=gate
130        )  # we moved the registration of the modules to a separate method to solve a problem of multiple inheritance in terms of `nn.Module`
131
132        HATMaskBackbone.sanity_check(self)

Initialise the HAT mask backbone network with task embeddings and masks.

Args:

  • output_dim (int | None): The output dimension which connects to the CL output heads. The input_dim of the output heads is expected to be the same as this output_dim. In some cases, this class is used as a block within a backbone network and has no output dimension; in that case, it can be None.
  • gate (str): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
    • sigmoid: the sigmoid function.
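A hedged sketch of how a concrete subclass typically calls this constructor and registers its per-layer task embeddings. The layer names and sizes are invented for the example; see HATMaskMLP and the HATMaskResNet classes for the real implementations.

from torch import nn
from clarena.backbones import HATMaskBackbone

class TinyHATMaskBackbone(HATMaskBackbone):
    """An illustrative HAT mask backbone with two fully connected layers."""

    def __init__(self, output_dim: int, gate: str = "sigmoid") -> None:
        super().__init__(output_dim=output_dim, gate=gate)
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, output_dim)
        self.weighted_layer_names = ["fc1", "fc2"]
        # one task embedding per weighted layer, each of size (1, number of units)
        self.task_embedding_t["fc1"] = nn.Embedding(1, 256)
        self.task_embedding_t["fc2"] = nn.Embedding(1, output_dim)
        # a full subclass would also override forward() to apply the masks
        # returned by get_mask() (see the forward() method further below)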
def register_hat_mask_module_explicitly(self, gate: str) -> None:
134    def register_hat_mask_module_explicitly(self, gate: str) -> None:
135        r"""Register all `nn.Module`s explicitly in this method. For `HATMaskBackbone`, they are task embedding for the current task and the masks.
136
137        **Args:**
138        - **gate** (`str`): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
139            - `sigmoid`: the sigmoid function.
140        """
141        self.gate: str = gate
142        r"""Store the type of gate function."""
143        if gate == "sigmoid":
144            self.gate_fn: nn.Module = nn.Sigmoid()
145            r"""The gate function turning the real value task embeddings into attention masks."""
146
147        self.task_embedding_t: nn.ModuleDict = nn.ModuleDict()
148        r"""Store the task embedding for the current task. Keys are the layer names and values are the task embedding `nn.Embedding` for the layer. Each task embedding has size (1, number of units).
149        
150        We use `ModuleDict` rather than `dict` to make sure `LightningModule` can properly register these model parameters, for purposes such as automatically transferring them to the device and recording them in model summaries.
151        
152        We use `nn.Embedding` rather than `nn.Parameter` to store the task embedding for each layer, because `nn.Embedding` is a type of `nn.Module` and can be accepted by `nn.ModuleDict` (`nn.Parameter` cannot).
153        
154        **This must be defined to cover each weighted layer (just as `self.weighted_layer_names` listed) in the backbone network.** Otherwise, the uncovered parts will keep updating for all tasks and become a source of catastrophic forgetting. """

Register all nn.Modules explicitly in this method. For HATMaskBackbone, they are task embedding for the current task and the masks.

Args:

  • gate (str): the type of gate function turning the real value task embeddings into attention masks, should be one of the following:
    • sigmoid: the sigmoid function.
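As a quick illustration of what the sigmoid gate does (a standalone sketch; the embedding values and the scaling factor 400 are just example numbers): applied to the raw task embedding it produces soft values in (0, 1), and applied to the embedding scaled by a large factor it becomes nearly binary.

import torch
from torch import nn

gate_fn = nn.Sigmoid()
task_embedding = torch.tensor([[-2.0, 0.1, 3.0]])  # shape (1, number of units)

print(gate_fn(task_embedding))          # soft mask, values strictly between 0 and 1
print(gate_fn(task_embedding * 400.0))  # scaled by a large s_max: values are ~0 or ~1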
def initialise_task_embedding(self, mode: str) -> None:
156    def initialise_task_embedding(self, mode: str) -> None:
157        r"""Initialise the task embedding for the current task.
158
159        **Args:**
160        - **mode** (`str`): the initialisation mode for task embeddings, should be one of the following:
161            1. 'N01' (default): standard normal distribution $N(0, 1)$.
162            2. 'U-11': uniform distribution $U(-1, 1)$.
163            3. 'U01': uniform distribution $U(0, 1)$.
164            4. 'U-10': uniform distribution $U(-1, 0)$.
165            5. 'last': inherit task embedding from last task.
166        """
167        for te in self.task_embedding_t.values():
168            if mode == "N01":
169                nn.init.normal_(te.weight, 0, 1)
170            elif mode == "U-11":
171                nn.init.uniform_(te.weight, -1, 1)
172            elif mode == "U01":
173                nn.init.uniform_(te.weight, 0, 1)
174            elif mode == "U-10":
175                nn.init.uniform_(te.weight, -1, 0)
176            elif mode == "last":
177                pass

Initialise the task embedding for the current task.

Args:

  • mode (str): the initialisation mode for task embeddings, should be one of the following:
    1. 'N01' (default): standard normal distribution $N(0, 1)$.
    2. 'U-11': uniform distribution $U(-1, 1)$.
    3. 'U01': uniform distribution $U(0, 1)$.
    4. 'U-10': uniform distribution $U(-1, 0)$.
    5. 'last': inherit task embedding from last task.
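The modes above map directly onto standard torch.nn.init calls. A standalone sketch for a single hypothetical layer with 256 units:

from torch import nn

te = nn.Embedding(1, 256)  # one task embedding of size (1, number of units)

nn.init.normal_(te.weight, 0, 1)    # mode 'N01': N(0, 1)
nn.init.uniform_(te.weight, -1, 1)  # mode 'U-11': U(-1, 1)
nn.init.uniform_(te.weight, 0, 1)   # mode 'U01': U(0, 1)
nn.init.uniform_(te.weight, -1, 0)  # mode 'U-10': U(-1, 0)
# mode 'last': do nothing, inheriting the embedding left over from the last task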
def sanity_check(self) -> None:
179    def sanity_check(self) -> None:
180        r"""Check the sanity of the arguments.
181
182        **Raises:**
183        - **ValueError**: when the `gate` is not one of the valid options.
184        """
185
186        if self.gate not in ["sigmoid"]:
187            raise ValueError("The gate should be one of 'sigmoid'.")

Check the sanity of the arguments.

Raises:

  • ValueError: when the gate is not one of the valid options.
def get_mask( self, stage: str, s_max: float | None = None, batch_idx: int | None = None, num_batches: int | None = None, test_mask: dict[str, torch.Tensor] | None = None) -> dict[str, torch.Tensor]:
189    def get_mask(
190        self,
191        stage: str,
192        s_max: float | None = None,
193        batch_idx: int | None = None,
194        num_batches: int | None = None,
195        test_mask: dict[str, Tensor] | None = None,
196    ) -> dict[str, Tensor]:
197        r"""Get the hard attention mask used in `forward()` method for different stages.
198
199        **Args:**
200        - **stage** (`str`): the stage when applying the conversion, should be one of the following:
201            1. 'train': training stage. If stage is 'train', get the mask from task embedding of current task through the gate function, which is scaled by an annealed scalar. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
202            2. 'validation': validation stage. If stage is 'validation', get the mask from the task embedding of the current task through the gate function, scaled by `s_max`. (Note that at this stage, the binary mask hasn't been stored yet, as training is not over.)
203            3. 'test': testing stage. If stage is 'test', use the provided binary `test_mask` for the tested task. (It was obtained from the gate function scaled by `s_max`; the large scaling makes the mask nearly binary.)
204        - **s_max** (`float`): the maximum scaling factor in the gate function. Doesn't apply to testing stage. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
205        - **batch_idx** (`int` | `None`): the current batch index. Applies only to training stage. For other stages, it is default `None`.
206        - **num_batches** (`int` | `None`): the total number of batches. Applies only to training stage. For other stages, it is default `None`.
207        - **test_mask** (`dict[str, Tensor]` | `None`): the binary mask used for test. Applies only to testing stage. For other stages, it is default `None`.
208
209        **Returns:**
210        - **mask** (`dict[str, Tensor]`): the hard attention (whose values are 0 or 1) mask. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
211
212        **Raises:**
213        - **ValueError**: if `s_max`, `batch_idx` or `num_batches` is not provided in the 'train' stage; if `s_max` is not provided in the 'validation' stage; if `test_mask` is not provided in the 'test' stage.
214        """
215
216        # sanity check
217        if stage == "train" and (
218            s_max is None or batch_idx is None or num_batches is None
219        ):
220            raise ValueError(
221                "The `s_max`, `batch_idx` and `num_batches` should be provided at training stage, instead of the default value `None`."
222            )
223        if stage == "validation" and (s_max is None):
224            raise ValueError(
225                "The `s_max` should be provided at validation stage, instead of the default value `None`."
226            )
227        if stage == "test" and (test_mask is None):
228            raise ValueError(
229                "The `test_mask` should be provided at testing stage, instead of the default value `None`."
230            )
231
232        mask = {}
233        if stage == "train":
234            for layer_name in self.weighted_layer_names:
235                anneal_scalar = 1 / s_max + (s_max - 1 / s_max) * (batch_idx - 1) / (
236                    num_batches - 1
237                )  # see equation (3) in chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
238                mask[layer_name] = self.gate_fn(
239                    self.task_embedding_t[layer_name].weight * anneal_scalar
240                ).squeeze()
241        elif stage == "validation":
242            for layer_name in self.weighted_layer_names:
243                mask[layer_name] = self.gate_fn(
244                    self.task_embedding_t[layer_name].weight * s_max
245                ).squeeze()
246        elif stage == "test":
247            mask = test_mask
248
249        return mask

Get the hard attention mask used in forward() method for different stages.

Args:

  • stage (str): the stage when applying the conversion, should be one of the following:
    1. 'train': training stage. If stage is 'train', get the mask from task embedding of current task through the gate function, which is scaled by an annealed scalar. See chapter 2.4 "Hard Attention Training" in HAT paper.
    2. 'validation': validation stage. If stage is 'validation', get the mask from the task embedding of the current task through the gate function, scaled by s_max. (Note that at this stage, the binary mask hasn't been stored yet, as training is not over.)
    3. 'test': testing stage. If stage is 'test', use the provided binary test_mask for the tested task. (It was obtained from the gate function scaled by s_max; the large scaling makes the mask nearly binary.)
  • s_max (float): the maximum scaling factor in the gate function. Doesn't apply to testing stage. See chapter 2.4 "Hard Attention Training" in HAT paper.
  • batch_idx (int | None): the current batch index. Applies only to training stage. For other stages, it is default None.
  • num_batches (int | None): the total number of batches. Applies only to training stage. For other stages, it is default None.
  • test_mask (dict[str, Tensor] | None): the binary mask used for test. Applies only to testing stage. For other stages, it is default None.

Returns:

  • mask (dict[str, Tensor]): the hard attention (whose values are 0 or 1) mask. Key (str) is layer name, value (Tensor) is the mask tensor. The mask tensor has size (number of units).

Raises:

  • ValueError: if s_max, batch_idx or num_batches is not provided in the 'train' stage; if s_max is not provided in the 'validation' stage; if test_mask is not provided in the 'test' stage.
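To see the training-stage annealing in isolation, here is a standalone sketch of equation (3) from the HAT paper as used above, assuming 1-based batch indices as the (batch_idx - 1) term suggests; s_max = 400 and 5 batches are example values.

s_max = 400.0
num_batches = 5

for batch_idx in range(1, num_batches + 1):
    # the scaling grows linearly from 1 / s_max (first batch) to s_max (last batch)
    anneal_scalar = 1 / s_max + (s_max - 1 / s_max) * (batch_idx - 1) / (num_batches - 1)
    print(batch_idx, anneal_scalar)
# the scalar is approximately 0.0025, 100.0, 200.0, 300.0, 400.0 across the five batches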
def get_cumulative_mask(self) -> dict[str, torch.Tensor]:
251    def get_cumulative_mask(self) -> dict[str, Tensor]:
252        r"""Get the cumulative mask till current task.
253
254        **Returns:**
255        - **cumulative_mask** (`dict[str, Tensor]`): the cumulative mask. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
256        """
257        return self.cumulative_mask_for_previous_tasks

Get the cumulative mask till current task.

Returns:

  • cumulative_mask (dict[str, Tensor]): the cumulative mask. Key (str) is layer name, value (Tensor) is the mask tensor. The mask tensor has size (number of units).
def get_summative_mask(self) -> dict[str, torch.Tensor]:
259    def get_summative_mask(self) -> dict[str, Tensor]:
260        r"""Get the summative mask till current task.
261
262        **Returns:**
263        - **summative_mask** (`dict[str, Tensor]`): the summative mask tensor. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
264        """
265        return self.summative_mask_for_previous_tasks

Get the summative mask till current task.

Returns:

  • summative_mask (dict[str, Tensor]): the summative mask tensor. Key (str) is layer name, value (Tensor) is the mask tensor. The mask tensor has size (number of units).
def get_layer_measure_parameter_wise( self, unit_wise_measure: dict[str, torch.Tensor], layer_name: str, aggregation: str) -> torch.Tensor:
267    def get_layer_measure_parameter_wise(
268        self,
269        unit_wise_measure: dict[str, Tensor],
270        layer_name: str,
271        aggregation: str,
272    ) -> Tensor:
273        r"""Get the parameter-wise measure on the parameters right before the given layer.
274
275        It is calculated from the given unit-wise measure. It aggregates two feature-sized vectors (corresponding to the given layer and its preceding layer) into a weight-wise matrix (corresponding to the weights in between) and a bias-wise vector (corresponding to the bias of the given layer), using the given aggregation method. For example, given two feature-sized measures $m_{l,i}$ and $m_{l-1,j}$ and 'min' aggregation, the parameter-wise measure is $\min \left(m_{l,i}, m_{l-1,j}\right)$, a matrix with respect to $i, j$.
276
277        Note that if the given layer is the first layer, with no preceding layer, the parameter-wise measure is broadcast directly from the unit-wise measure of the given layer.
278
279        This method is used in the calculation of parameter-wise measure in various HAT-based algorithms:
280
281        - **HAT**: the parameter-wise measure is the binary mask for previous tasks from the unit-wise cumulative mask of previous tasks `self.cumulative_mask_for_previous_tasks`, which is $\min \left(a_{l,i}^{<t}, a_{l-1,j}^{<t}\right)$ in equation (2) in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
282        - **AdaHAT**: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise summative mask of previous tasks `self.summative_mask_for_previous_tasks`, which is $\min \left(m_{l,i}^{<t,\text{sum}}, m_{l-1,j}^{<t,\text{sum}}\right)$ in equation (9) in [AdaHAT paper](https://link.springer.com/chapter/10.1007/978-3-031-70352-2_9).
283        - **CBPHAT**: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise importance of previous tasks `self.unit_importance_for_previous_tasks` based on contribution utility, which is $\min \left(I_{l,i}^{(t-1)}, I_{l-1,j}^{(t-1)}\right)$ in the adjustment rate formula in the paper draft.
284
285        **Args:**
286        - **unit_wise_measure** (`dict[str, Tensor]`): the unit-wise measure. Key is layer name, value is the unit-wise measure tensor. The measure tensor has size (number of units).
287        - **layer_name** (`str`): the name of given layer.
288        - **aggregation** (`str`): the aggregation method turning two feature-wise measures into weight-wise matrix, should be one of the following:
289            - 'min': takes minimum of the two connected unit measures.
290            - 'max': takes maximum of the two connected unit measures.
291
292        **Returns:**
293        - **weight_measure** (`Tensor`): the weight measure matrix, same size as the corresponding weights.
294        - **bias_measure** (`Tensor`): the bias measure vector, same size as the corresponding bias.
295
296
297        """
298
299        # initialise the aggregation function
300        if aggregation == "min":
301            aggregation_func = torch.min
302        elif aggregation == "max":
303            aggregation_func = torch.max
304        else:
305            raise ValueError(f"The aggregation method {aggregation} is not supported.")
306
307        # get the preceding layer name
308        preceding_layer_name = self.preceding_layer_name(layer_name)
309
310        # get weight size for expanding the measures
311        layer = self.get_layer_by_name(layer_name)
312        weight_size = layer.weight.size()
313
314        # construct the weight-wise measure
315        layer_measure = unit_wise_measure[layer_name]
316        layer_measure_broadcast_size = (-1, 1) + tuple(
317            1 for _ in range(len(weight_size) - 2)
318        )  # since the size of mask tensor is (number of units), we extend it to (number of units, 1) and expand it to the weight size. The weight size has 2 dimensions in fully connected layers and 4 dimensions in convolutional layers
319
320        layer_measure_broadcasted = layer_measure.view(
321            *layer_measure_broadcast_size
322        ).expand(
323            weight_size,
324        )  # expand the given layer mask to the weight size and broadcast
325
326        if (
327            preceding_layer_name
328        ):  # if the layer is not the first layer, where the preceding layer exists
329
330            preceding_layer_measure_broadcast_size = (1, -1) + tuple(
331                1 for _ in range(len(weight_size) - 2)
332            )  # since the size of mask tensor is (number of units), we extend it to (1, number of units) and expand it to the weight size. The weight size has 2 dimensions in fully connected layers and 4 dimensions in convolutional layers
333            preceding_layer_measure = unit_wise_measure[preceding_layer_name]
334            preceding_layer_measure_broadcasted = preceding_layer_measure.view(
335                *preceding_layer_measure_broadcast_size
336            ).expand(
337                weight_size
338            )  # expand the preceding layer mask to the weight size and broadcast
339            weight_measure = aggregation_func(
340                layer_measure_broadcasted, preceding_layer_measure_broadcasted
341            )  # aggregate the two broadcast measures element-wise (min or max)
342        else:  # if the layer is the first layer
343            weight_measure = layer_measure_broadcasted
344
345        # construct the bias-wise measure
346        bias_measure = layer_measure
347
348        return weight_measure, bias_measure

Get the parameter-wise measure on the parameters right before the given layer.

It is calculated from the given unit-wise measure. It aggregates two feature-sized vectors (corresponding to the given layer and its preceding layer) into a weight-wise matrix (corresponding to the weights in between) and a bias-wise vector (corresponding to the bias of the given layer), using the given aggregation method. For example, given two feature-sized measures $m_{l,i}$ and $m_{l-1,j}$ and 'min' aggregation, the parameter-wise measure is $\min \left(m_{l,i}, m_{l-1,j}\right)$, a matrix with respect to $i, j$.

Note that if the given layer is the first layer, with no preceding layer, the parameter-wise measure is broadcast directly from the unit-wise measure of the given layer.

This method is used in the calculation of parameter-wise measure in various HAT-based algorithms:

  • HAT: the parameter-wise measure is the binary mask for previous tasks from the unit-wise cumulative mask of previous tasks self.cumulative_mask_for_previous_tasks, which is $\min \left(a_{l,i}^{<t}, a_{l-1,j}^{<t}\right)$ in equation (2) in the HAT paper.
  • AdaHAT: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise summative mask of previous tasks self.summative_mask_for_previous_tasks, which is $\min \left(m_{l,i}^{<t,\text{sum}}, m_{l-1,j}^{<t,\text{sum}}\right)$ in equation (9) in the AdaHAT paper.
  • CBPHAT: the parameter-wise measure is the parameter importance for previous tasks from the unit-wise importance of previous tasks self.unit_importance_for_previous_tasks based on contribution utility, which is $\min \left(I_{l,i}^{(t-1)}, I_{l-1,j}^{(t-1)}\right)$ in the adjustment rate formula in the paper draft.

Args:

  • unit_wise_measure (dict[str, Tensor]): the unit-wise measure. Key is layer name, value is the unit-wise measure tensor. The measure tensor has size (number of units).
  • layer_name (str): the name of given layer.
  • aggregation (str): the aggregation method turning two feature-wise measures into weight-wise matrix, should be one of the following:
    • 'min': takes minimum of the two connected unit measures.
    • 'max': takes maximum of the two connected unit measures.

Returns:

  • weight_measure (Tensor): the weight measure matrix, same size as the corresponding weights.
  • bias_measure (Tensor): the bias measure vector, same size as the corresponding bias.
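A standalone numeric sketch of the 'min' aggregation for a fully connected layer, where the weight has shape (units in this layer, units in the preceding layer); the measure values are made up for illustration.

import torch

layer_measure = torch.tensor([1.0, 0.0, 1.0])      # 3 units in the given layer
preceding_measure = torch.tensor([0.0, 1.0])       # 2 units in the preceding layer
weight_size = (3, 2)                                # shape of the weights in between

weight_measure = torch.min(
    layer_measure.view(-1, 1).expand(weight_size),      # broadcast along columns
    preceding_measure.view(1, -1).expand(weight_size),  # broadcast along rows
)
bias_measure = layer_measure

print(weight_measure)
# tensor([[0., 1.],
#         [0., 0.],
#         [0., 1.]])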
@override
def forward( self, input: torch.Tensor, stage: str, s_max: float | None = None, batch_idx: int | None = None, num_batches: int | None = None, test_mask: dict[str, torch.Tensor] | None = None) -> tuple[torch.Tensor, dict[str, torch.Tensor], dict[str, torch.Tensor]]:
350    @override
351    def forward(
352        self,
353        input: Tensor,
354        stage: str,
355        s_max: float | None = None,
356        batch_idx: int | None = None,
357        num_batches: int | None = None,
358        test_mask: dict[str, Tensor] | None = None,
359    ) -> tuple[Tensor, dict[str, Tensor], dict[str, Tensor]]:
360        r"""The forward pass for data from task `task_id`. The task-specific mask for `task_id` is applied to the units in each layer.
361
362        **Args:**
363        - **input** (`Tensor`): The input tensor from data.
364        - **stage** (`str`): the stage of the forward pass, should be one of the following:
365            1. 'train': training stage.
366            2. 'validation': validation stage.
367            3. 'test': testing stage.
368        - **s_max** (`float`): the maximum scaling factor in the gate function. See chapter 2.4 "Hard Attention Training" in [HAT paper](http://proceedings.mlr.press/v80/serra18a).
369        - **batch_idx** (`int` | `None`): the current batch index. Applies only to training stage. For other stages, it is default `None`.
370        - **num_batches** (`int` | `None`): the total number of batches. Applies only to training stage. For other stages, it is default `None`.
371        - **test_mask** (`dict[str, Tensor]` | `None`): the binary mask used for test. Applies only to testing stage. For other stages, it is default `None`.
372
373        **Returns:**
374        - **output_feature** (`Tensor`): the output feature tensor to be passed into heads. This is the main target of backpropagation.
375        - **mask** (`dict[str, Tensor]`): the mask for the current task. Key (`str`) is layer name, value (`Tensor`) is the mask tensor. The mask tensor has size (number of units).
376        - **hidden_features** (`dict[str, Tensor]`): the hidden features (after activation) in each weighted layer. Key (`str`) is the weighted layer name, value (`Tensor`) is the hidden feature tensor. This is used by continual learning algorithms that need the hidden features for various purposes. Although the HAT algorithm itself does not need it, it is still provided for API consistency with other HAT-based algorithms that inherit this `forward()` method.
377
378        """
379        # this should be copied to all subclasses. Make sure it is called to get the mask for the current task from the task embedding in this stage
380        mask = self.get_mask(
381            stage,
382            s_max=s_max,
383            batch_idx=batch_idx,
384            num_batches=num_batches,
385            test_mask=test_mask,
386        )

The forward pass for data from task task_id. The task-specific mask for task_id is applied to the units in each layer.

Args:

  • input (Tensor): The input tensor from data.
  • stage (str): the stage of the forward pass, should be one of the following:
    1. 'train': training stage.
    2. 'validation': validation stage.
    3. 'test': testing stage.
  • s_max (float): the maximum scaling factor in the gate function. See chapter 2.4 "Hard Attention Training" in HAT paper.
  • batch_idx (int | None): the current batch index. Applies only to training stage. For other stages, it is default None.
  • num_batches (int | None): the total number of batches. Applies only to training stage. For other stages, it is default None.
  • test_mask (dict[str, Tensor] | None): the binary mask used for test. Applies only to testing stage. For other stages, it is default None.

Returns:

  • output_feature (Tensor): the output feature tensor to be passed into heads. This is the main target of backpropagation.
  • mask (dict[str, Tensor]): the mask for the current task. Key (str) is layer name, value (Tensor) is the mask tensor. The mask tensor has size (number of units).
  • hidden_features (dict[str, Tensor]): the hidden features (after activation) in each weighted layer. Key (str) is the weighted layer name, value (Tensor) is the hidden feature tensor. This is used by continual learning algorithms that need the hidden features for various purposes. Although the HAT algorithm itself does not need it, it is still provided for API consistency with other HAT-based algorithms that inherit this forward() method.
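Finally, a sketch (not the CLArena implementation) of how a HATMaskBackbone subclass typically continues this forward() after the get_mask() call above: each weighted layer's activation is multiplied element-wise by its mask, and both the masked features and the mask are returned. The layer size and the all-ones test mask are illustrative.

import torch
from torch import Tensor, nn

fc1 = nn.Linear(784, 256)
activation = nn.ReLU()

def masked_layer_forward(input: Tensor, mask_fc1: Tensor) -> Tensor:
    feature = activation(fc1(input))
    return feature * mask_fc1  # mask_fc1 has size (number of units) = 256

x = torch.randn(8, 784)
mask_fc1 = torch.ones(256)  # e.g. a test mask that keeps every unit of fc1
print(masked_layer_forward(x, mask_fc1).shape)  # torch.Size([8, 256])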