clarena.cl_algorithms.fgadahat API documentation

FGAdaHAT( backbone: clarena.backbones.HATMaskBackbone, heads: clarena.heads.HeadsTIL, adjustment_intensity: float, importance_type: str, importance_summing_strategy: str, importance_scheduler_type: str, neuron_to_weight_importance_aggregation_mode: str, s_max: float, clamp_threshold: float, mask_sparsity_reg_factor: float, mask_sparsity_reg_mode: str = 'original', base_importance: float = 0.01, base_mask_sparsity_reg: float = 0.1, base_linear: float = 10, filter_by_cumulative_mask: bool = False, filter_unmasked_importance: bool = True, step_multiply_training_mask: bool = True, task_embedding_init_mode: str = 'N01', importance_summing_strategy_linear_step: float | None = None, importance_summing_strategy_exponential_rate: float | None = None, importance_summing_strategy_log_base: float | None = None, non_algorithmic_hparams: dict[str, typing.Any] = {}) View Source

 45    def __init__(
 46        self,
 47        backbone: HATMaskBackbone,
 48        heads: HeadsTIL,
 49        adjustment_intensity: float,
 50        importance_type: str,
 51        importance_summing_strategy: str,
 52        importance_scheduler_type: str,
 53        neuron_to_weight_importance_aggregation_mode: str,
 54        s_max: float,
 55        clamp_threshold: float,
 56        mask_sparsity_reg_factor: float,
 57        mask_sparsity_reg_mode: str = "original",
 58        base_importance: float = 0.01,
 59        base_mask_sparsity_reg: float = 0.1,
 60        base_linear: float = 10,
 61        filter_by_cumulative_mask: bool = False,
 62        filter_unmasked_importance: bool = True,
 63        step_multiply_training_mask: bool = True,
 64        task_embedding_init_mode: str = "N01",
 65        importance_summing_strategy_linear_step: float | None = None,
 66        importance_summing_strategy_exponential_rate: float | None = None,
 67        importance_summing_strategy_log_base: float | None = None,
 68        non_algorithmic_hparams: dict[str, Any] = {},
 69    ) -> None:
 70        r"""Initialize the FG-AdaHAT algorithm with the network.
 71
 72        **Args:**
 73        - **backbone** (`HATMaskBackbone`): must be a backbone network with the HAT mask mechanism.
 74        - **heads** (`HeadsTIL`): output heads. FG-AdaHAT supports only TIL (Task-Incremental Learning).
 75        - **adjustment_intensity** (`float`): hyperparameter, controls the overall intensity of gradient adjustment (the $\alpha$ in the paper).
 76        - **importance_type** (`str`): the type of neuron-wise importance, must be one of:
 77            1. 'input_weight_abs_sum': sum of absolute input weights;
 78            2. 'output_weight_abs_sum': sum of absolute output weights;
 79            3. 'input_weight_gradient_abs_sum': sum of absolute gradients of the input weights (Input Gradients (IG) in the paper);
 80            4. 'output_weight_gradient_abs_sum': sum of absolute gradients of the output weights (Output Gradients (OG) in the paper);
 81            5. 'activation_abs': absolute activation;
 82            6. 'input_weight_abs_sum_x_activation_abs': sum of absolute input weights multiplied by absolute activation (Input Contribution Utility (ICU) in the paper);
 83            7. 'output_weight_abs_sum_x_activation_abs': sum of absolute output weights multiplied by absolute activation (Contribution Utility (CU) in the paper);
 84            8. 'gradient_x_activation_abs': absolute gradient (the saliency) multiplied by activation;
 85            9. 'input_weight_gradient_square_sum': sum of squared gradients of the input weights;
 86            10. 'output_weight_gradient_square_sum': sum of squared gradients of the output weights;
 87            11. 'input_weight_gradient_square_sum_x_activation_abs': sum of squared gradients of the input weights multiplied by absolute activation (Activation Fisher Information (AFI) in the paper);
 88            12. 'output_weight_gradient_square_sum_x_activation_abs': sum of squared gradients of the output weights multiplied by absolute activation;
 89            13. 'conductance_abs': absolute layer conductance;
 90            14. 'internal_influence_abs': absolute internal influence (Internal Influence (II) in the paper);
 91            15. 'gradcam_abs': absolute Grad-CAM;
 92            16. 'deeplift_abs': absolute DeepLIFT (DeepLIFT (DL) in the paper);
 93            17. 'deepliftshap_abs': absolute DeepLIFT-SHAP;
 94            18. 'gradientshap_abs': absolute Gradient-SHAP (Gradient SHAP (GS) in the paper);
 95            19. 'integrated_gradients_abs': absolute Integrated Gradients;
 96            20. 'feature_ablation_abs': absolute Feature Ablation (Feature Ablation (FA) in the paper);
 97            21. 'lrp_abs': absolute Layer-wise Relevance Propagation (LRP);
 98            22. 'cbp_adaptation': the adaptation function in [Continual Backpropagation (CBP)](https://www.nature.com/articles/s41586-024-07711-7);
 99            23. 'cbp_adaptive_contribution': the adaptive contribution function in [Continual Backpropagation (CBP)](https://www.nature.com/articles/s41586-024-07711-7);
100        - **importance_summing_strategy** (`str`): the strategy to sum neuron-wise importance for previous tasks, must be one of:
101            1. 'add_latest': add the latest neuron-wise importance to the summative importance;
102            2. 'add_all': add all previous neuron-wise importance (including the latest) to the summative importance;
103            3. 'add_average': add the average of all previous neuron-wise importance (including the latest) to the summative importance;
104            4. 'linear_decrease': weigh the previous neuron-wise importance by a linear factor that decreases with the task ID;
105            5. 'quadratic_decrease': weigh the previous neuron-wise importance that decreases quadratically with the task ID;
106            6. 'cubic_decrease': weigh the previous neuron-wise importance that decreases cubically with the task ID;
107            7. 'exponential_decrease': weigh the previous neuron-wise importance by an exponential factor that decreases with the task ID;
108            8. 'log_decrease': weigh the previous neuron-wise importance by a logarithmic factor that decreases with the task ID;
109            9. 'factorial_decrease': weigh the previous neuron-wise importance that decreases factorially with the task ID;
110        - **importance_scheduler_type** (`str`): the scheduler for importance, i.e., the factor $c^t$ multiplied to parameter importance. Must be one of:
111            1. 'linear_sparsity_reg': $c^t = (t+b_L) \cdot [R(M^t, M^{<t}) + b_R]$, where $R(M^t, M^{<t})$ is the mask sparsity regularization betwwen the current task and previous tasks, $b_L$ is the base linear factor (see argument `base_linear`), and $b_R$ is the base mask sparsity regularization factor (see argument `base_mask_sparsity_reg`);
112            2. 'sparsity_reg': $c^t = [R(M^t, M^{<t}) + b_R]$;
113            3. 'summative_mask_sparsity_reg': $c^t_{l,ij} = \left(\min \left(m^{<t, \text{sum}}_{l,i}, m^{<t, \text{sum}}_{l-1,j}\right)+b_L\right) \cdot [R(M^t, M^{<t}) + b_R]$.
114        - **neuron_to_weight_importance_aggregation_mode** (`str`): aggregation mode from neuron-wise to weight-wise importance ($\text{Agg}(\cdot)$ in the paper), must be one of:
115            1. 'min': take the minimum of neuron-wise importance for each weight;
116            2. 'max': take the maximum of neuron-wise importance for each weight;
117            3. 'mean': take the mean of neuron-wise importance for each weight.
118        - **s_max** (`float`): hyperparameter, the maximum scaling factor in the gate function. See Sec. 2.4 "Hard Attention Training" in the [HAT paper](http://proceedings.mlr.press/v80/serra18a).
119        - **clamp_threshold** (`float`): the threshold for task embedding gradient compensation. See Sec. 2.5 "Embedding Gradient Compensation" in the [HAT paper](http://proceedings.mlr.press/v80/serra18a).
120        - **mask_sparsity_reg_factor** (`float`): hyperparameter, the regularization factor for mask sparsity.
121        - **mask_sparsity_reg_mode** (`str`): the mode of mask sparsity regularization, must be one of:
122            1. 'original' (default): the original mask sparsity regularization in the [HAT paper](http://proceedings.mlr.press/v80/serra18a).
123            2. 'cross': the cross version of mask sparsity regularization.
124        - **base_importance** (`float`): base value added to importance ($b_I$ in the paper). Default: 0.01.
125        - **base_mask_sparsity_reg** (`float`): base value added to mask sparsity regularization factor in the importance scheduler ($b_R$ in the paper). Default: 0.1.
126        - **base_linear** (`float`): base value added to the linear factor in the importance scheduler ($b_L$ in the paper). Default: 10.
127        - **filter_by_cumulative_mask** (`bool`): whether to multiply the cumulative mask to the importance when calculating adjustment rate. Default: False.
128        - **filter_unmasked_importance** (`bool`): whether to filter unmasked importance values (set to 0) at the end of task training. Default: False.
129        - **step_multiply_training_mask** (`bool`): whether to multiply the training mask to the importance at each training step. Default: True.
130        - **task_embedding_init_mode** (`str`): the initialization mode for task embeddings, must be one of:
131            1. 'N01' (default): standard normal distribution $N(0, 1)$.
132            2. 'U-11': uniform distribution $U(-1, 1)$.
133            3. 'U01': uniform distribution $U(0, 1)$.
134            4. 'U-10': uniform distribution $U(-1, 0)$.
135            5. 'last': inherit the task embedding from the last task.
136        - **importance_summing_strategy_linear_step** (`float` | `None`): linear step for the importance summing strategy (used when `importance_summing_strategy` is 'linear_decrease'). Must be > 0.
137        - **importance_summing_strategy_exponential_rate** (`float` | `None`): exponential rate for the importance summing strategy (used when `importance_summing_strategy` is 'exponential_decrease'). Must be > 1.
138        - **importance_summing_strategy_log_base** (`float` | `None`): base for the logarithm in the importance summing strategy (used when `importance_summing_strategy` is 'log_decrease'). Must be > 1.
139        - **non_algorithmic_hparams** (`dict[str, Any]`): non-algorithmic hyperparameters that are not related to the algorithm itself are passed to this `LightningModule` object from the config, such as optimizer and learning rate scheduler configurations. They are saved for Lightning APIs from `save_hyperparameters()` method. This is useful for the experiment configuration and reproducibility.
140
141        """
142        super().__init__(
143            backbone=backbone,
144            heads=heads,
145            adjustment_mode=None,  # use the own adjustment mechanism of FG-AdaHAT
146            adjustment_intensity=adjustment_intensity,
147            s_max=s_max,
148            clamp_threshold=clamp_threshold,
149            mask_sparsity_reg_factor=mask_sparsity_reg_factor,
150            mask_sparsity_reg_mode=mask_sparsity_reg_mode,
151            task_embedding_init_mode=task_embedding_init_mode,
152            epsilon=base_mask_sparsity_reg,  # the epsilon is now the base mask sparsity regularization factor
153            non_algorithmic_hparams=non_algorithmic_hparams,
154        )
155
156        # save additional algorithmic hyperparameters
157        self.save_hyperparameters(
158            "adjustment_intensity",
159            "importance_type",
160            "importance_summing_strategy",
161            "importance_scheduler_type",
162            "neuron_to_weight_importance_aggregation_mode",
163            "s_max",
164            "clamp_threshold",
165            "mask_sparsity_reg_factor",
166            "mask_sparsity_reg_mode",
167            "base_importance",
168            "base_mask_sparsity_reg",
169            "base_linear",
170            "filter_by_cumulative_mask",
171            "filter_unmasked_importance",
172            "step_multiply_training_mask",
173        )
174
175        self.importance_type: str | None = importance_type
176        r"""The type of the neuron-wise importance added to AdaHAT importance."""
177
178        self.importance_scheduler_type: str = importance_scheduler_type
179        r"""The type of the importance scheduler."""
180        self.neuron_to_weight_importance_aggregation_mode: str = (
181            neuron_to_weight_importance_aggregation_mode
182        )
183        r"""The mode of aggregation from neuron-wise to weight-wise importance. """
184        self.filter_by_cumulative_mask: bool = filter_by_cumulative_mask
185        r"""The flag to filter importance by the cumulative mask when calculating the adjustment rate."""
186        self.filter_unmasked_importance: bool = filter_unmasked_importance
187        r"""The flag to filter unmasked importance values (set them to 0) at the end of task training."""
188        self.step_multiply_training_mask: bool = step_multiply_training_mask
189        r"""The flag to multiply the training mask to the importance at each training step."""
190
191        # importance summing strategy
192        self.importance_summing_strategy: str = importance_summing_strategy
193        r"""The strategy to sum the neuron-wise importance for previous tasks."""
194        if importance_summing_strategy_linear_step is not None:
195            self.importance_summing_strategy_linear_step: float = (
196                importance_summing_strategy_linear_step
197            )
198            r"""The linear step for the importance summing strategy (only when `importance_summing_strategy` is 'linear_decrease')."""
199        if importance_summing_strategy_exponential_rate is not None:
200            self.importance_summing_strategy_exponential_rate: float = (
201                importance_summing_strategy_exponential_rate
202            )
203            r"""The exponential rate for the importance summing strategy (only when `importance_summing_strategy` is 'exponential_decrease'). """
204        if importance_summing_strategy_log_base is not None:
205            self.importance_summing_strategy_log_base: float = (
206                importance_summing_strategy_log_base
207            )
208            r"""The base for the logarithm in the importance summing strategy (only when `importance_summing_strategy` is 'log_decrease'). """
209
210        # base values
211        self.base_importance: float = base_importance
212        r"""The base value added to the importance to avoid zero. """
213        self.base_mask_sparsity_reg: float = base_mask_sparsity_reg
214        r"""The base value added to the mask sparsity regularization to avoid zero. """
215        self.base_linear: float = base_linear
216        r"""The base value added to the linear layer to avoid zero. """
217
218        self.importances: dict[int, dict[str, Tensor]] = {}
219        r"""The min-max scaled ($[0, 1]$) neuron-wise importance of units. It is $I^{\tau}_{l}$ in the paper. Keys are task IDs and values are the corresponding importance tensors. Each importance tensor is a dict where keys are layer names and values are the importance tensor for the layer. The utility tensor is the same size as the feature tensor with size (number of units, ). """
220        self.summative_importance_for_previous_tasks: dict[str, Tensor] = {}
221        r"""The summative neuron-wise importance values of units for previous tasks before the current task `self.task_id`. See $I^{<t}_{l}$ in the paper. Keys are layer names and values are the summative importance tensor for the layer. The summative importance tensor has the same size as the feature tensor with size (number of units, ). """
222
223        self.num_steps_t: int
224        r"""The number of training steps for the current task `self.task_id`."""
225        # set manual optimization
226        self.automatic_optimization = False
227
228        FGAdaHAT.sanity_check(self)

Initialize the FG-AdaHAT algorithm with the network.

Args:

backbone (HATMaskBackbone): must be a backbone network with the HAT mask mechanism.
heads (HeadsTIL): output heads. FG-AdaHAT supports only TIL (Task-Incremental Learning).
adjustment_intensity (float): hyperparameter, controls the overall intensity of gradient adjustment (the $\alpha$ in the paper).
importance_type (str): the type of neuron-wise importance, must be one of:
1. 'input_weight_abs_sum': sum of absolute input weights;
2. 'output_weight_abs_sum': sum of absolute output weights;
3. 'input_weight_gradient_abs_sum': sum of absolute gradients of the input weights (Input Gradients (IG) in the paper);
4. 'output_weight_gradient_abs_sum': sum of absolute gradients of the output weights (Output Gradients (OG) in the paper);
5. 'activation_abs': absolute activation;
6. 'input_weight_abs_sum_x_activation_abs': sum of absolute input weights multiplied by absolute activation (Input Contribution Utility (ICU) in the paper);
7. 'output_weight_abs_sum_x_activation_abs': sum of absolute output weights multiplied by absolute activation (Contribution Utility (CU) in the paper);
8. 'gradient_x_activation_abs': absolute gradient (the saliency) multiplied by activation;
9. 'input_weight_gradient_square_sum': sum of squared gradients of the input weights;
10. 'output_weight_gradient_square_sum': sum of squared gradients of the output weights;
11. 'input_weight_gradient_square_sum_x_activation_abs': sum of squared gradients of the input weights multiplied by absolute activation (Activation Fisher Information (AFI) in the paper);
12. 'output_weight_gradient_square_sum_x_activation_abs': sum of squared gradients of the output weights multiplied by absolute activation;
13. 'conductance_abs': absolute layer conductance;
14. 'internal_influence_abs': absolute internal influence (Internal Influence (II) in the paper);
15. 'gradcam_abs': absolute Grad-CAM;
16. 'deeplift_abs': absolute DeepLIFT (DeepLIFT (DL) in the paper);
17. 'deepliftshap_abs': absolute DeepLIFT-SHAP;
18. 'gradientshap_abs': absolute Gradient-SHAP (Gradient SHAP (GS) in the paper);
19. 'integrated_gradients_abs': absolute Integrated Gradients;
20. 'feature_ablation_abs': absolute Feature Ablation (Feature Ablation (FA) in the paper);
21. 'lrp_abs': absolute Layer-wise Relevance Propagation (LRP);
22. 'cbp_adaptation': the adaptation function in Continual Backpropagation (CBP);
23. 'cbp_adaptive_contribution': the adaptive contribution function in Continual Backpropagation (CBP);
importance_summing_strategy (str): the strategy to sum neuron-wise importance for previous tasks, must be one of:
1. 'add_latest': add the latest neuron-wise importance to the summative importance;
2. 'add_all': add all previous neuron-wise importance (including the latest) to the summative importance;
3. 'add_average': add the average of all previous neuron-wise importance (including the latest) to the summative importance;
4. 'linear_decrease': weigh the previous neuron-wise importance by a linear factor that decreases with the task ID;
5. 'quadratic_decrease': weigh the previous neuron-wise importance that decreases quadratically with the task ID;
6. 'cubic_decrease': weigh the previous neuron-wise importance that decreases cubically with the task ID;
7. 'exponential_decrease': weigh the previous neuron-wise importance by an exponential factor that decreases with the task ID;
8. 'log_decrease': weigh the previous neuron-wise importance by a logarithmic factor that decreases with the task ID;
9. 'factorial_decrease': weigh the previous neuron-wise importance that decreases factorially with the task ID;
importance_scheduler_type (str): the scheduler for importance, i.e., the factor $c^t$ multiplied to parameter importance. Must be one of:
1. 'linear_sparsity_reg': $c^t = (t+b_L) \cdot [R(M^t, M^{base_linear), and $b_R$ is the base mask sparsity regularization factor (see argument base_mask_sparsity_reg);
2. 'sparsity_reg': $c^t = [R(M^t, M^{
3. 'summative_mask_sparsity_reg': $c^t_{l,ij} = \left(\min \left(m^{
neuron_to_weight_importance_aggregation_mode (str): aggregation mode from neuron-wise to weight-wise importance ($\text{Agg}(\cdot)$ in the paper), must be one of:
1. 'min': take the minimum of neuron-wise importance for each weight;
2. 'max': take the maximum of neuron-wise importance for each weight;
3. 'mean': take the mean of neuron-wise importance for each weight.
s_max (float): hyperparameter, the maximum scaling factor in the gate function. See Sec. 2.4 "Hard Attention Training" in the HAT paper.
clamp_threshold (float): the threshold for task embedding gradient compensation. See Sec. 2.5 "Embedding Gradient Compensation" in the HAT paper.
mask_sparsity_reg_factor (float): hyperparameter, the regularization factor for mask sparsity.
mask_sparsity_reg_mode (str): the mode of mask sparsity regularization, must be one of:
1. 'original' (default): the original mask sparsity regularization in the HAT paper.
2. 'cross': the cross version of mask sparsity regularization.
base_importance (float): base value added to importance ($b_I$ in the paper). Default: 0.01.
base_mask_sparsity_reg (float): base value added to mask sparsity regularization factor in the importance scheduler ($b_R$ in the paper). Default: 0.1.
base_linear (float): base value added to the linear factor in the importance scheduler ($b_L$ in the paper). Default: 10.
filter_by_cumulative_mask (bool): whether to multiply the cumulative mask to the importance when calculating adjustment rate. Default: False.
filter_unmasked_importance (bool): whether to filter unmasked importance values (set to 0) at the end of task training. Default: False.
step_multiply_training_mask (bool): whether to multiply the training mask to the importance at each training step. Default: True.
task_embedding_init_mode (str): the initialization mode for task embeddings, must be one of:
1. 'N01' (default): standard normal distribution $N(0, 1)$.
2. 'U-11': uniform distribution $U(-1, 1)$.
3. 'U01': uniform distribution $U(0, 1)$.
4. 'U-10': uniform distribution $U(-1, 0)$.
5. 'last': inherit the task embedding from the last task.
importance_summing_strategy_linear_step (float | None): linear step for the importance summing strategy (used when importance_summing_strategy is 'linear_decrease'). Must be > 0.
importance_summing_strategy_exponential_rate (float | None): exponential rate for the importance summing strategy (used when importance_summing_strategy is 'exponential_decrease'). Must be > 1.
importance_summing_strategy_log_base (float | None): base for the logarithm in the importance summing strategy (used when importance_summing_strategy is 'log_decrease'). Must be > 1.
non_algorithmic_hparams (dict[str, Any]): non-algorithmic hyperparameters that are not related to the algorithm itself are passed to this LightningModule object from the config, such as optimizer and learning rate scheduler configurations. They are saved for Lightning APIs from save_hyperparameters() method. This is useful for the experiment configuration and reproducibility.

importance_type: str | None

The type of the neuron-wise importance added to AdaHAT importance.

importance_scheduler_type: str

The type of the importance scheduler.

neuron_to_weight_importance_aggregation_mode: str

The mode of aggregation from neuron-wise to weight-wise importance.

filter_by_cumulative_mask: bool

The flag to filter importance by the cumulative mask when calculating the adjustment rate.

filter_unmasked_importance: bool

The flag to filter unmasked importance values (set them to 0) at the end of task training.

step_multiply_training_mask: bool

The flag to multiply the training mask to the importance at each training step.

importance_summing_strategy: str

The strategy to sum the neuron-wise importance for previous tasks.

base_importance: float

The base value added to the importance to avoid zero.

base_mask_sparsity_reg: float

The base value added to the mask sparsity regularization to avoid zero.

base_linear: float

The base value added to the linear layer to avoid zero.

importances: dict[int, dict[str, torch.Tensor]]

The min-max scaled ($[0, 1]$) neuron-wise importance of units. It is $I^{\tau}_{l}$ in the paper. Keys are task IDs and values are the corresponding importance tensors. Each importance tensor is a dict where keys are layer names and values are the importance tensor for the layer. The utility tensor is the same size as the feature tensor with size (number of units, ).

summative_importance_for_previous_tasks: dict[str, torch.Tensor]

The summative neuron-wise importance values of units for previous tasks before the current task self.task_id. See $I^{

num_steps_t: int

The number of training steps for the current task self.task_id.

automatic_optimization: bool View Source

290    @property
291    def automatic_optimization(self) -> bool:
292        """If set to ``False`` you are responsible for calling ``.backward()``, ``.step()``, ``.zero_grad()``."""
293        return self._automatic_optimization

If set to False you are responsible for calling .backward(), .step(), .zero_grad().

def sanity_check(self) -> None: View Source

230    def sanity_check(self) -> None:
231        r"""Sanity check."""
232
233        # check importance type
234        if self.importance_type not in [
235            "input_weight_abs_sum",
236            "output_weight_abs_sum",
237            "input_weight_gradient_abs_sum",
238            "output_weight_gradient_abs_sum",
239            "activation_abs",
240            "input_weight_abs_sum_x_activation_abs",
241            "output_weight_abs_sum_x_activation_abs",
242            "gradient_x_activation_abs",
243            "input_weight_gradient_square_sum",
244            "output_weight_gradient_square_sum",
245            "input_weight_gradient_square_sum_x_activation_abs",
246            "output_weight_gradient_square_sum_x_activation_abs",
247            "conductance_abs",
248            "internal_influence_abs",
249            "gradcam_abs",
250            "deeplift_abs",
251            "deepliftshap_abs",
252            "gradientshap_abs",
253            "integrated_gradients_abs",
254            "feature_ablation_abs",
255            "lrp_abs",
256            "cbp_adaptation",
257            "cbp_adaptive_contribution",
258        ]:
259            raise ValueError(
260                f"importance_type must be one of the predefined types, but got {self.importance_type}"
261            )
262
263        # check importance summing strategy
264        if self.importance_summing_strategy not in [
265            "add_latest",
266            "add_all",
267            "add_average",
268            "linear_decrease",
269            "quadratic_decrease",
270            "cubic_decrease",
271            "exponential_decrease",
272            "log_decrease",
273            "factorial_decrease",
274        ]:
275            raise ValueError(
276                f"importance_summing_strategy must be one of the predefined strategies, but got {self.importance_summing_strategy}"
277            )
278
279        # check importance scheduler type
280        if self.importance_scheduler_type not in [
281            "linear_sparsity_reg",
282            "sparsity_reg",
283            "summative_mask_sparsity_reg",
284        ]:
285            raise ValueError(
286                f"importance_scheduler_type must be one of the predefined types, but got {self.importance_scheduler_type}"
287            )
288
289        # check neuron to weight importance aggregation mode
290        if self.neuron_to_weight_importance_aggregation_mode not in [
291            "min",
292            "max",
293            "mean",
294        ]:
295            raise ValueError(
296                f"neuron_to_weight_importance_aggregation_mode must be one of the predefined modes, but got {self.neuron_to_weight_importance_aggregation_mode}"
297            )
298
299        # check base values
300        if self.base_importance < 0:
301            raise ValueError(
302                f"base_importance must be >= 0, but got {self.base_importance}"
303            )
304        if self.base_mask_sparsity_reg <= 0:
305            raise ValueError(
306                f"base_mask_sparsity_reg must be > 0, but got {self.base_mask_sparsity_reg}"
307            )
308        if self.base_linear <= 0:
309            raise ValueError(f"base_linear must be > 0, but got {self.base_linear}")

Sanity check.

def on_train_start(self) -> None: View Source

311    def on_train_start(self) -> None:
312        r"""Initialize neuron importance accumulation variable for each layer as zeros, in addition to AdaHAT's summative mask initialization."""
313        super().on_train_start()
314
315        self.importances[self.task_id] = (
316            {}
317        )  # initialize the importance for the current task
318
319        # initialize the neuron importance at the beginning of each task. This should not be called in `__init__()` method because `self.device` is not available at that time.
320        for layer_name in self.backbone.weighted_layer_names:
321            layer = self.backbone.get_layer_by_name(
322                layer_name
323            )  # get the layer by its name
324            num_units = layer.weight.shape[0]
325
326            # initialize the accumulated importance at the beginning of each task
327            self.importances[self.task_id][layer_name] = torch.zeros(num_units).to(
328                self.device
329            )
330
331            # reset the number of steps counter for the current task
332            self.num_steps_t = 0
333
334            # initialize the summative neuron-wise importance at the beginning of the first task
335            if self.task_id == 1:
336                self.summative_importance_for_previous_tasks[layer_name] = torch.zeros(
337                    num_units
338                ).to(
339                    self.device
340                )  # the summative neuron-wise importance for previous tasks $I^{<t}_{l}$ is initialized as zeros mask when $t=1$

Initialize neuron importance accumulation variable for each layer as zeros, in addition to AdaHAT's summative mask initialization.

def clip_grad_by_adjustment( self, network_sparsity: dict[str, torch.Tensor]) -> tuple[dict[str, torch.Tensor], dict[str, torch.Tensor], torch.Tensor]: View Source

342    def clip_grad_by_adjustment(
343        self,
344        network_sparsity: dict[str, Tensor],
345    ) -> tuple[dict[str, Tensor], dict[str, Tensor], Tensor]:
346        r"""Clip the gradients by the adjustment rate. See Eq. (1) in the paper.
347
348        Note that because the task embedding fully covers every layer in the backbone network, no parameters are left out of this system. This applies not only to parameters between layers with task embeddings, but also to those before the first layer. We design it separately in the code.
349
350        Network capacity is measured alongside this method. Network capacity is defined as the average adjustment rate over all parameters. See Sec. 4.1 in the [AdaHAT paper](https://link.springer.com/chapter/10.1007/978-3-031-70352-2_9).
351
352        **Args:**
353        - **network_sparsity** (`dict[str, Tensor]`): the network sparsity (i.e., mask sparsity loss of each layer) for the current task. Keys are layer names and values are the network sparsity values. It is used to calculate the adjustment rate for gradients. In FG-AdaHAT, it is used to construct the importance scheduler.
354
355        **Returns:**
356        - **adjustment_rate_weight** (`dict[str, Tensor]`): the adjustment rate for weights. Keys (`str`) are layer names and values (`Tensor`) are the adjustment rate tensors.
357        - **adjustment_rate_bias** (`dict[str, Tensor]`): the adjustment rate for biases. Keys (`str`) are layer names and values (`Tensor`) are the adjustment rate tensors.
358        - **capacity** (`Tensor`): the calculated network capacity.
359        """
360
361        # initialize network capacity metric
362        capacity = HATNetworkCapacityMetric().to(self.device)
363        adjustment_rate_weight = {}
364        adjustment_rate_bias = {}
365
366        # calculate the adjustment rate for gradients of the parameters, both weights and biases (if they exist). See Eq. (2) in the paper
367        for layer_name in self.backbone.weighted_layer_names:
368
369            layer = self.backbone.get_layer_by_name(
370                layer_name
371            )  # get the layer by its name
372
373            # placeholder for the adjustment rate to avoid the error of using it before assignment
374            adjustment_rate_weight_layer = 1
375            adjustment_rate_bias_layer = 1
376
377            # aggregate the neuron-wise importance to weight-wise importance. Note that the neuron-wise importance has already been min-max scaled to $[0, 1]$ in the `on_train_batch_end()` method, added the base value, and filtered by the mask
378            weight_importance, bias_importance = (
379                self.backbone.get_layer_measure_parameter_wise(
380                    neuron_wise_measure=self.summative_importance_for_previous_tasks,
381                    layer_name=layer_name,
382                    aggregation_mode=self.neuron_to_weight_importance_aggregation_mode,
383                )
384            )
385
386            weight_mask, bias_mask = self.backbone.get_layer_measure_parameter_wise(
387                neuron_wise_measure=self.cumulative_mask_for_previous_tasks,
388                layer_name=layer_name,
389                aggregation_mode="min",
390            )
391
392            # filter the weight importance by the cumulative mask
393            if self.filter_by_cumulative_mask:
394                weight_importance = weight_importance * weight_mask
395                bias_importance = bias_importance * bias_mask
396
397            network_sparsity_layer = network_sparsity[layer_name]
398
399            # calculate importance scheduler (the factor of importance). See Eq. (3) in the paper
400            factor = network_sparsity_layer + self.base_mask_sparsity_reg
401            if self.importance_scheduler_type == "linear_sparsity_reg":
402                factor = factor * (self.task_id + self.base_linear)
403            elif self.importance_scheduler_type == "sparsity_reg":
404                pass
405            elif self.importance_scheduler_type == "summative_mask_sparsity_reg":
406                factor = factor * (
407                    self.summative_mask_for_previous_tasks + self.base_linear
408                )
409
410            # calculate the adjustment rate
411            adjustment_rate_weight_layer = torch.div(
412                self.adjustment_intensity,
413                (factor * weight_importance + self.adjustment_intensity),
414            )
415
416            adjustment_rate_bias_layer = torch.div(
417                self.adjustment_intensity,
418                (factor * bias_importance + self.adjustment_intensity),
419            )
420
421            # apply the adjustment rate to the gradients
422            layer.weight.grad.data *= adjustment_rate_weight_layer
423            if layer.bias is not None:
424                layer.bias.grad.data *= adjustment_rate_bias_layer
425
426            # store the adjustment rate for logging
427            adjustment_rate_weight[layer_name] = adjustment_rate_weight_layer
428            if layer.bias is not None:
429                adjustment_rate_bias[layer_name] = adjustment_rate_bias_layer
430
431            # update network capacity metric
432            capacity.update(adjustment_rate_weight_layer, adjustment_rate_bias_layer)
433
434        return adjustment_rate_weight, adjustment_rate_bias, capacity.compute()

Clip the gradients by the adjustment rate. See Eq. (1) in the paper.

Note that because the task embedding fully covers every layer in the backbone network, no parameters are left out of this system. This applies not only to parameters between layers with task embeddings, but also to those before the first layer. We design it separately in the code.

Network capacity is measured alongside this method. Network capacity is defined as the average adjustment rate over all parameters. See Sec. 4.1 in the AdaHAT paper.

Args:

network_sparsity (dict[str, Tensor]): the network sparsity (i.e., mask sparsity loss of each layer) for the current task. Keys are layer names and values are the network sparsity values. It is used to calculate the adjustment rate for gradients. In FG-AdaHAT, it is used to construct the importance scheduler.

Returns:

adjustment_rate_weight (dict[str, Tensor]): the adjustment rate for weights. Keys (str) are layer names and values (Tensor) are the adjustment rate tensors.
adjustment_rate_bias (dict[str, Tensor]): the adjustment rate for biases. Keys (str) are layer names and values (Tensor) are the adjustment rate tensors.
capacity (Tensor): the calculated network capacity.

def on_train_batch_end(self, outputs: dict[str, typing.Any], batch: Any, batch_idx: int) -> None: View Source

436    def on_train_batch_end(
437        self, outputs: dict[str, Any], batch: Any, batch_idx: int
438    ) -> None:
439        r"""Calculate the step-wise importance, update the accumulated importance and number of steps counter after each training step.
440
441        **Args:**
442        - **outputs** (`dict[str, Any]`): outputs of the training step (returns of `training_step()` in `CLAlgorithm`).
443        - **batch** (`Any`): training data batch.
444        - **batch_idx** (`int`): index of the current batch (for mask figure file name).
445        """
446
447        # get potential useful information from training batch
448        activations = outputs["activations"]
449        input = outputs["input"]
450        target = outputs["target"]
451        mask = outputs["mask"]
452        num_batches = self.trainer.num_training_batches
453
454        for layer_name in self.backbone.weighted_layer_names:
455            # layer-wise operation
456
457            activation = activations[layer_name]
458
459            # calculate neuron-wise importance of the training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper.
460            if self.importance_type == "input_weight_abs_sum":
461                importance_step = self.get_importance_step_layer_weight_abs_sum(
462                    layer_name=layer_name,
463                    if_output_weight=False,
464                    reciprocal=False,
465                )
466            elif self.importance_type == "output_weight_abs_sum":
467                importance_step = self.get_importance_step_layer_weight_abs_sum(
468                    layer_name=layer_name,
469                    if_output_weight=True,
470                    reciprocal=False,
471                )
472            elif self.importance_type == "input_weight_gradient_abs_sum":
473                importance_step = (
474                    self.get_importance_step_layer_weight_gradient_abs_sum(
475                        layer_name=layer_name, if_output_weight=False
476                    )
477                )
478            elif self.importance_type == "output_weight_gradient_abs_sum":
479                importance_step = (
480                    self.get_importance_step_layer_weight_gradient_abs_sum(
481                        layer_name=layer_name, if_output_weight=True
482                    )
483                )
484            elif self.importance_type == "activation_abs":
485                importance_step = self.get_importance_step_layer_activation_abs(
486                    activation=activation
487                )
488            elif self.importance_type == "input_weight_abs_sum_x_activation_abs":
489                importance_step = (
490                    self.get_importance_step_layer_weight_abs_sum_x_activation_abs(
491                        layer_name=layer_name,
492                        activation=activation,
493                        if_output_weight=False,
494                    )
495                )
496            elif self.importance_type == "output_weight_abs_sum_x_activation_abs":
497                importance_step = (
498                    self.get_importance_step_layer_weight_abs_sum_x_activation_abs(
499                        layer_name=layer_name,
500                        activation=activation,
501                        if_output_weight=True,
502                    )
503                )
504            elif self.importance_type == "gradient_x_activation_abs":
505                importance_step = (
506                    self.get_importance_step_layer_gradient_x_activation_abs(
507                        layer_name=layer_name,
508                        input=input,
509                        target=target,
510                        batch_idx=batch_idx,
511                        num_batches=num_batches,
512                    )
513                )
514            elif self.importance_type == "input_weight_gradient_square_sum":
515                importance_step = (
516                    self.get_importance_step_layer_weight_gradient_square_sum(
517                        layer_name=layer_name,
518                        activation=activation,
519                        if_output_weight=False,
520                    )
521                )
522            elif self.importance_type == "output_weight_gradient_square_sum":
523                importance_step = (
524                    self.get_importance_step_layer_weight_gradient_square_sum(
525                        layer_name=layer_name,
526                        activation=activation,
527                        if_output_weight=True,
528                    )
529                )
530            elif (
531                self.importance_type
532                == "input_weight_gradient_square_sum_x_activation_abs"
533            ):
534                importance_step = self.get_importance_step_layer_weight_gradient_square_sum_x_activation_abs(
535                    layer_name=layer_name,
536                    activation=activation,
537                    if_output_weight=False,
538                )
539            elif (
540                self.importance_type
541                == "output_weight_gradient_square_sum_x_activation_abs"
542            ):
543                importance_step = self.get_importance_step_layer_weight_gradient_square_sum_x_activation_abs(
544                    layer_name=layer_name,
545                    activation=activation,
546                    if_output_weight=True,
547                )
548            elif self.importance_type == "conductance_abs":
549                importance_step = self.get_importance_step_layer_conductance_abs(
550                    layer_name=layer_name,
551                    input=input,
552                    baselines=None,
553                    target=target,
554                    batch_idx=batch_idx,
555                    num_batches=num_batches,
556                )
557            elif self.importance_type == "internal_influence_abs":
558                importance_step = self.get_importance_step_layer_internal_influence_abs(
559                    layer_name=layer_name,
560                    input=input,
561                    baselines=None,
562                    target=target,
563                    batch_idx=batch_idx,
564                    num_batches=num_batches,
565                )
566            elif self.importance_type == "gradcam_abs":
567                importance_step = self.get_importance_step_layer_gradcam_abs(
568                    layer_name=layer_name,
569                    input=input,
570                    target=target,
571                    batch_idx=batch_idx,
572                    num_batches=num_batches,
573                )
574            elif self.importance_type == "deeplift_abs":
575                importance_step = self.get_importance_step_layer_deeplift_abs(
576                    layer_name=layer_name,
577                    input=input,
578                    baselines=None,
579                    target=target,
580                    batch_idx=batch_idx,
581                    num_batches=num_batches,
582                )
583            elif self.importance_type == "deepliftshap_abs":
584                importance_step = self.get_importance_step_layer_deepliftshap_abs(
585                    layer_name=layer_name,
586                    input=input,
587                    baselines=None,
588                    target=target,
589                    batch_idx=batch_idx,
590                    num_batches=num_batches,
591                )
592            elif self.importance_type == "gradientshap_abs":
593                importance_step = self.get_importance_step_layer_gradientshap_abs(
594                    layer_name=layer_name,
595                    input=input,
596                    baselines=None,
597                    target=target,
598                    batch_idx=batch_idx,
599                    num_batches=num_batches,
600                )
601            elif self.importance_type == "integrated_gradients_abs":
602                importance_step = (
603                    self.get_importance_step_layer_integrated_gradients_abs(
604                        layer_name=layer_name,
605                        input=input,
606                        baselines=None,
607                        target=target,
608                        batch_idx=batch_idx,
609                        num_batches=num_batches,
610                    )
611                )
612            elif self.importance_type == "feature_ablation_abs":
613                importance_step = self.get_importance_step_layer_feature_ablation_abs(
614                    layer_name=layer_name,
615                    input=input,
616                    layer_baselines=None,
617                    target=target,
618                    batch_idx=batch_idx,
619                    num_batches=num_batches,
620                )
621            elif self.importance_type == "lrp_abs":
622                importance_step = self.get_importance_step_layer_lrp_abs(
623                    layer_name=layer_name,
624                    input=input,
625                    target=target,
626                    batch_idx=batch_idx,
627                    num_batches=num_batches,
628                )
629            elif self.importance_type == "cbp_adaptation":
630                importance_step = self.get_importance_step_layer_weight_abs_sum(
631                    layer_name=layer_name,
632                    if_output_weight=False,
633                    reciprocal=True,
634                )
635            elif self.importance_type == "cbp_adaptive_contribution":
636                importance_step = (
637                    self.get_importance_step_layer_cbp_adaptive_contribution(
638                        layer_name=layer_name,
639                        activation=activation,
640                    )
641                )
642
643            importance_step = min_max_normalize(
644                importance_step
645            )  # min-max scaling the utility to $[0, 1]$. See Eq. (5) in the paper
646
647            # multiply the importance by the training mask. See Eq. (6) in the paper
648            if self.step_multiply_training_mask:
649                importance_step = importance_step * mask[layer_name]
650
651            # update accumulated importance
652            self.importances[self.task_id][layer_name] = (
653                self.importances[self.task_id][layer_name] + importance_step
654            )
655
656        # update number of steps counter
657        self.num_steps_t += 1

Calculate the step-wise importance, update the accumulated importance and number of steps counter after each training step.

Args:

outputs (dict[str, Any]): outputs of the training step (returns of training_step() in CLAlgorithm).
batch (Any): training data batch.
batch_idx (int): index of the current batch (for mask figure file name).

def on_train_end(self) -> None: View Source

659    def on_train_end(self) -> None:
660        r"""Additionally calculate neuron-wise importance for previous tasks at the end of training each task."""
661        super().on_train_end()  # store the mask and update cumulative and summative masks
662
663        for layer_name in self.backbone.weighted_layer_names:
664
665            # average the neuron-wise step importance. See Eq. (4) in the paper
666            self.importances[self.task_id][layer_name] = (
667                self.importances[self.task_id][layer_name]
668            ) / self.num_steps_t
669
670            # add the base importance. See Eq. (6) in the paper
671            self.importances[self.task_id][layer_name] = (
672                self.importances[self.task_id][layer_name] + self.base_importance
673            )
674
675            # filter unmasked importance
676            if self.filter_unmasked_importance:
677                self.importances[self.task_id][layer_name] = (
678                    self.importances[self.task_id][layer_name]
679                    * self.backbone.masks[f"{self.task_id}"][layer_name]
680                )
681
682            # calculate the summative neuron-wise importance for previous tasks. See Eq. (4) in the paper
683            if self.importance_summing_strategy == "add_latest":
684                self.summative_importance_for_previous_tasks[
685                    layer_name
686                ] += self.importances[self.task_id][layer_name]
687
688            elif self.importance_summing_strategy == "add_all":
689                for t in range(1, self.task_id + 1):
690                    self.summative_importance_for_previous_tasks[
691                        layer_name
692                    ] += self.importances[t][layer_name]
693
694            elif self.importance_summing_strategy == "add_average":
695                for t in range(1, self.task_id + 1):
696                    self.summative_importance_for_previous_tasks[layer_name] += (
697                        self.importances[t][layer_name] / self.task_id
698                    )
699            else:
700                self.summative_importance_for_previous_tasks[
701                    layer_name
702                ] = torch.zeros_like(
703                    self.summative_importance_for_previous_tasks[layer_name]
704                ).to(
705                    self.device
706                )  # starting adding from 0
707
708                if self.importance_summing_strategy == "linear_decrease":
709                    s = self.importance_summing_strategy_linear_step
710                    for t in range(1, self.task_id + 1):
711                        w_t = s * (self.task_id - t) + 1
712
713                elif self.importance_summing_strategy == "quadratic_decrease":
714                    for t in range(1, self.task_id + 1):
715                        w_t = (self.task_id - t + 1) ** 2
716                elif self.importance_summing_strategy == "cubic_decrease":
717                    for t in range(1, self.task_id + 1):
718                        w_t = (self.task_id - t + 1) ** 3
719                elif self.importance_summing_strategy == "exponential_decrease":
720                    for t in range(1, self.task_id + 1):
721                        r = self.importance_summing_strategy_exponential_rate
722
723                        w_t = r ** (self.task_id - t + 1)
724                elif self.importance_summing_strategy == "log_decrease":
725                    a = self.importance_summing_strategy_log_base
726                    for t in range(1, self.task_id + 1):
727                        w_t = math.log(self.task_id - t, a) + 1
728                elif self.importance_summing_strategy == "factorial_decrease":
729                    for t in range(1, self.task_id + 1):
730                        w_t = math.factorial(self.task_id - t + 1)
731                else:
732                    raise ValueError
733                self.summative_importance_for_previous_tasks[layer_name] += (
734                    self.importances[t][layer_name] * w_t
735                )

Additionally calculate neuron-wise importance for previous tasks at the end of training each task.

def get_importance_step_layer_weight_abs_sum( self: str, layer_name: str, if_output_weight: bool, reciprocal: bool) -> torch.Tensor: View Source

737    def get_importance_step_layer_weight_abs_sum(
738        self: str,
739        layer_name: str,
740        if_output_weight: bool,
741        reciprocal: bool,
742    ) -> Tensor:
743        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer input or output weights.
744
745        **Args:**
746        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
747        - **if_output_weight** (`bool`): whether to use the output weights or input weights.
748        - **reciprocal** (`bool`): whether to take reciprocal.
749
750        **Returns:**
751        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
752        """
753        layer = self.backbone.get_layer_by_name(layer_name)
754
755        if not if_output_weight:
756            weight_abs = torch.abs(layer.weight.data)
757            weight_abs_sum = torch.sum(
758                weight_abs,
759                dim=[
760                    i for i in range(weight_abs.dim()) if i != 0
761                ],  # sum over the input dimension
762            )
763        else:
764            weight_abs = torch.abs(self.next_layer(layer_name).weight.data)
765            weight_abs_sum = torch.sum(
766                weight_abs,
767                dim=[
768                    i for i in range(weight_abs.dim()) if i != 1
769                ],  # sum over the output dimension
770            )
771
772        if reciprocal:
773            weight_abs_sum_reciprocal = torch.reciprocal(weight_abs_sum)
774            importance_step_layer = weight_abs_sum_reciprocal
775        else:
776            importance_step_layer = weight_abs_sum
777        importance_step_layer = importance_step_layer.detach()
778
779        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer input or output weights.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
if_output_weight (bool): whether to use the output weights or input weights.
reciprocal (bool): whether to take reciprocal.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_weight_gradient_abs_sum(self: str, layer_name: str, if_output_weight: bool) -> torch.Tensor: View Source

781    def get_importance_step_layer_weight_gradient_abs_sum(
782        self: str,
783        layer_name: str,
784        if_output_weight: bool,
785    ) -> Tensor:
786        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of gradients of the layer input or output weights.
787
788        **Args:**
789        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
790        - **if_output_weight** (`bool`): whether to use the output weights or input weights.
791
792        **Returns:**
793        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
794        """
795        layer = self.backbone.get_layer_by_name(layer_name)
796
797        if not if_output_weight:
798            gradient_abs = torch.abs(layer.weight.grad.data)
799            gradient_abs_sum = torch.sum(
800                gradient_abs,
801                dim=[
802                    i for i in range(gradient_abs.dim()) if i != 0
803                ],  # sum over the input dimension
804            )
805        else:
806            gradient_abs = torch.abs(self.next_layer(layer_name).weight.grad.data)
807            gradient_abs_sum = torch.sum(
808                gradient_abs,
809                dim=[
810                    i for i in range(gradient_abs.dim()) if i != 1
811                ],  # sum over the output dimension
812            )
813
814        importance_step_layer = gradient_abs_sum
815        importance_step_layer = importance_step_layer.detach()
816
817        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of gradients of the layer input or output weights.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
if_output_weight (bool): whether to use the output weights or input weights.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_activation_abs(self: str, activation: torch.Tensor) -> torch.Tensor: View Source

819    def get_importance_step_layer_activation_abs(
820        self: str,
821        activation: Tensor,
822    ) -> Tensor:
823        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute value of activation of the layer. This is our own implementation of [Layer Activation](https://captum.ai/api/layer.html#layer-activation) in Captum.
824
825        **Args:**
826        - **activation** (`Tensor`): the activation tensor of the layer. It has the same size of (number of units, ).
827
828        **Returns:**
829        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
830        """
831        activation_abs_batch_mean = torch.mean(
832            torch.abs(activation),
833            dim=[
834                i for i in range(activation.dim()) if i != 1
835            ],  # average the features over batch samples
836        )
837        importance_step_layer = activation_abs_batch_mean
838        importance_step_layer = importance_step_layer.detach()
839
840        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute value of activation of the layer. This is our own implementation of Layer Activation in Captum.

Args:

activation (Tensor): the activation tensor of the layer. It has the same size of (number of units, ).

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_weight_abs_sum_x_activation_abs( self: str, layer_name: str, activation: torch.Tensor, if_output_weight: bool) -> torch.Tensor: View Source

842    def get_importance_step_layer_weight_abs_sum_x_activation_abs(
843        self: str,
844        layer_name: str,
845        activation: Tensor,
846        if_output_weight: bool,
847    ) -> Tensor:
848        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer input / output weights multiplied by absolute values of activation. The input weights version is equal to the contribution utility in [CBP](https://www.nature.com/articles/s41586-024-07711-7).
849
850        **Args:**
851        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
852        - **activation** (`Tensor`): the activation tensor of the layer. It has the same size of (number of units, ).
853        - **if_output_weight** (`bool`): whether to use the output weights or input weights.
854
855        **Returns:**
856        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
857        """
858        layer = self.backbone.get_layer_by_name(layer_name)
859
860        if not if_output_weight:
861            weight_abs = torch.abs(layer.weight.data)
862            weight_abs_sum = torch.sum(
863                weight_abs,
864                dim=[
865                    i for i in range(weight_abs.dim()) if i != 0
866                ],  # sum over the input dimension
867            )
868        else:
869            weight_abs = torch.abs(self.next_layer(layer_name).weight.data)
870            weight_abs_sum = torch.sum(
871                weight_abs,
872                dim=[
873                    i for i in range(weight_abs.dim()) if i != 1
874                ],  # sum over the output dimension
875            )
876
877        activation_abs_batch_mean = torch.mean(
878            torch.abs(activation),
879            dim=[
880                i for i in range(activation.dim()) if i != 1
881            ],  # average the features over batch samples
882        )
883
884        importance_step_layer = weight_abs_sum * activation_abs_batch_mean
885        importance_step_layer = importance_step_layer.detach()
886
887        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer input / output weights multiplied by absolute values of activation. The input weights version is equal to the contribution utility in CBP.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
activation (Tensor): the activation tensor of the layer. It has the same size of (number of units, ).
if_output_weight (bool): whether to use the output weights or input weights.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_gradient_x_activation_abs( self: str, layer_name: str, input: torch.Tensor | tuple[torch.Tensor, ...], target: torch.Tensor | None, batch_idx: int, num_batches: int) -> torch.Tensor: View Source

889    def get_importance_step_layer_gradient_x_activation_abs(
890        self: str,
891        layer_name: str,
892        input: Tensor | tuple[Tensor, ...],
893        target: Tensor | None,
894        batch_idx: int,
895        num_batches: int,
896    ) -> Tensor:
897        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of the gradient of layer activation multiplied by the activation. We implement this using [Layer Gradient X Activation](https://captum.ai/api/layer.html#layer-gradient-x-activation) in Captum.
898
899        **Args:**
900        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
901        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
902        - **target** (`Tensor` | `None`): the target batch of the training step.
903        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
904        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
905
906        **Returns:**
907        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
908        """
909        layer = self.backbone.get_layer_by_name(layer_name)
910
911        input = input.requires_grad_()
912
913        # initialize the Layer Gradient X Activation object
914        layer_gradient_x_activation = LayerGradientXActivation(
915            forward_func=self.forward, layer=layer
916        )
917
918        self.set_forward_func_return_logits_only(True)
919        # calculate layer attribution of the step
920        attribution = layer_gradient_x_activation.attribute(
921            inputs=input,
922            target=target,
923            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
924        )
925        self.set_forward_func_return_logits_only(False)
926
927        attribution_abs_batch_mean = torch.mean(
928            torch.abs(attribution),
929            dim=[
930                i for i in range(attribution.dim()) if i != 1
931            ],  # average the features over batch samples
932        )
933
934        importance_step_layer = attribution_abs_batch_mean
935        importance_step_layer = importance_step_layer.detach()
936
937        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of the gradient of layer activation multiplied by the activation. We implement this using Layer Gradient X Activation in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_weight_gradient_square_sum( self: str, layer_name: str, activation: torch.Tensor, if_output_weight: bool) -> torch.Tensor: View Source

939    def get_importance_step_layer_weight_gradient_square_sum(
940        self: str,
941        layer_name: str,
942        activation: Tensor,
943        if_output_weight: bool,
944    ) -> Tensor:
945        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of layer weight gradient squares. The weight gradient square is equal to fisher information in [EWC](https://www.pnas.org/doi/10.1073/pnas.1611835114).
946
947        **Args:**
948        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
949        - **activation** (`Tensor`): the activation tensor of the layer. It has the same size of (number of units, ).
950        - **if_output_weight** (`bool`): whether to use the output weights or input weights.
951
952        **Returns:**
953        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
954        """
955        layer = self.backbone.get_layer_by_name(layer_name)
956
957        if not if_output_weight:
958            gradient_square = layer.weight.grad.data**2
959            gradient_square_sum = torch.sum(
960                gradient_square,
961                dim=[
962                    i for i in range(gradient_square.dim()) if i != 0
963                ],  # sum over the input dimension
964            )
965        else:
966            gradient_square = self.next_layer(layer_name).weight.grad.data**2
967            gradient_square_sum = torch.sum(
968                gradient_square,
969                dim=[
970                    i for i in range(gradient_square.dim()) if i != 1
971                ],  # sum over the output dimension
972            )
973
974        importance_step_layer = gradient_square_sum
975        importance_step_layer = importance_step_layer.detach()
976
977        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of layer weight gradient squares. The weight gradient square is equal to fisher information in EWC.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
activation (Tensor): the activation tensor of the layer. It has the same size of (number of units, ).
if_output_weight (bool): whether to use the output weights or input weights.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_weight_gradient_square_sum_x_activation_abs( self: str, layer_name: str, activation: torch.Tensor, if_output_weight: bool) -> torch.Tensor: View Source

 979    def get_importance_step_layer_weight_gradient_square_sum_x_activation_abs(
 980        self: str,
 981        layer_name: str,
 982        activation: Tensor,
 983        if_output_weight: bool,
 984    ) -> Tensor:
 985        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of layer weight gradient squares multiplied by absolute values of activation. The weight gradient square is equal to fisher information in [EWC](https://www.pnas.org/doi/10.1073/pnas.1611835114).
 986
 987        **Args:**
 988        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
 989        - **activation** (`Tensor`): the activation tensor of the layer. It has the same size of (number of units, ).
 990        - **if_output_weight** (`bool`): whether to use the output weights or input weights.
 991
 992        **Returns:**
 993        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
 994        """
 995        layer = self.backbone.get_layer_by_name(layer_name)
 996
 997        if not if_output_weight:
 998            gradient_square = layer.weight.grad.data**2
 999            gradient_square_sum = torch.sum(
1000                gradient_square,
1001                dim=[
1002                    i for i in range(gradient_square.dim()) if i != 0
1003                ],  # sum over the input dimension
1004            )
1005        else:
1006            gradient_square = self.next_layer(layer_name).weight.grad.data**2
1007            gradient_square_sum = torch.sum(
1008                gradient_square,
1009                dim=[
1010                    i for i in range(gradient_square.dim()) if i != 1
1011                ],  # sum over the output dimension
1012            )
1013
1014        activation_abs_batch_mean = torch.mean(
1015            torch.abs(activation),
1016            dim=[
1017                i for i in range(activation.dim()) if i != 1
1018            ],  # average the features over batch samples
1019        )
1020
1021        importance_step_layer = gradient_square_sum * activation_abs_batch_mean
1022        importance_step_layer = importance_step_layer.detach()
1023
1024        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of layer weight gradient squares multiplied by absolute values of activation. The weight gradient square is equal to fisher information in EWC.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
activation (Tensor): the activation tensor of the layer. It has the same size of (number of units, ).
if_output_weight (bool): whether to use the output weights or input weights.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1026    def get_importance_step_layer_conductance_abs(
1027        self: str,
1028        layer_name: str,
1029        input: Tensor | tuple[Tensor, ...],
1030        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1031        target: Tensor | None,
1032        batch_idx: int,
1033        num_batches: int,
1034    ) -> Tensor:
1035        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [conductance](https://openreview.net/forum?id=SylKoo0cKm). We implement this using [Layer Conductance](https://captum.ai/api/layer.html#layer-conductance) in Captum.
1036
1037        **Args:**
1038        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1039        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1040        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): starting point from which integral is computed in this method. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerConductance.attribute) for more details.
1041        - **target** (`Tensor` | `None`): the target batch of the training step.
1042        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1043        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.- **mask** (`Tensor`): the mask tensor of the layer. It has the same size as the feature tensor with size (number of units, ).
1044
1045        **Returns:**
1046        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1047        """
1048        layer = self.backbone.get_layer_by_name(layer_name)
1049
1050        # initialize the Layer Conductance object
1051        layer_conductance = LayerConductance(forward_func=self.forward, layer=layer)
1052
1053        self.set_forward_func_return_logits_only(True)
1054        # calculate layer attribution of the step
1055        attribution = layer_conductance.attribute(
1056            inputs=input,
1057            baselines=baselines,
1058            target=target,
1059            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1060        )
1061        self.set_forward_func_return_logits_only(False)
1062
1063        attribution_abs_batch_mean = torch.mean(
1064            torch.abs(attribution),
1065            dim=[
1066                i for i in range(attribution.dim()) if i != 1
1067            ],  # average the features over batch samples
1068        )
1069
1070        importance_step_layer = attribution_abs_batch_mean
1071        importance_step_layer = importance_step_layer.detach()
1072
1073        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of conductance. We implement this using Layer Conductance in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): starting point from which integral is computed in this method. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.- mask (Tensor): the mask tensor of the layer. It has the same size as the feature tensor with size (number of units, ).

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1075    def get_importance_step_layer_internal_influence_abs(
1076        self: str,
1077        layer_name: str,
1078        input: Tensor | tuple[Tensor, ...],
1079        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1080        target: Tensor | None,
1081        batch_idx: int,
1082        num_batches: int,
1083    ) -> Tensor:
1084        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [internal influence](https://openreview.net/forum?id=SJPpHzW0-). We implement this using [Internal Influence](https://captum.ai/api/layer.html#internal-influence) in Captum.
1085
1086        **Args:**
1087        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1088        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1089        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): starting point from which integral is computed in this method. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.InternalInfluence.attribute) for more details.
1090        - **target** (`Tensor` | `None`): the target batch of the training step.
1091        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1092        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1093
1094        **Returns:**
1095        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1096        """
1097        layer = self.backbone.get_layer_by_name(layer_name)
1098
1099        # initialize the Internal Influence object
1100        internal_influence = InternalInfluence(forward_func=self.forward, layer=layer)
1101
1102        # convert the target to long type to avoid error
1103        target = target.long() if target is not None else None
1104
1105        self.set_forward_func_return_logits_only(True)
1106        # calculate layer attribution of the step
1107        attribution = internal_influence.attribute(
1108            inputs=input,
1109            baselines=baselines,
1110            target=target,
1111            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1112            n_steps=5,  # set 10 instead of default 50 to accelerate the computation
1113        )
1114        self.set_forward_func_return_logits_only(False)
1115
1116        attribution_abs_batch_mean = torch.mean(
1117            torch.abs(attribution),
1118            dim=[
1119                i for i in range(attribution.dim()) if i != 1
1120            ],  # average the features over batch samples
1121        )
1122
1123        importance_step_layer = attribution_abs_batch_mean
1124        importance_step_layer = importance_step_layer.detach()
1125
1126        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of internal influence. We implement this using Internal Influence in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): starting point from which integral is computed in this method. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_gradcam_abs( self: str, layer_name: str, input: torch.Tensor | tuple[torch.Tensor, ...], target: torch.Tensor | None, batch_idx: int, num_batches: int) -> torch.Tensor: View Source

1128    def get_importance_step_layer_gradcam_abs(
1129        self: str,
1130        layer_name: str,
1131        input: Tensor | tuple[Tensor, ...],
1132        target: Tensor | None,
1133        batch_idx: int,
1134        num_batches: int,
1135    ) -> Tensor:
1136        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [Grad-CAM](https://openreview.net/forum?id=SJPpHzW0-). We implement this using [Layer Grad-CAM](https://captum.ai/api/layer.html#gradcam) in Captum.
1137
1138        **Args:**
1139        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1140        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1141        - **target** (`Tensor` | `None`): the target batch of the training step.
1142        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1143        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1144
1145        **Returns:**
1146        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1147        """
1148        layer = self.backbone.get_layer_by_name(layer_name)
1149
1150        # initialize the GradCAM object
1151        gradcam = LayerGradCam(forward_func=self.forward, layer=layer)
1152
1153        self.set_forward_func_return_logits_only(True)
1154        # calculate layer attribution of the step
1155        attribution = gradcam.attribute(
1156            inputs=input,
1157            target=target,
1158            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1159        )
1160        self.set_forward_func_return_logits_only(False)
1161
1162        attribution_abs_batch_mean = torch.mean(
1163            torch.abs(attribution),
1164            dim=[
1165                i for i in range(attribution.dim()) if i != 1
1166            ],  # average the features over batch samples
1167        )
1168
1169        importance_step_layer = attribution_abs_batch_mean
1170        importance_step_layer = importance_step_layer.detach()
1171
1172        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of Grad-CAM. We implement this using Layer Grad-CAM in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1174    def get_importance_step_layer_deeplift_abs(
1175        self: str,
1176        layer_name: str,
1177        input: Tensor | tuple[Tensor, ...],
1178        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1179        target: Tensor | None,
1180        batch_idx: int,
1181        num_batches: int,
1182    ) -> Tensor:
1183        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [DeepLift](https://proceedings.mlr.press/v70/shrikumar17a/shrikumar17a.pdf). We implement this using [Layer DeepLift](https://captum.ai/api/layer.html#layer-deeplift) in Captum.
1184
1185        **Args:**
1186        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1187        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1188        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): baselines define reference samples that are compared with the inputs. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerDeepLift.attribute) for more details.
1189        - **target** (`Tensor` | `None`): the target batch of the training step.
1190        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1191        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1192
1193        **Returns:**
1194        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1195        """
1196        layer = self.backbone.get_layer_by_name(layer_name)
1197
1198        # initialize the Layer DeepLift object
1199        layer_deeplift = LayerDeepLift(model=self, layer=layer)
1200
1201        # convert the target to long type to avoid error
1202        target = target.long() if target is not None else None
1203
1204        self.set_forward_func_return_logits_only(True)
1205        # calculate layer attribution of the step
1206        attribution = layer_deeplift.attribute(
1207            inputs=input,
1208            baselines=baselines,
1209            target=target,
1210            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1211        )
1212        self.set_forward_func_return_logits_only(False)
1213
1214        attribution_abs_batch_mean = torch.mean(
1215            torch.abs(attribution),
1216            dim=[
1217                i for i in range(attribution.dim()) if i != 1
1218            ],  # average the features over batch samples
1219        )
1220
1221        importance_step_layer = attribution_abs_batch_mean
1222        importance_step_layer = importance_step_layer.detach()
1223
1224        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of DeepLift. We implement this using Layer DeepLift in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): baselines define reference samples that are compared with the inputs. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1226    def get_importance_step_layer_deepliftshap_abs(
1227        self: str,
1228        layer_name: str,
1229        input: Tensor | tuple[Tensor, ...],
1230        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1231        target: Tensor | None,
1232        batch_idx: int,
1233        num_batches: int,
1234    ) -> Tensor:
1235        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [DeepLift SHAP](https://proceedings.neurips.cc/paper_files/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf). We implement this using [Layer DeepLiftShap](https://captum.ai/api/layer.html#layer-deepliftshap) in Captum.
1236
1237        **Args:**
1238        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1239        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1240        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): baselines define reference samples that are compared with the inputs. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerDeepLiftShap.attribute) for more details.
1241        - **target** (`Tensor` | `None`): the target batch of the training step.
1242        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1243        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1244
1245        **Returns:**
1246        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1247        """
1248        layer = self.backbone.get_layer_by_name(layer_name)
1249
1250        # initialize the Layer DeepLiftShap object
1251        layer_deepliftshap = LayerDeepLiftShap(model=self, layer=layer)
1252
1253        # convert the target to long type to avoid error
1254        target = target.long() if target is not None else None
1255
1256        self.set_forward_func_return_logits_only(True)
1257        # calculate layer attribution of the step
1258        attribution = layer_deepliftshap.attribute(
1259            inputs=input,
1260            baselines=baselines,
1261            target=target,
1262            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1263        )
1264        self.set_forward_func_return_logits_only(False)
1265
1266        attribution_abs_batch_mean = torch.mean(
1267            torch.abs(attribution),
1268            dim=[
1269                i for i in range(attribution.dim()) if i != 1
1270            ],  # average the features over batch samples
1271        )
1272
1273        importance_step_layer = attribution_abs_batch_mean
1274        importance_step_layer = importance_step_layer.detach()
1275
1276        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of DeepLift SHAP. We implement this using Layer DeepLiftShap in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): baselines define reference samples that are compared with the inputs. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1278    def get_importance_step_layer_gradientshap_abs(
1279        self: str,
1280        layer_name: str,
1281        input: Tensor | tuple[Tensor, ...],
1282        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1283        target: Tensor | None,
1284        batch_idx: int,
1285        num_batches: int,
1286    ) -> Tensor:
1287        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of gradient SHAP. We implement this using [Layer GradientShap](https://captum.ai/api/layer.html#layer-gradientshap) in Captum.
1288
1289        **Args:**
1290        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1291        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1292        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): starting point from which expectation is computed. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerGradientShap.attribute) for more details. If `None`, the baselines are set to zero.
1293        - **target** (`Tensor` | `None`): the target batch of the training step.
1294        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1295        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1296
1297        **Returns:**
1298        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1299        """
1300        layer = self.backbone.get_layer_by_name(layer_name)
1301
1302        if baselines is None:
1303            baselines = torch.zeros_like(
1304                input
1305            )  # baselines are mandatory for GradientShap API. We explicitly set them to zero
1306
1307        # initialize the Layer GradientShap object
1308        layer_gradientshap = LayerGradientShap(forward_func=self.forward, layer=layer)
1309
1310        # convert the target to long type to avoid error
1311        target = target.long() if target is not None else None
1312
1313        self.set_forward_func_return_logits_only(True)
1314        # calculate layer attribution of the step
1315        attribution = layer_gradientshap.attribute(
1316            inputs=input,
1317            baselines=baselines,
1318            target=target,
1319            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1320        )
1321        self.set_forward_func_return_logits_only(False)
1322
1323        attribution_abs_batch_mean = torch.mean(
1324            torch.abs(attribution),
1325            dim=[
1326                i for i in range(attribution.dim()) if i != 1
1327            ],  # average the features over batch samples
1328        )
1329
1330        importance_step_layer = attribution_abs_batch_mean
1331        importance_step_layer = importance_step_layer.detach()
1332
1333        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of gradient SHAP. We implement this using Layer GradientShap in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): starting point from which expectation is computed. Please refer to the Captum documentation for more details. If None, the baselines are set to zero.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1335    def get_importance_step_layer_integrated_gradients_abs(
1336        self: str,
1337        layer_name: str,
1338        input: Tensor | tuple[Tensor, ...],
1339        baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1340        target: Tensor | None,
1341        batch_idx: int,
1342        num_batches: int,
1343    ) -> Tensor:
1344        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [integrated gradients](https://proceedings.mlr.press/v70/sundararajan17a/sundararajan17a.pdf). We implement this using [Layer Integrated Gradients](https://captum.ai/api/layer.html#layer-integrated-gradients) in Captum.
1345
1346        **Args:**
1347        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1348        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1349        - **baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): starting point from which integral is computed. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerIntegratedGradients.attribute) for more details.
1350        - **target** (`Tensor` | `None`): the target batch of the training step.
1351        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1352        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1353
1354        **Returns:**
1355        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1356        """
1357        layer = self.backbone.get_layer_by_name(layer_name)
1358
1359        # initialize the Layer Integrated Gradients object
1360        layer_integrated_gradients = LayerIntegratedGradients(
1361            forward_func=self.forward, layer=layer
1362        )
1363
1364        self.set_forward_func_return_logits_only(True)
1365        # calculate layer attribution of the step
1366        attribution = layer_integrated_gradients.attribute(
1367            inputs=input,
1368            baselines=baselines,
1369            target=target,
1370            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1371        )
1372        self.set_forward_func_return_logits_only(False)
1373
1374        attribution_abs_batch_mean = torch.mean(
1375            torch.abs(attribution),
1376            dim=[
1377                i for i in range(attribution.dim()) if i != 1
1378            ],  # average the features over batch samples
1379        )
1380
1381        importance_step_layer = attribution_abs_batch_mean
1382        importance_step_layer = importance_step_layer.detach()
1383
1384        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of integrated gradients. We implement this using Layer Integrated Gradients in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): starting point from which integral is computed. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

1386    def get_importance_step_layer_feature_ablation_abs(
1387        self: str,
1388        layer_name: str,
1389        input: Tensor | tuple[Tensor, ...],
1390        layer_baselines: None | int | float | Tensor | tuple[int | float | Tensor, ...],
1391        target: Tensor | None,
1392        batch_idx: int,
1393        num_batches: int,
1394        if_captum: bool = False,
1395    ) -> Tensor:
1396        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [feature ablation](https://link.springer.com/chapter/10.1007/978-3-319-10590-1_53) attribution. We implement this using [Layer Feature Ablation](https://captum.ai/api/layer.html#layer-feature-ablation) in Captum.
1397
1398        **Args:**
1399        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1400        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1401        - **layer_baselines** (`None` | `int` | `float` | `Tensor` | `tuple[int | float | Tensor, ...]`): reference values which replace each layer input / output value when ablated. Please refer to the [Captum documentation](https://captum.ai/api/layer.html#captum.attr.LayerFeatureAblation.attribute) for more details.
1402        - **target** (`Tensor` | `None`): the target batch of the training step.
1403        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1404        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1405        - **if_captum** (`bool`): whether to use Captum or not. If `True`, we use Captum to calculate the feature ablation. If `False`, we use our implementation. Default is `False`, because our implementation is much faster.
1406
1407        **Returns:**
1408        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1409        """
1410        layer = self.backbone.get_layer_by_name(layer_name)
1411
1412        if not if_captum:
1413            # 1. Baseline logits (take first element of forward output)
1414            baseline_out, _, _ = self.forward(
1415                input, "train", batch_idx, num_batches, self.task_id
1416            )
1417            if target is not None:
1418                baseline_scores = baseline_out.gather(1, target.view(-1, 1)).squeeze(1)
1419            else:
1420                baseline_scores = baseline_out.sum(dim=1)
1421
1422            # 2. Capture layer’s output shape
1423            activs = {}
1424            handle = layer.register_forward_hook(
1425                lambda module, inp, out: activs.setdefault("output", out.detach())
1426            )
1427            _, _, _ = self.forward(input, "train", batch_idx, num_batches, self.task_id)
1428            handle.remove()
1429            layer_output = activs["output"]  # shape (B, F, ...)
1430
1431            # 3. Build baseline tensor matching that shape
1432            if layer_baselines is None:
1433                baseline_tensor = torch.zeros_like(layer_output)
1434            elif isinstance(layer_baselines, (int, float)):
1435                baseline_tensor = torch.full_like(layer_output, layer_baselines)
1436            elif isinstance(layer_baselines, Tensor):
1437                if layer_baselines.shape == layer_output.shape:
1438                    baseline_tensor = layer_baselines
1439                elif layer_baselines.shape == layer_output.shape[1:]:
1440                    baseline_tensor = layer_baselines.unsqueeze(0).repeat(
1441                        layer_output.size(0), *([1] * layer_baselines.ndim)
1442                    )
1443                else:
1444                    raise ValueError(...)
1445            else:
1446                raise ValueError(...)
1447
1448            B, F = layer_output.size(0), layer_output.size(1)
1449
1450            # 4. Create a “mega-batch” replicating the input F times
1451            if isinstance(input, tuple):
1452                mega_inputs = tuple(
1453                    t.unsqueeze(0).repeat(F, *([1] * t.ndim)).view(-1, *t.shape[1:])
1454                    for t in input
1455                )
1456            else:
1457                mega_inputs = (
1458                    input.unsqueeze(0)
1459                    .repeat(F, *([1] * input.ndim))
1460                    .view(-1, *input.shape[1:])
1461                )
1462
1463            # 5. Equally replicate the baseline tensor
1464            mega_baseline = (
1465                baseline_tensor.unsqueeze(0)
1466                .repeat(F, *([1] * baseline_tensor.ndim))
1467                .view(-1, *baseline_tensor.shape[1:])
1468            )
1469
1470            # 6. Precompute vectorized indices
1471            device = layer_output.device
1472            positions = torch.arange(F * B, device=device)  # [0,1,...,F*B-1]
1473            feat_idx = torch.arange(F, device=device).repeat_interleave(
1474                B
1475            )  # [0,0,...,1,1,...,F-1]
1476
1477            # 7. One hook to zero out each channel slice across the mega-batch
1478            def mega_ablate_hook(module, inp, out):
1479                out_mod = out.clone()
1480                # for each sample in mega-batch, zero its corresponding channel
1481                out_mod[positions, feat_idx] = mega_baseline[positions, feat_idx]
1482                return out_mod
1483
1484            h = layer.register_forward_hook(mega_ablate_hook)
1485            out_all, _, _ = self.forward(
1486                mega_inputs, "train", batch_idx, num_batches, self.task_id
1487            )
1488            h.remove()
1489
1490            # 8. Recover scores, reshape [F*B] → [F, B], diff & mean
1491            if target is not None:
1492                tgt_flat = target.unsqueeze(0).repeat(F, 1).view(-1)
1493                scores_all = out_all.gather(1, tgt_flat.view(-1, 1)).squeeze(1)
1494            else:
1495                scores_all = out_all.sum(dim=1)
1496
1497            scores_all = scores_all.view(F, B)
1498            diffs = torch.abs(baseline_scores.unsqueeze(0) - scores_all)
1499            importance_step_layer = diffs.mean(dim=1).detach()  # [F]
1500
1501            return importance_step_layer
1502
1503        else:
1504            # initialize the Layer Feature Ablation object
1505            layer_feature_ablation = LayerFeatureAblation(
1506                forward_func=self.forward, layer=layer
1507            )
1508
1509            # calculate layer attribution of the step
1510            self.set_forward_func_return_logits_only(True)
1511            attribution = layer_feature_ablation.attribute(
1512                inputs=input,
1513                layer_baselines=layer_baselines,
1514                # target=target, # disable target to enable perturbations_per_eval
1515                additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1516                perturbations_per_eval=128,  # to accelerate the computation
1517            )
1518            self.set_forward_func_return_logits_only(False)
1519
1520            attribution_abs_batch_mean = torch.mean(
1521                torch.abs(attribution),
1522                dim=[
1523                    i for i in range(attribution.dim()) if i != 1
1524                ],  # average the features over batch samples
1525            )
1526
1527        importance_step_layer = attribution_abs_batch_mean
1528        importance_step_layer = importance_step_layer.detach()
1529
1530        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of feature ablation attribution. We implement this using Layer Feature Ablation in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
layer_baselines (None | int | float | Tensor | tuple[int | float | Tensor, ...]): reference values which replace each layer input / output value when ablated. Please refer to the Captum documentation for more details.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.
if_captum (bool): whether to use Captum or not. If True, we use Captum to calculate the feature ablation. If False, we use our implementation. Default is False, because our implementation is much faster.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_lrp_abs( self: str, layer_name: str, input: torch.Tensor | tuple[torch.Tensor, ...], target: torch.Tensor | None, batch_idx: int, num_batches: int) -> torch.Tensor: View Source

1532    def get_importance_step_layer_lrp_abs(
1533        self: str,
1534        layer_name: str,
1535        input: Tensor | tuple[Tensor, ...],
1536        target: Tensor | None,
1537        batch_idx: int,
1538        num_batches: int,
1539    ) -> Tensor:
1540        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of [LRP](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0130140). We implement this using [Layer LRP](https://captum.ai/api/layer.html#layer-lrp) in Captum.
1541
1542        **Args:**
1543        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1544        - **input** (`Tensor` | `tuple[Tensor, ...]`): the input batch of the training step.
1545        - **target** (`Tensor` | `None`): the target batch of the training step.
1546        - **batch_idx** (`int`): the index of the current batch. This is an argument of the forward function during training.
1547        - **num_batches** (`int`): the number of batches in the training step. This is an argument of the forward function during training.
1548
1549        **Returns:**
1550        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1551        """
1552        layer = self.backbone.get_layer_by_name(layer_name)
1553
1554        # initialize the Layer LRP object
1555        layer_lrp = LayerLRP(model=self, layer=layer)
1556
1557        # set model to evaluation mode to prevent updating the model parameters
1558        self.eval()
1559
1560        self.set_forward_func_return_logits_only(True)
1561        # calculate layer attribution of the step
1562        attribution = layer_lrp.attribute(
1563            inputs=input,
1564            target=target,
1565            additional_forward_args=("train", batch_idx, num_batches, self.task_id),
1566        )
1567        self.set_forward_func_return_logits_only(False)
1568
1569        attribution_abs_batch_mean = torch.mean(
1570            torch.abs(attribution),
1571            dim=[
1572                i for i in range(attribution.dim()) if i != 1
1573            ],  # average the features over batch samples
1574        )
1575
1576        importance_step_layer = attribution_abs_batch_mean
1577        importance_step_layer = importance_step_layer.detach()
1578
1579        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the absolute values of LRP. We implement this using Layer LRP in Captum.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
input (Tensor | tuple[Tensor, ...]): the input batch of the training step.
target (Tensor | None): the target batch of the training step.
batch_idx (int): the index of the current batch. This is an argument of the forward function during training.
num_batches (int): the number of batches in the training step. This is an argument of the forward function during training.

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

def get_importance_step_layer_cbp_adaptive_contribution(self: str, layer_name: str, activation: torch.Tensor) -> torch.Tensor: View Source

1581    def get_importance_step_layer_cbp_adaptive_contribution(
1582        self: str,
1583        layer_name: str,
1584        activation: Tensor,
1585    ) -> Tensor:
1586        r"""Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer output weights multiplied by absolute values of activation, then divided by the reciprocal of sum of absolute values of layer input weights. It is equal to the adaptive contribution utility in [CBP](https://www.nature.com/articles/s41586-024-07711-7).
1587
1588        **Args:**
1589        - **layer_name** (`str`): the name of layer to get neuron-wise importance.
1590        - **activation** (`Tensor`): the activation tensor of the layer. It has the same size of (number of units, ).
1591
1592        **Returns:**
1593        - **importance_step_layer** (`Tensor`): the neuron-wise importance of the layer of the training step.
1594        """
1595        layer = self.backbone.get_layer_by_name(layer_name)
1596
1597        input_weight_abs = torch.abs(layer.weight.data)
1598        input_weight_abs_sum = torch.sum(
1599            input_weight_abs,
1600            dim=[
1601                i for i in range(input_weight_abs.dim()) if i != 0
1602            ],  # sum over the input dimension
1603        )
1604        input_weight_abs_sum_reciprocal = torch.reciprocal(input_weight_abs_sum)
1605
1606        output_weight_abs = torch.abs(self.next_layer(layer_name).weight.data)
1607        output_weight_abs_sum = torch.sum(
1608            output_weight_abs,
1609            dim=[
1610                i for i in range(output_weight_abs.dim()) if i != 1
1611            ],  # sum over the output dimension
1612        )
1613
1614        activation_abs_batch_mean = torch.mean(
1615            torch.abs(activation),
1616            dim=[
1617                i for i in range(activation.dim()) if i != 1
1618            ],  # average the features over batch samples
1619        )
1620
1621        importance_step_layer = (
1622            output_weight_abs_sum
1623            * activation_abs_batch_mean
1624            * input_weight_abs_sum_reciprocal
1625        )
1626        importance_step_layer = importance_step_layer.detach()
1627
1628        return importance_step_layer

Get the raw neuron-wise importance (before scaling) of a layer of a training step. See $I^{\tau}_l(\mathbf{x},y)$ (before Eqs. (5) and (6)) in the paper. This method uses the sum of absolute values of layer output weights multiplied by absolute values of activation, then divided by the reciprocal of sum of absolute values of layer input weights. It is equal to the adaptive contribution utility in CBP.

Args:

layer_name (str): the name of layer to get neuron-wise importance.
activation (Tensor): the activation tensor of the layer. It has the same size of (number of units, ).

Returns:

importance_step_layer (Tensor): the neuron-wise importance of the layer of the training step.

clarena.cl_algorithms.fgadahat

Inherited Members