Conceptual Level
At the conceptual level, I focus on a basic understanding of the concepts, without delving into the details of the mathematics and implementation, so that anyone with general knowledge of AI can follow the content.
Q: What is a saliency map?
A: In image classification, a saliency map is a pixel-wise visualisation of how important each pixel of the input image is to the model’s prediction. It shows which parts of the image the model focuses on when making the prediction.
This gives us a visually intuitive explanation of why the model makes a certain prediction, which makes the saliency map an XAI (Explainable AI) technique. It belongs to the category of attribution methods, since pixel-wise importance is a form of feature attribution.
It is a local explanation, as it explains the prediction for a single input instance, but it can also be developed into a global explanation (explaining the entire model), for example by aggregating the saliency maps of many instances.
Q: Are there forms of saliency maps other than images?
A: Saliency maps are not limited to images. They can be applied to other types of data, such as text, audio, and video. For example, in text classification, a saliency map can show how important each word in the text is to the model’s prediction.
Q: Any real-world applications of saliency maps?
A: Besides being a solid tool for interpreting image classification predictions in XAI (Explainable AI) systems, saliency maps are also used within computer vision itself, for example in object detection and image segmentation, since they visually help to locate objects in the image.
Q: Where was the saliency map introduced? What is the background?
A: The concept of the saliency map was first introduced as a computer vision term in the 1990s, before the deep learning era. It was used for studying the human visual attention mechanism, i.e. real, biological vision.
In the deep learning era, the concept was borrowed to explain the predictions of deep neural networks and thus developed into an XAI technique. This is highlighted in the influential 2014 paper “Visualizing and Understanding Convolutional Networks” (Zeiler and Fergus 2014).
As a tool, saliency maps also benefit computer vision itself, and they later accelerated the development of object detection and image segmentation.
Q: What is the rough idea of getting a saliency map?
A: The main idea is to apply pixel-wise changes to the target image and watch how much a certain indicator changes in response. The more the indicator changes, the more important the pixel is. Several categories of approaches have been developed, differing in how the pixel-wise changes are made and which indicator is used:
- Perturbation-based methods: They occlude or perturb the input image at different pixel positions and observe the change in the model’s prediction (a minimal sketch of this idea follows the list).
- Gradient-based methods: They use the gradient of the model’s output with respect to each input pixel. If you don’t know what a gradient is, it is essentially the ratio of the change in the output to the change in the input (assuming the input change is infinitesimally small).
- Backpropagation-based methods: They construct special backward passes designed particularly for inspecting pixel importance; gradients are not typically involved.
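To make the perturbation idea concrete, here is a minimal sketch of occlusion-style saliency. It assumes a PyTorch image classifier `model`, an input tensor `image` of shape (3, H, W), and a class index `target_class`; all names, the patch size, and the zero baseline are illustrative choices, not a fixed recipe.

```python
import torch

def occlusion_saliency(model, image, target_class, patch=16, stride=16, baseline=0.0):
    """Slide an occluding patch over the image; the drop in the target-class
    score at each position is taken as that region's importance."""
    model.eval()
    _, H, W = image.shape
    with torch.no_grad():
        base_score = model(image.unsqueeze(0))[0, target_class].item()
    saliency = torch.zeros(H, W)
    for top in range(0, H - patch + 1, stride):
        for left in range(0, W - patch + 1, stride):
            occluded = image.clone()
            occluded[:, top:top + patch, left:left + patch] = baseline
            with torch.no_grad():
                score = model(occluded.unsqueeze(0))[0, target_class].item()
            # A larger score drop means the occluded region mattered more.
            saliency[top:top + patch, left:left + patch] = base_score - score
    return saliency
```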
Q: What are classical approaches to get a saliency map?
A: Here are some classical approaches:
- Perturbation-based methods: Occlusion Sensitivity (Zeiler and Fergus 2014), Simplifying Images (Zhou et al. 2014), LIME (Ribeiro, Singh, and Guestrin 2016), SHAP (Lundberg and Lee 2017), CAV (Kim et al. 2018), etc.
- Gradient-based methods: Vanilla Gradients (Zeiler and Fergus 2014), SmoothGrad (Smilkov et al. 2017), Integrated Gradients (Sundararajan, Taly, and Yan 2017), DeconvNet (Noh, Hong, and Han 2015), Guided Backpropagation (Springenberg et al. 2014), LRP (Bach et al. 2015), DeepLIFT (Shrikumar, Greenside, and Kundaje 2017), Grad-CAM (Selvaraju et al. 2017), etc.
- Backpropagation-based methods: CAM (Zhou et al. 2016) series.
Q: Any inspirations for continual learning (which I am researching right now)?
A: A saliency map, as a unit-wise importance metric, can potentially be used as stored information about previous tasks to prevent forgetting.
Implementation Level
At the implementation level, I go into the details of the mathematics and implementation, for anyone who is working on AI projects and wants to know how the technique actually works.
Here we only discuss saliency maps for image classification, as this is the most common application.
Q: What are the inputs and outputs of computing a saliency map?
A: Here they are:
Inputs:
- A well-trained machine learning model
- The target image to get the saliency map for
Outputs:
- A saliency map, which is the same size as the input image
(A code-level sketch of this contract follows.)
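In code, the contract might look like the following hypothetical signature (PyTorch is assumed; the function name and shapes are only illustrative):

```python
import torch
import torch.nn as nn

def compute_saliency(model: nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Input:  a trained classifier and one image of shape (C, H, W).
    Output: a saliency map of shape (H, W), aligned pixel by pixel with the image."""
    raise NotImplementedError  # one way to fill this in is sketched under the next question
```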
Q: How to calculate the saliency map?
A: (I don’t have time to cover this here. I strongly recommend referring to (杨朋波 et al. 2023) (in Chinese), which gives you a very comprehensive survey of the methods. A minimal sketch of the simplest case follows below.)
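Still, to make the contract above concrete, here is a minimal sketch of the simplest variant, vanilla gradients, i.e. the absolute gradient of the class score with respect to each input pixel. PyTorch is assumed and the code is only an illustration, not a summary of the survey.

```python
import torch
import torch.nn as nn

def compute_saliency(model: nn.Module, image: torch.Tensor, target_class=None) -> torch.Tensor:
    """Vanilla gradients: |d(class score) / d(input pixel)|, reduced over channels."""
    model.eval()
    x = image.unsqueeze(0).clone().requires_grad_(True)   # (1, C, H, W)
    scores = model(x)                                      # (1, num_classes)
    if target_class is None:
        target_class = scores.argmax(dim=1).item()         # explain the predicted class
    scores[0, target_class].backward()
    # Take the absolute gradient and reduce the channel dimension to an (H, W) map.
    return x.grad.abs().max(dim=1).values.squeeze(0)
```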
Q: Any extensions of the saliency map?
A: Other than pixel-wise attribution on the input, the saliency map can be extended to other forms (a rough layer-attribution sketch follows this list):
- Feature attribution: the attribution of the output with respect to units of the input. This is what we have discussed above.
- Layer attribution: the attribution of the output with respect to units in the hidden layers of the model.
- Neuron / unit attribution: the attribution of a certain neuron in a hidden layer of the model with respect to units of the input.
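As a rough sketch of layer attribution, the example below hooks the last convolutional block of a torchvision ResNet-18 and weights its activations by their gradients. The model, the placeholder input, and the simple gradient × activation weighting are all illustrative assumptions, not a specific published method.

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()   # untrained here only to keep the sketch self-contained
activations = {}

def save_activation(module, inputs, output):
    output.retain_grad()                # keep the gradient of this hidden feature map
    activations["feat"] = output

# Hook the last convolutional block so attribution targets its units, not input pixels.
handle = model.layer4.register_forward_hook(save_activation)

image = torch.rand(1, 3, 224, 224)      # placeholder input
scores = model(image)
scores[0, scores.argmax(dim=1).item()].backward()

feat = activations["feat"]                                     # (1, 512, 7, 7)
layer_attribution = (feat.grad * feat).sum(dim=1).squeeze(0)   # (7, 7) map over hidden units
handle.remove()
```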
Q: What are the common tools to implement saliency maps?
A: Captum, a PyTorch-based model interpretability library for Python, provides a wide variety of attribution methods, covering feature, layer, and neuron attribution.
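For example, here is a hedged sketch of calling two Captum attributors, the gradient-based Saliency and the perturbation-based Occlusion, on a placeholder model and input (the window shape, strides, and target class are arbitrary choices for illustration):

```python
import torch
from torchvision.models import resnet18
from captum.attr import Saliency, Occlusion

model = resnet18(weights=None).eval()   # substitute a properly trained model in practice
image = torch.rand(1, 3, 224, 224)      # placeholder batch of one image
target = 0                              # class index to explain

# Gradient-based feature attribution on the input pixels.
grad_map = Saliency(model).attribute(image, target=target)     # (1, 3, 224, 224)

# Perturbation-based attribution by sliding an occluding patch over the image.
occ_map = Occlusion(model).attribute(
    image,
    target=target,
    sliding_window_shapes=(3, 16, 16),
    strides=(3, 8, 8),
)
```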