Interpretability in Convolutional Neural Networks

Interpretability in Convolutional Neural Networks (CNNs) is essential to understand the decisions made by the model, detect potential biases, and improve robustness against perturbations or distributional shifts. Due to their depth and the combination of convolutions with nonlinear activation functions, CNNs behave as highly complex systems whose internal mechanisms are difficult to inspect directly. For this reason, specific techniques are developed to visualize which regions or features of the input signal contribute most significantly to the predictions.

This document describes and implements several of the most widely used methodologies for interpreting CNNs: saliency maps, Grad-CAM, Guided Grad-CAM (based on Guided Backpropagation), occlusion analysis, and Integrated Gradients. A complete, functional implementation on CIFAR-10 using an adapted ResNet-18 model is then presented, organized linearly for step-by-step execution and easily convertible into a Jupyter Notebook.

Saliency Maps

Saliency maps rely on computing the gradient of the model output with respect to each input pixel. Intuitively, if a small variation in a pixel produces a significant change in the output associated with a specific class, that pixel is considered important for the decision. The absolute value of this gradient is used as a local relevance measure.

Given a model \(f(\cdot)\) and an input image \(x\), the saliency map for a class \(c\) is defined as

\[S = \left| \frac{\partial f_c(x)}{\partial x} \right|\]

When the network processes inputs with multiple channels (for example, RGB images), it is common to aggregate the channel-wise information to construct a two-dimensional map. A simple strategy is to take the maximum over the channel dimension:

\[S_{i,j} = \max_{k} \left| \frac{\partial f_c(x)}{\partial x_{k,i,j}} \right|\]

This map provides, for each spatial position \((i,j)\), a sensitivity measure of the class score \(f_c\) with respect to perturbations of the corresponding pixels. Saliency maps are conceptually simple and computationally efficient; however, the resulting visualizations are often noisy and do not always align clearly with semantically interpretable regions of the image.
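The channel-wise maximum can be traced on a toy gradient tensor; a minimal numpy sketch (the values are arbitrary and stand in for \(\partial f_c / \partial x\)):

```python
import numpy as np

# Toy gradient tensor [C, H, W] standing in for the input gradient.
grad = np.array(
    [
        [[0.2, -0.5], [0.1, 0.0]],
        [[-0.3, 0.4], [0.6, -0.1]],
        [[0.1, 0.1], [-0.2, 0.05]],
    ]
)

# S_{i,j} = max_k |grad_{k,i,j}|: aggregate channels via the absolute maximum
S = np.abs(grad).max(axis=0)
# S is [[0.3, 0.5], [0.6, 0.1]]
```

The same aggregation appears later in `SaliencyMapGenerator`, there expressed with `torch.max` over the channel dimension.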

The following code initializes the environment, defines basic configuration, and implements a class that generates saliency maps based on gradients, together with a visualization function that overlays the resulting map on the original image.

"""Interpretability in Convolutional Neural Networks

Complete functional implementation with CIFAR-10

Implemented techniques:
1. Saliency Maps
2. Grad-CAM
3. Guided Grad-CAM (based on Guided Backpropagation)
4. Occlusion Analysis
5. Integrated Gradients
"""

# IMPORTS
# Standard library
import warnings

# Third-party packages
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from scipy.ndimage import zoom
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

warnings.filterwarnings("ignore")

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}\n")

# GLOBAL CONFIGURATION
CONFIG = {
    "device": "cuda" if torch.cuda.is_available() else "cpu",
    "batch_size": 32,
}

CIFAR10_CLASSES = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]

print(f"Configuration: {CONFIG}\n")

CIFAR-10 Data Preparation

To illustrate the interpretability techniques, the CIFAR-10 dataset is used. CIFAR-10 contains color images of size \(32 \times 32\) belonging to ten different classes. The following function downloads and prepares the test set, applying a standard normalization that is widely used for this dataset.

def prepare_cifar10_data():
    """
    Downloads and prepares the CIFAR-10 test set
    with standard normalization.
    """
    print("Preparing CIFAR-10...")
    transform = transforms.Compose(
        [
            transforms.ToTensor(),
            transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
        ]
    )

    test_dataset = datasets.CIFAR10(
        root="./data", train=False, download=True, transform=transform
    )

    test_loader = DataLoader(
        test_dataset, batch_size=CONFIG["batch_size"], shuffle=False, num_workers=2
    )

    print(f"Test: {len(test_dataset)} images\n")
    return test_loader, test_dataset

ResNet-18 Model Adapted to CIFAR-10

A ResNet-18 model pretrained on ImageNet is used as the base and adapted to the characteristics of CIFAR-10. The adaptation consists of modifying the first convolutional layer to work more appropriately with \(32 \times 32\) images and adjusting the final fully connected layer to the number of classes in CIFAR-10. Although the model is loaded with pretrained ImageNet weights, the final layer is initialized randomly, so the performance may not be optimal without fine-tuning. However, this limitation does not affect the main purpose of the code, which is to illustrate interpretability techniques in a functional manner.

def load_pretrained_model():
    """
    Loads a ResNet-18 pretrained on ImageNet and adapts it to CIFAR-10.
    """
    print("Loading pretrained model...")
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Adapt the first layer to 32x32 images (remove initial max-pooling)
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()

    # Adapt final layer for 10 CIFAR-10 classes
    model.fc = nn.Linear(model.fc.in_features, 10)

    model = model.to(CONFIG["device"])
    model.eval()

    print(f"Model loaded on {CONFIG['device']}\n")
    return model

Saliency Maps: Implementation and Visualization

The following implementation computes saliency maps via gradients and includes a visualization routine that facilitates the direct analysis of which image regions contribute most to the model's prediction.

print("=" * 70)
print("1. SALIENCY MAPS")
print("=" * 70)
print(
    """Saliency maps compute the gradient of the output with respect
to each image pixel, indicating which regions have the largest
influence on the prediction.

Advantages:
- Simple and efficient computation.
- Shows the direct influence of pixels.

Limitations:
- Visualizations are often noisy.
- Do not always align with semantically clear regions.
"""
)

class SaliencyMapGenerator:
    """Generates saliency maps using gradients."""

    def __init__(self, model: nn.Module, device: str = "cuda") -> None:
        self.model = model.to(device)
        self.model.eval()
        self.device = device

    def generate_saliency(self, image: torch.Tensor, target_class: int | None = None):
        """
        Computes the saliency map for a single image.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            target_class: Target class index; if None, the model prediction is used.

        Returns:
            2D saliency map (numpy array).
        """
        image = image.to(self.device)
        image.requires_grad = True

        output = self.model(image)

        if target_class is None:
            target_class = output.argmax(dim=1).item()

        self.model.zero_grad()
        output[0, target_class].backward()

        saliency = image.grad.data.abs()
        # Channel aggregation: maximum along the channel axis
        saliency, _ = torch.max(saliency, dim=1)

        return saliency.squeeze().cpu().numpy()

    def visualize_saliency(
        self,
        image: torch.Tensor,
        original_image: np.ndarray,
        target_class: int | None = None,
    ) -> None:
        """
        Visualizes the saliency map and its overlay on the original image.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            original_image: Denormalized image [H, W, 3] in [0, 1].
            target_class: Target class; if None, the model prediction is used.
        """
        saliency = self.generate_saliency(image, target_class)

        # Normalize to [0, 1] for visualization
        saliency = (saliency - saliency.min()) / (
            saliency.max() - saliency.min() + 1e-8
        )

        fig, axes = plt.subplots(1, 3, figsize=(15, 5))

        axes[0].imshow(original_image)
        axes[0].set_title("Original Image", fontsize=12, fontweight="bold")
        axes[0].axis("off")

        axes[1].imshow(saliency, cmap="hot")
        axes[1].set_title("Saliency Map", fontsize=12, fontweight="bold")
        axes[1].axis("off")

        axes[2].imshow(original_image)
        axes[2].imshow(saliency, cmap="hot", alpha=0.5)
        axes[2].set_title("Overlay", fontsize=12, fontweight="bold")
        axes[2].axis("off")

        plt.tight_layout()
        plt.show()

Grad-CAM (Gradient-weighted Class Activation Mapping)

Grad-CAM generates heatmaps that localize the regions of an image that contribute most strongly to the prediction for a specific class. Instead of operating directly on the pixels, Grad-CAM works on the activation maps of an internal convolutional layer, which tends to produce spatial relevance maps that are more structured and semantically interpretable.

Let \(A^k \in \mathbb{R}^{H \times W}\) denote the activation map associated with channel \(k\) of a selected convolutional layer. For a class \(c\), importance coefficients are computed by performing a global average pooling of the gradients over the spatial dimensions:

\[\alpha_k = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial f_c}{\partial A_{ij}^k}\]

Using these coefficients, a class-specific weighted activation map is constructed as

\[L_c^{\text{Grad-CAM}} = \mathrm{ReLU}\left( \sum_k \alpha_k A^k \right)\]

The ReLU function is applied to retain only positive contributions, under the assumption that activations that increase the class score are those to be highlighted. The spatial resolution of the Grad-CAM map is limited by the size of the activation maps of the chosen layer; therefore, the resulting map is often interpolated to match the size of the original image.
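The two formulas above can be traced step by step on toy activations: global-average-pool the gradients to obtain the coefficients \(\alpha_k\), then apply a ReLU to the weighted sum. A minimal numpy sketch (the 2x2 maps and gradient values are arbitrary):

```python
import numpy as np

# Toy activation maps A^k [K, H, W] of a convolutional layer and the
# gradients of the class score with respect to them; values are arbitrary.
A = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
dA = np.array([[[0.5, 0.5], [0.5, 0.5]],     # uniformly positive gradient
               [[-1.0, -1.0], [-1.0, -1.0]]])  # uniformly negative gradient

alpha = dA.mean(axis=(1, 2))                 # alpha_k: global average pooling
weighted = (alpha[:, None, None] * A).sum(axis=0)
cam = np.maximum(weighted, 0.0)              # ReLU keeps positive contributions only
# cam is [[0.5, 0.0], [0.0, 0.5]]: the negatively weighted channel is suppressed
```

Note that the channel with negative pooled gradient contributes nothing after the ReLU, which is exactly the behavior the formula is designed to produce.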

The implementation below uses hooks to capture activations and gradients at the target layer and generates the corresponding Grad-CAM map.

print("=" * 70)
print("2. GRAD-CAM (Gradient-weighted Class Activation Mapping)")
print("=" * 70)
print(
    """Grad-CAM generates heatmaps that highlight the regions of the image
that are most important for a specific class, using gradients with
respect to an internal convolutional layer.

Advantages:
- More interpretable maps than basic saliency maps.
- Localizes relevant object regions.

Limitations:
- Depends on the choice of the target layer.
- Resolution is limited by the resolution of that layer.
"""
)

class GradCAM:
    """Grad-CAM implementation for a target layer of a CNN."""

    def __init__(
        self, model: nn.Module, target_layer: str, device: str = "cuda"
    ) -> None:
        self.model = model.to(device)
        self.target_layer = target_layer
        self.device = device
        self.gradients: torch.Tensor | None = None
        self.activations: torch.Tensor | None = None
        self._register_hooks()

    def _register_hooks(self) -> None:
        """
        Registers hooks on the target layer to capture activations and gradients
        during forward and backward passes.
        """

        def forward_hook(module, input, output):
            self.activations = output.detach()

        def backward_hook(module, grad_input, grad_output):
            self.gradients = grad_output[0].detach()

        for name, module in self.model.named_modules():
            if name == self.target_layer:
                module.register_forward_hook(forward_hook)
                module.register_full_backward_hook(backward_hook)
                break

    def generate_cam(self, image: torch.Tensor, target_class: int | None = None):
        """
        Generates the Grad-CAM map for an image and a target class.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            target_class: Target class; if None, the model prediction is used.

        Returns:
            cam: Normalized 2D Grad-CAM map (numpy array).
            target_class: Class used for the explanation.
        """
        self.model.eval()
        image = image.to(self.device)

        output = self.model(image)

        if target_class is None:
            target_class = output.argmax(dim=1).item()

        self.model.zero_grad()
        output[0, target_class].backward()

        # Weights: global average of gradients over H x W
        weights = torch.mean(self.gradients, dim=(2, 3), keepdim=True)
        cam = torch.sum(weights * self.activations, dim=1, keepdim=True)
        cam = torch.relu(cam)
        cam = cam.squeeze().cpu().numpy()
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

        return cam, target_class

    def visualize_cam(
        self,
        image: torch.Tensor,
        original_image: np.ndarray,
        target_class: int | None = None,
    ):
        """
        Visualizes the Grad-CAM map and its overlay on the original image.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            original_image: Denormalized image [H, W, 3] in [0, 1].
            target_class: Target class; if None, the model prediction is used.
        """
        cam, pred_class = self.generate_cam(image, target_class)

        # Resize the map to the original image size via interpolation
        cam_resized = zoom(
            cam,
            (
                original_image.shape[0] / cam.shape[0],
                original_image.shape[1] / cam.shape[1],
            ),
        )

        fig, axes = plt.subplots(1, 3, figsize=(15, 5))

        axes[0].imshow(original_image)
        axes[0].set_title("Original Image", fontsize=12, fontweight="bold")
        axes[0].axis("off")

        axes[1].imshow(cam_resized, cmap="jet")
        axes[1].set_title(
            f"Grad-CAM (Class: {CIFAR10_CLASSES[pred_class]})",
            fontsize=12,
            fontweight="bold",
        )
        axes[1].axis("off")

        axes[2].imshow(original_image)
        axes[2].imshow(cam_resized, cmap="jet", alpha=0.5)
        axes[2].set_title("Overlay", fontsize=12, fontweight="bold")
        axes[2].axis("off")

        plt.tight_layout()
        plt.show()

        return cam_resized, pred_class

Guided Backpropagation and Guided Grad-CAM

Guided Backpropagation modifies the gradient flow through ReLU units: a gradient is propagated only at positions where both the forward activation and the incoming gradient are positive; everywhere else it is set to zero. This filtering yields sharper gradient maps that focus on features the network considers relevant.
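The filtering rule at a single ReLU can be illustrated on toy values; a minimal numpy sketch comparing standard ReLU backpropagation with the guided variant (the numbers are arbitrary):

```python
import numpy as np

# Values entering a ReLU during the forward pass, and the gradient
# arriving at that ReLU during the backward pass (arbitrary toy values).
relu_input = np.array([1.5, -0.3, 2.0, 0.7])
incoming_grad = np.array([0.4, 0.8, -0.6, 0.2])

# Standard ReLU backprop: keep the gradient where the forward input was positive.
standard = incoming_grad * (relu_input > 0)            # [0.4, 0.0, -0.6, 0.2]

# Guided variant: additionally zero out negative incoming gradients.
guided = standard * (incoming_grad > 0)                # [0.4, 0.0, 0.0, 0.2]
```

The guided rule discards the negative entry that standard backpropagation would have kept, which is the source of the cleaner visualizations.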

Guided Grad-CAM combines the global localization capability of Grad-CAM with the pixel-level detail of Guided Backpropagation. The usual procedure consists of three sequential steps: first, a Grad-CAM map is computed for the target class; second, guided gradients with respect to the input image are obtained; finally, the Grad-CAM map is upsampled to the input resolution and multiplied elementwise by the guided gradients. The result is a high-resolution visualization in which edges and fine details inside the Grad-CAM-relevant regions are emphasized.

The following code implements Guided Backpropagation. This implementation integrates naturally with the GradCAM class to build Guided Grad-CAM by multiplying the resized Grad-CAM map by the guided gradients.

print("=" * 70)
print("3. GUIDED GRAD-CAM")
print("=" * 70)
print(
    """Guided Grad-CAM combines Grad-CAM with Guided Backpropagation
to obtain high-resolution visualizations that are both
spatially precise and detailed at the pixel level.

This script implements Guided Backpropagation,
which can be combined with Grad-CAM maps.
"""
)

class GuidedBackprop:
    """Guided Backpropagation implementation for a CNN."""

    def __init__(self, model: nn.Module, device: str = "cuda") -> None:
        self.model = model.to(device)
        self.device = device
        self._register_hooks()

    def _register_hooks(self) -> None:
        """
        Registers hooks on ReLU layers to filter negative gradients
        during the backward pass.
        """

        def backward_hook(module, grad_input, grad_output):
            if len(grad_input) > 0 and grad_input[0] is not None:
                return (torch.clamp(grad_input[0], min=0.0),)
            return grad_input

        for module in self.model.modules():
            if isinstance(module, nn.ReLU):
                # In-place ReLUs (the torchvision ResNet default) interfere
                # with full backward hooks, so they are disabled first.
                module.inplace = False
                module.register_full_backward_hook(backward_hook)

    def generate_gradients(self, image: torch.Tensor, target_class: int | None = None):
        """
        Generates guided gradients with respect to the input image.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            target_class: Target class; if None, the model prediction is used.

        Returns:
            Guided gradients as a numpy array [3, H, W].
        """
        self.model.eval()
        image = image.to(self.device)
        image.requires_grad = True

        output = self.model(image)

        if target_class is None:
            target_class = output.argmax(dim=1).item()

        self.model.zero_grad()
        output[0, target_class].backward()

        gradients = image.grad.data.cpu().numpy()[0]
        return gradients

Occlusion Analysis

Occlusion analysis adopts a complementary viewpoint to gradient-based methods. Instead of exploring the internal sensitivity of the model, it modifies the input explicitly. Small regions (patches) of the image are systematically occluded, and the effect on the probability assigned to a given class is measured. When occluding a region significantly decreases the probability, that region is interpreted as important for the prediction.

Formally, for each position \((i,j)\) of a sliding window, an occluded version of the image \(x^{(i,j)}\) is constructed, and the difference

\[\Delta p_c^{(i,j)} = p_c(x) - p_c\bigl(x^{(i,j)}\bigr)\]

is evaluated, where \(p_c(x)\) denotes the model probability assigned to class \(c\). The resulting sensitivity map directly quantifies the importance of each region in terms of its impact on the model's confidence. This technique is independent of gradients and specific architectural details, although its computational cost increases with image resolution, due to the large number of model evaluations required.
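The quantity \(\Delta p_c^{(i,j)}\) can be traced on a toy example with a hand-made scorer; a minimal numpy sketch (the 8x8 "image" and the quadrant-based `score` function are purely illustrative):

```python
import numpy as np

# Dummy scorer that only looks at the top-left 4x4 quadrant of the image.
def score(img):
    return img[:4, :4].mean()

img = np.ones((8, 8))
baseline = score(img)          # 1.0 on the all-ones image

patch, stride = 2, 2
sens = np.zeros_like(img)
for i in range(0, img.shape[0] - patch + 1, stride):
    for j in range(0, img.shape[1] - patch + 1, stride):
        occluded = img.copy()
        occluded[i:i + patch, j:j + patch] = 0        # zero out the patch
        sens[i:i + patch, j:j + patch] = baseline - score(occluded)

# Only patches inside the top-left quadrant reduce the score, so only
# that quadrant receives nonzero sensitivity.
```

The class below follows the same sliding-window structure, replacing the dummy scorer with the softmax probability of the target class.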

The class below implements a simple occlusion analysis, allowing the patch size and stride of the sliding window to be adjusted.

print("=" * 70)
print("4. OCCLUSION ANALYSIS")
print("=" * 70)
print(
    """Systematically occludes regions of the image to observe
how the prediction changes, revealing which areas are critical.

Advantages:
- Direct interpretation at the input level.
- Does not require gradients or internal access to the architecture.

Limitations:
- High computational cost.
- Sensitive to patch size and stride.
"""
)

class OcclusionAnalysis:
    """Occlusion analysis for obtaining sensitivity maps."""

    def __init__(self, model: nn.Module, device: str = "cuda") -> None:
        self.model = model.to(device)
        self.model.eval()
        self.device = device

    def analyze(
        self,
        image: torch.Tensor,
        target_class: int | None = None,
        patch_size: int = 4,
        stride: int = 2,
    ):
        """
        Computes a sensitivity map via systematic occlusion.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            target_class: Target class; if None, the model prediction is used.
            patch_size: Side length of the square occlusion patch in pixels.
            stride: Stride of the sliding occlusion window.

        Returns:
            2D sensitivity map (numpy array).
        """
        image = image.to(self.device)

        with torch.no_grad():
            output = self.model(image)
            if target_class is None:
                target_class = output.argmax(dim=1).item()
            baseline_prob = torch.softmax(output, dim=1)[0, target_class].item()

        _, _, h, w = image.shape
        sensitivity_map = np.zeros((h, w))

        for i in range(0, h - patch_size + 1, stride):
            for j in range(0, w - patch_size + 1, stride):
                occluded_image = image.clone()
                occluded_image[:, :, i : i + patch_size, j : j + patch_size] = 0

                with torch.no_grad():
                    output = self.model(occluded_image)
                    prob = torch.softmax(output, dim=1)[0, target_class].item()

                sensitivity = baseline_prob - prob
                # For overlapping windows, keep the strongest effect observed
                current = sensitivity_map[i : i + patch_size, j : j + patch_size].mean()
                sensitivity_map[i : i + patch_size, j : j + patch_size] = max(
                    current, sensitivity
                )

        return sensitivity_map

    def visualize(
        self,
        image: torch.Tensor,
        original_image: np.ndarray,
        target_class: int | None = None,
        patch_size: int = 4,
        stride: int = 2,
    ) -> None:
        """
        Visualizes the sensitivity map obtained via occlusion.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            original_image: Denormalized image [H, W, 3] in [0, 1].
            target_class: Target class; if None, the model prediction is used.
            patch_size: Occlusion patch size.
            stride: Sliding window stride.
        """
        print(f"Analyzing with patch_size={patch_size}, stride={stride}...")
        sensitivity = self.analyze(image, target_class, patch_size, stride)
        sensitivity = (sensitivity - sensitivity.min()) / (
            sensitivity.max() - sensitivity.min() + 1e-8
        )

        fig, axes = plt.subplots(1, 3, figsize=(15, 5))

        axes[0].imshow(original_image)
        axes[0].set_title("Original Image", fontsize=12, fontweight="bold")
        axes[0].axis("off")

        axes[1].imshow(sensitivity, cmap="hot")
        axes[1].set_title("Sensitivity Map", fontsize=12, fontweight="bold")
        axes[1].axis("off")

        axes[2].imshow(original_image)
        axes[2].imshow(sensitivity, cmap="hot", alpha=0.5)
        axes[2].set_title("Overlay", fontsize=12, fontweight="bold")
        axes[2].axis("off")

        plt.tight_layout()
        plt.show()

Integrated Gradients

Integrated Gradients is a theoretically grounded method to attribute a model prediction to input features. Instead of considering the gradient only at the point \(x\), this method integrates gradients along a continuous path that connects a baseline \(x'\) (for example, a completely black image) to the actual image \(x\). This approach mitigates gradient saturation issues and satisfies desirable attribution axioms such as sensitivity and implementation invariance.

Let \(f_c\) denote the score for class \(c\) (for example, the pre-softmax output). Integrated Gradients for dimension \(i\) is defined as

\[\mathrm{IG}_i(x) = (x_i - x'_i) \int_{\alpha=0}^{1}\frac{\partial f_c\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i} \, d\alpha\]

In practice, the integral is approximated by a discrete sum over \(m\) uniformly spaced steps:

\[\mathrm{IG}_i(x) \approx (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m}\frac{\partial f_c\bigl(x' + \tfrac{k}{m}(x - x')\bigr)}{\partial x_i}\]

Aggregating the absolute attributions over channels yields a spatial relevance map that is typically smoother and more stable than basic saliency maps, at the cost of requiring multiple model evaluations along the path between the baseline and the original image.
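A useful sanity check on the discrete approximation is the completeness axiom: the attributions should sum to \(f_c(x) - f_c(x')\). This can be verified on a function whose gradient is known in closed form; a minimal numpy sketch using \(f(z) = \sum_i z_i^2\) with gradient \(2z\) (the function name and toy inputs are illustrative):

```python
import numpy as np

def integrated_gradients_quadratic(x, baseline, steps=200):
    # f(z) = sum(z**2), whose gradient 2z is available analytically,
    # so the path integral can be approximated without autodiff.
    alphas = np.linspace(0.0, 1.0, steps + 1)[:, None]
    path = baseline + alphas * (x - baseline)   # points on the straight-line path
    avg_grad = (2.0 * path).mean(axis=0)        # average gradient along the path
    return (x - baseline) * avg_grad

x = np.array([1.0, 2.0])
attr = integrated_gradients_quadratic(x, np.zeros(2))
# completeness: attr.sum() approximates f(x) - f(baseline) = 5
```

The implementation for the CNN below follows the same structure, with the analytic gradient replaced by a backward pass through the model.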

The following implementation computes Integrated Gradients for a single image, allowing the baseline and the number of integration steps to be specified.

print("=" * 70)
print("5. INTEGRATED GRADIENTS")
print("=" * 70)
print(
    """Method that attributes the prediction to input features by
integrating gradients along a path from a baseline
to the actual image.

Advantages:
- Strong theoretical foundation.
- Mitigates gradient saturation issues.

Limitations:
- Requires multiple model evaluations.
- Depends on the choice of baseline.
"""
)

class IntegratedGradients:
    """Integrated Gradients implementation for PyTorch models."""

    def __init__(self, model: nn.Module, device: str = "cuda") -> None:
        self.model = model.to(device)
        self.model.eval()
        self.device = device

    def generate(
        self,
        image: torch.Tensor,
        target_class: int | None = None,
        baseline: torch.Tensor | None = None,
        steps: int = 50,
    ):
        """
        Computes Integrated Gradients for an image and target class.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            target_class: Target class; if None, the model prediction is used.
            baseline: Tensor [1, 3, H, W] used as reference. If None, a zero tensor is used.
            steps: Number of points along the integration path.

        Returns:
            Numpy array [C, H, W] with per-channel attributions.
        """
        if baseline is None:
            baseline = torch.zeros_like(image)

        baseline = baseline.to(self.device)
        image = image.to(self.device)

        with torch.no_grad():
            output = self.model(image)
            if target_class is None:
                target_class = output.argmax(dim=1).item()

        # Linear path between baseline and image
        scaled_inputs = [
            baseline + (float(i) / steps) * (image - baseline) for i in range(steps + 1)
        ]
        scaled_inputs = torch.cat(scaled_inputs, dim=0)
        scaled_inputs.requires_grad = True

        output = self.model(scaled_inputs)
        self.model.zero_grad()

        target_output = output[:, target_class]
        target_output.backward(torch.ones_like(target_output))

        gradients = scaled_inputs.grad
        avg_gradients = torch.mean(gradients, dim=0, keepdim=True)
        integrated_grads = (image - baseline) * avg_gradients

        return integrated_grads.squeeze().cpu().detach().numpy()

    def visualize(
        self,
        image: torch.Tensor,
        original_image: np.ndarray,
        target_class: int | None = None,
    ) -> None:
        """
        Visualizes spatially aggregated Integrated Gradients and its overlay.

        Args:
            image: Tensor [1, 3, H, W] normalized.
            original_image: Denormalized image [H, W, 3] in [0, 1].
            target_class: Target class; if None, the model prediction is used.
        """
        print("Computing Integrated Gradients (50 steps)...")
        ig = self.generate(image, target_class)

        ig_aggregated = np.sum(np.abs(ig), axis=0)
        ig_aggregated = (ig_aggregated - ig_aggregated.min()) / (
            ig_aggregated.max() - ig_aggregated.min() + 1e-8
        )

        fig, axes = plt.subplots(1, 3, figsize=(15, 5))

        axes[0].imshow(original_image)
        axes[0].set_title("Original Image", fontsize=12, fontweight="bold")
        axes[0].axis("off")

        axes[1].imshow(ig_aggregated, cmap="hot")
        axes[1].set_title("Integrated Gradients", fontsize=12, fontweight="bold")
        axes[1].axis("off")

        axes[2].imshow(original_image)
        axes[2].imshow(ig_aggregated, cmap="hot", alpha=0.5)
        axes[2].set_title("Overlay", fontsize=12, fontweight="bold")
        axes[2].axis("off")

        plt.tight_layout()
        plt.show()

Visualization Utilities

To interpret the results properly, CIFAR-10 images should be denormalized before visualization. The function below reverses the standard normalization applied during preprocessing and returns an image in a format suitable for matplotlib.

def denormalize_cifar10(tensor: torch.Tensor) -> np.ndarray:
    """
    Denormalizes a CIFAR-10 tensor for visualization.

    Args:
        tensor: Tensor [3, H, W] normalized with CIFAR-10 mean and std.

    Returns:
        Image as numpy array [H, W, 3] with values in [0, 1].
    """
    mean = torch.tensor([0.4914, 0.4822, 0.4465]).view(3, 1, 1)
    std = torch.tensor([0.2470, 0.2435, 0.2616]).view(3, 1, 1)
    denorm = tensor * std + mean
    denorm = torch.clamp(denorm, 0, 1)
    return denorm.permute(1, 2, 0).numpy()

Complete Interpretability Pipeline

Finally, all components are integrated into a coherent workflow that applies the different interpretability techniques to a test image from CIFAR-10. The pipeline includes data loading, model loading, sample selection, and the sequential execution of saliency maps, Grad-CAM, occlusion analysis, and Integrated Gradients. Guided Backpropagation is implemented and can be used to construct Guided Grad-CAM if one wishes to extend the pipeline.

def run_complete_pipeline() -> None:
    """
    Executes all interpretability techniques in an integrated way
    on a single CIFAR-10 image.
    """
    print("\n" + "=" * 70)
    print("COMPLETE PIPELINE: INTERPRETABILITY IN CNNs")
    print("=" * 70 + "\n")

    # Data
    test_loader, _ = prepare_cifar10_data()

    # Model
    model = load_pretrained_model()

    # Select a test image
    print("Selecting test image...")
    images, labels = next(iter(test_loader))
    image = images[0:1]
    label = labels[0].item()
    original_image = denormalize_cifar10(images[0].clone())
    print(f"True class: {CIFAR10_CLASSES[label]}\n")

    # 1. Saliency Maps
    print("\n" + "=" * 70)
    print("RUNNING: Saliency Maps")
    print("=" * 70 + "\n")
    saliency_gen = SaliencyMapGenerator(model, CONFIG["device"])
    saliency_gen.visualize_saliency(image.clone(), original_image)

    # 2. Grad-CAM
    print("\n" + "=" * 70)
    print("RUNNING: Grad-CAM")
    print("=" * 70 + "\n")
    grad_cam = GradCAM(model, target_layer="layer4", device=CONFIG["device"])
    grad_cam.visualize_cam(image.clone(), original_image)

    # 3. Occlusion Analysis
    print("\n" + "=" * 70)
    print("RUNNING: Occlusion Analysis")
    print("=" * 70 + "\n")
    occlusion = OcclusionAnalysis(model, device=CONFIG["device"])
    occlusion.visualize(image.clone(), original_image, patch_size=4, stride=2)

    # 4. Integrated Gradients
    print("\n" + "=" * 70)
    print("RUNNING: Integrated Gradients")
    print("=" * 70 + "\n")
    ig = IntegratedGradients(model, device=CONFIG["device"])
    ig.visualize(image.clone(), original_image)

if __name__ == "__main__":
    run_complete_pipeline()

This complete pipeline provides a practical framework for exploring interpretability in CNNs on CIFAR-10. Although the ResNet-18 model is not explicitly fine-tuned on this dataset within the script, the code structure allows the same analysis workflow to be reused with a model trained specifically on CIFAR-10 by simply replacing the model loading function with a version that retrieves weights adapted to the domain of interest.