Unit 3: Visualization and Understanding CNNs


1. Introduction to CNNs

  • Convolutional Neural Networks (CNNs) are deep learning models particularly effective for image data, exploiting spatial hierarchies for feature learning.

  • Main components: Convolutional layers, Activation functions (e.g., ReLU), Pooling layers, and Fully-connected layers.
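The pipeline above (convolution → ReLU → pooling) can be sketched in plain numpy; this is a minimal illustration, not a production implementation, and the Sobel-like edge filter is chosen just for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trims edges that do not divide evenly."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

# Toy image with a dark-to-bright vertical edge, and a filter that responds to it.
image = np.zeros((8, 8))
image[:, 4:] = 1.0
kernel = np.array([[-1., 0., 1.]] * 3)   # hand-picked 3x3 edge filter
features = max_pool(relu(conv2d(image, kernel)))
print(features.shape)  # (3, 3)
```

The strong responses land exactly where the edge sits, which is the spatial-locality property the bullet points describe.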


2. Evolution of CNN Architectures

Classic CNN Architectures:

  • AlexNet: First deep CNN to win ImageNet (2012), used ReLU, dropout, and GPUs for training.

  • ZFNet (Zeiler & Fergus Net): Improved on AlexNet, visualized intermediate activations to better understand learned features.

  • VGGNet: Very deep nets built from small (3x3) convolutions, emphasizing architectural simplicity and improving performance.
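VGGNet's design rests on a simple piece of arithmetic: stacking stride-1 3x3 convolutions matches the receptive field of a single larger filter while using fewer parameters. A quick sketch of that arithmetic (for the stride-1 case only):

```python
def receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions:
    each layer grows the field by (kernel - 1) pixels."""
    return 1 + num_layers * (kernel - 1)

def conv_params(kernel, channels, layers=1):
    """Weights in `layers` conv layers with `channels` in and out (biases ignored)."""
    return layers * kernel * kernel * channels * channels

c = 64
print(receptive_field(2))            # 5: two 3x3 layers see a 5x5 region
print(receptive_field(3))            # 7: three 3x3 layers see a 7x7 region
print(conv_params(3, c, layers=2))   # 18*c*c weights ...
print(conv_params(5, c, layers=1))   # ... vs 25*c*c for one 5x5 layer
```

So two 3x3 layers cover the same 5x5 region with 18c² instead of 25c² weights, plus an extra ReLU nonlinearity in between.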


3. Visualization of Kernels/Filters

  • Kernel visualization helps interpret what features the network’s convolutional layers are learning.

  • Filters in early layers, when visualized, resemble edge or color detectors, while deeper layers capture more complex patterns (textures, object parts).
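The standard way to inspect first-layer filters is to normalize each kernel to [0, 1] independently and tile them into a single image. A minimal sketch (random weights stand in for a trained model's kernels):

```python
import numpy as np

def filters_to_grid(weights):
    """Tile single-channel conv kernels into one displayable image.
    weights: (num_filters, h, w) array, e.g. first-layer kernels."""
    n, h, w = weights.shape
    mins = weights.min(axis=(1, 2), keepdims=True)
    maxs = weights.max(axis=(1, 2), keepdims=True)
    normed = (weights - mins) / (maxs - mins + 1e-8)  # per-filter [0, 1] scaling
    cols = int(np.ceil(np.sqrt(n)))
    rows = int(np.ceil(n / cols))
    grid = np.zeros((rows * h, cols * w))
    for k in range(n):
        r, c = divmod(k, cols)
        grid[r*h:(r+1)*h, c*w:(c+1)*w] = normed[k]
    return grid

rng = np.random.default_rng(0)
grid = filters_to_grid(rng.normal(size=(16, 7, 7)))  # stand-in for real kernels
print(grid.shape)  # (28, 28)
```

With a trained network (e.g. AlexNet's 11x11 first-layer kernels) the resulting grid shows the familiar oriented edges and color blobs.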


4. Backprop-to-Image/Deconvolution Methods

  • Deconvolution/Backprop-to-image:

    • Techniques to project feature activations back to input space to show which image regions activate specific neurons.

    • Useful for understanding spatial sensitivity and layer-wise feature abstraction.

  • Guided Backpropagation: Refines the visualization by zeroing out negative gradients at each ReLU during the backward pass, producing sharper, less noisy images.
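The difference between standard backprop and guided backprop is entirely in the ReLU backward rule; a numpy sketch of the two rules side by side:

```python
import numpy as np

def relu_backward_plain(grad_out, pre_activation):
    """Standard ReLU backward: pass gradient where the forward input was positive."""
    return grad_out * (pre_activation > 0)

def relu_backward_guided(grad_out, pre_activation):
    """Guided backprop: additionally zero negative incoming gradients, keeping
    only paths that contribute positively to the visualized activation."""
    return grad_out * (pre_activation > 0) * (grad_out > 0)

pre = np.array([-1.0, 2.0, 3.0, 0.5])   # pre-ReLU activations from the forward pass
g   = np.array([ 0.7, -0.4, 0.9, 0.2])  # gradients arriving from the layer above
print(relu_backward_plain(g, pre))
print(relu_backward_guided(g, pre))
```

Applying the guided rule at every ReLU suppresses the "suppressive" gradient paths, which is why the resulting input-space images look sharper.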


5. Deep Dream and Hallucination

  • Deep Dream: Algorithm modifies input images to maximize neuron activations, creating "dream-like" visuals.

  • Hallucination: Iteratively amplifying the patterns a CNN detects in an image, a way of exploring the network's internal feature space.
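The core loop of Deep Dream is gradient ascent on the input image rather than on the weights. A toy sketch with a single linear "neuron" (a fixed filter w), where the gradient of the activation with respect to the input is analytic:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)          # fixed "learned" filter playing the neuron's role
x = rng.normal(size=64) * 0.01   # start from a near-blank input

def activation(inp):
    return float(w @ inp)        # a = w.x, so da/dx = w

lr = 0.1
history = [activation(x)]
for _ in range(20):
    x = x + lr * w               # ascend the activation's gradient w.r.t. the input
    history.append(activation(x))

print(history[0], history[-1])   # activation grows as the filter's pattern emerges
```

Real Deep Dream does the same thing with a deep layer's activations as the objective (gradients obtained by backprop), plus tricks like multi-scale "octaves" and gradient normalization.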


6. Neural Style Transfer

  • Combines content features from one image with style features from another (usually using a pre-trained CNN) to create art-like images.

  • Relies on separating and recombining content and style representations in feature maps.
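The usual style representation is the Gram matrix of a layer's feature maps: channel-by-channel correlations that discard spatial layout. A minimal sketch of the Gram matrix and a style loss between two feature maps (random arrays stand in for VGG activations):

```python
import numpy as np

def gram_matrix(features):
    """Style representation of one layer's activations.
    features: (C, H, W) feature maps -> (C, C) channel correlations."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return (f @ f.T) / (h * w)   # spatial positions are summed out

def style_loss(gen_feat, style_feat):
    """Mean squared difference between the two images' Gram matrices."""
    return float(np.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2))

rng = np.random.default_rng(0)
style_feat = rng.normal(size=(8, 16, 16))   # stand-in for a VGG layer's activations
print(gram_matrix(style_feat).shape)        # (8, 8)
print(style_loss(style_feat, style_feat))   # 0.0 for identical features
```

Content loss, by contrast, compares the raw feature maps directly, so it preserves spatial arrangement; the generated image is optimized to minimize a weighted sum of the two.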


7. Class Activation Mapping (CAM, Grad-CAM)

  • CAM: Generates heatmaps showing which image regions the CNN used for a given class prediction; requires a global-average-pooling layer before the classifier.

  • Grad-CAM: Uses gradients of class scores w.r.t. feature maps to create class-discriminative localization maps without architectural changes, useful for model interpretability and debugging.
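Given a layer's forward activations and the gradient of the class score with respect to them, the Grad-CAM computation itself is only a few lines; a numpy sketch (random arrays stand in for values a framework's autograd would provide):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one conv layer.
    activations: (C, H, W) forward feature maps
    gradients:   (C, H, W) d(class score)/d(activations)"""
    weights = gradients.mean(axis=(1, 2))             # global-average-pool the grads
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

rng = np.random.default_rng(0)
acts = rng.random(size=(32, 7, 7))     # stand-in for a last-conv-layer forward pass
grads = rng.normal(size=(32, 7, 7))    # stand-in for backpropagated gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (7, 7)
```

In practice the (H, W) heatmap is upsampled to the input resolution and overlaid on the image; the ReLU is what makes the map class-discriminative, keeping only regions with positive influence on the score.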


8. Applications in Vision

  • CNN visualizations help in model interpretation, architecture debugging, medical imaging, and trustworthy AI deployment.

  • Used extensively for explaining decisions in safety-critical or regulated domains.