Unit 3: Visualizing and Understanding CNNs
1. Introduction to CNNs
- Convolutional Neural Networks (CNNs) are deep learning models particularly effective for image data, exploiting spatial hierarchies for feature learning.
- Main components: convolutional layers, activation functions (e.g., ReLU), pooling layers, and fully-connected layers.
2. Evolution of CNN Architectures
Classic CNN Architectures:
- AlexNet: first deep CNN to win ImageNet (2012); used ReLU, dropout, and GPU training.
- ZFNet (Zeiler & Fergus Net): improved on AlexNet and visualized intermediate activations to better understand learned features.
- VGGNet: very deep networks built from small (3x3) convolutions; emphasized architectural simplicity and improved performance.
3. Visualization of Kernels/Filters
- Kernel visualization helps interpret what features the network's convolutional layers are learning.
- Early-layer filters typically look like edge or color detectors, while deeper layers capture more complex patterns (textures, object parts).
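To make the "early filters act as edge detectors" point concrete, here is a minimal numpy sketch: a hand-crafted Sobel kernel (which learned first-layer filters often resemble) is convolved over a tiny synthetic image containing a vertical edge, and the response fires only along that edge. The kernel, image, and `conv2d_valid` helper are illustrative stand-ins, not part of any library.

```python
import numpy as np

# Illustrative filter: a Sobel-style kernel, similar in appearance to many
# learned first-layer CNN filters (an oriented edge detector).
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Tiny synthetic image: dark left half, bright right half (a vertical edge).
img = np.zeros((5, 5))
img[:, 3:] = 1.0

def conv2d_valid(image, kernel):
    """Plain 'valid' 2D cross-correlation, as computed inside a conv layer."""
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

response = conv2d_valid(img, sobel_x)
print(response)  # nonzero only in columns overlapping the edge
```

The flat regions of the image produce zero response; only windows straddling the brightness change activate, which is exactly the behavior visualized when inspecting early-layer kernels.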
4. Backprop-to-Image/Deconvolution Methods
- Deconvolution/Backprop-to-image:
  - Techniques that project feature activations back into input space, showing which image regions activate specific neurons.
  - Useful for understanding spatial sensitivity and layer-wise feature abstraction.
- Guided Backpropagation: refines these visualizations by masking negative gradients during the backward pass, producing sharper images.
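The gradient-masking rule that distinguishes guided backpropagation can be sketched in a few lines of numpy. At each ReLU, the backward signal is zeroed both where the forward input was negative (the standard ReLU rule) and where the incoming gradient itself is negative (the "guided" addition). The function name and the toy inputs are illustrative assumptions.

```python
import numpy as np

def guided_relu_backward(grad_out, forward_input):
    """Guided backprop's modified ReLU backward rule (minimal sketch):
    pass a gradient only where the forward input was positive AND the
    gradient flowing back is positive; zero it everywhere else."""
    return grad_out * (forward_input > 0) * (grad_out > 0)

x = np.array([-1.0, 2.0, 3.0, -0.5])   # toy forward-pass inputs to a ReLU
g = np.array([ 0.7, -0.2, 0.5,  0.9])  # toy gradients arriving from above

guided = guided_relu_backward(g, x)
print(guided)  # only the position with x > 0 AND g > 0 survives
```

Suppressing negative gradients removes the "canceling" signal that blurs plain backprop-to-image visualizations, which is why guided backprop yields the sharper images noted above.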
5. Deep Dream and Hallucination
- Deep Dream: an algorithm that modifies the input image to maximize neuron activations, creating "dream-like" visuals.
- Hallucination: amplifies the patterns a CNN responds to, offering a way to explore its internal feature space.
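Deep Dream's core mechanic is gradient ascent on the input image rather than on the weights. The sketch below uses a toy quadratic "layer response" in place of a real CNN (so the gradient is known analytically); the `activation` and `grad` functions are stand-ins, not any framework's API.

```python
import numpy as np

def activation(img, filt):
    """Toy stand-in for a layer's response to the image."""
    return np.sum((img * filt) ** 2)

def grad(img, filt):
    """Analytic gradient of the toy activation w.r.t. the image."""
    return 2 * img * filt ** 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 4)) * 0.1   # start from a faint "image"
filt = np.ones((4, 4))                # toy filter

before = activation(img, filt)
for _ in range(50):
    img += 0.1 * grad(img, filt)      # ascend: amplify what the filter "sees"
after = activation(img, filt)

print(before < after)  # the targeted activation strictly increases
```

With a real network, the same loop (image plus step-size times the activation's gradient) progressively hallucinates the patterns that most excite the chosen neurons, producing the characteristic dream-like imagery.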
6. Neural Style Transfer
- Combines content features from one image with style features from another (usually using a pre-trained CNN) to create art-like images.
- Relies on separating and recombining content and style representations in feature maps.
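The style representation is typically captured by the Gram matrix of a layer's feature maps: correlations between channels, with the spatial layout discarded. A minimal numpy sketch on a fake `(channels, H, W)` activation tensor (the shapes and random features are placeholder assumptions):

```python
import numpy as np

def gram_matrix(features):
    """Channel-correlation (Gram) matrix of a (C, H, W) feature tensor,
    normalized by the number of spatial positions."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # flatten spatial dims: (C, H*W)
    return f @ f.T / (h * w)         # (C, C) channel correlations

feats = np.random.default_rng(1).normal(size=(3, 8, 8))  # fake activations
G = gram_matrix(feats)
print(G.shape)  # (3, 3): one entry per channel pair
```

Because the spatial dimensions are summed out, two images with similar textures but different object arrangements yield similar Gram matrices; the style loss compares these matrices while a separate content loss compares raw feature maps, which is how the two representations are separated and recombined.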
7. Class Activation Mapping (CAM, Grad-CAM)
- CAM: generates heatmaps showing which image regions the CNN used for a given class prediction.
- Grad-CAM: uses gradients of the class score w.r.t. feature maps to create class-discriminative localization maps; useful for model interpretability and debugging.
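The Grad-CAM computation itself is short: global-average-pool the gradients to get one weight per channel, take the weighted sum of the feature maps, then apply a ReLU to keep only positive evidence. The sketch below uses random arrays as placeholders for activations and gradients that would normally be captured from a real network's last convolutional layer.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM sketch for one class:
    feature_maps, gradients: (C, H, W) from the target conv layer."""
    weights = gradients.mean(axis=(1, 2))              # GAP over space: (C,)
    cam = np.tensordot(weights, feature_maps, axes=1)  # weighted channel sum: (H, W)
    return np.maximum(cam, 0)                          # ReLU: keep positive evidence

rng = np.random.default_rng(2)
fmaps = rng.normal(size=(4, 7, 7))  # placeholder activations
grads = rng.normal(size=(4, 7, 7))  # placeholder d(score)/d(activations)

heatmap = grad_cam(fmaps, grads)
print(heatmap.shape)  # (7, 7); upsampled to image resolution in practice
```

Because the weights come from the gradient of one particular class score, the resulting map is class-discriminative: different target classes produce different heatmaps from the same feature maps.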
8. Applications in Vision
- CNN visualizations help in model interpretation, architecture debugging, medical imaging, and trustworthy AI deployment.
- Used extensively for explaining decisions in safety-critical or regulated domains.