AL3502 CIAT 1 Answer Key
PART A
1. Define convolution in image processing.
Convolution is a mathematical operation where a filter (kernel) slides over an image to compute a weighted sum of the pixel values beneath it, producing a transformed output useful for tasks such as blurring, sharpening, and edge detection.
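For illustration, a minimal NumPy sketch of this sliding-window operation (the function name and the averaging kernel below are illustrative, not prescribed by the question):

```python
# A minimal sketch of 2D convolution with a small kernel, assuming a
# grayscale image stored as a NumPy array (border pixels are skipped here).
import numpy as np

def convolve2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    flipped = np.flipud(np.fliplr(kernel))   # true convolution flips the kernel
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# Example: a simple 3x3 averaging (blur) kernel
blur = np.ones((3, 3)) / 9.0
```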
2. What is the purpose of the Hough Transform?
The Hough Transform detects geometric shapes like lines or circles in images by transforming points in image space into parameters in Hough space, accumulating votes, and identifying peaks corresponding to shapes.
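For illustration, a minimal NumPy sketch of the voting step for line detection, assuming `edges` is a binary edge map (e.g., from a Canny detector); the parameter resolution is illustrative:

```python
# Each edge pixel votes for every (rho, theta) line that could pass through it;
# peaks in the accumulator correspond to detected lines.
import numpy as np

def hough_lines(edges, n_theta=180):
    h, w = edges.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag + 1, n_theta), dtype=np.int64)  # rho shifted by +diag
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        for t, theta in enumerate(thetas):
            rho = int(round(x * np.cos(theta) + y * np.sin(theta))) + diag
            acc[rho, t] += 1
    return acc, thetas   # find peaks in acc to read off line parameters
```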
3. Differentiate between edge and corner features in images.
An edge is a boundary where the intensity changes sharply, representing object outlines; a corner is a point where two or more edges meet, representing locations of maximum curvature.
4. Write any two differences between Sigmoid and ReLU activation functions.
- Sigmoid squashes its input into the range (0, 1); ReLU passes positive inputs through unchanged and outputs 0 for negative inputs, so its range is [0, ∞).
- Sigmoid saturates for large-magnitude inputs, causing vanishing gradients; ReLU has a constant gradient of 1 for positive inputs, which mitigates the problem (see the sketch below).
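A minimal NumPy sketch of the two activations and their gradients (the sample inputs are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))                       # squashed into (0, 1)
print(relu(x))                          # 0 for negatives, identity for positives
print(sigmoid(x) * (1 - sigmoid(x)))    # sigmoid gradient: near 0 when |x| is large
print((x > 0).astype(float))            # ReLU gradient: 1 for positive inputs
```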
5. State the vanishing gradient problem in deep networks.
The vanishing gradient problem occurs when gradients in deep networks become extremely small during backpropagation, causing the early layers to learn very slowly or not at all.
6. What is dropout regularization in neural networks?
Dropout randomly disables a subset of neurons during training, forcing the network to learn redundant representations and reducing overfitting.
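A minimal NumPy sketch of (inverted) dropout applied to a layer's activations during training; p = 0.5 is an illustrative drop probability:

```python
import numpy as np

def dropout(activations, p=0.5, training=True):
    if not training:
        return activations                  # no units are dropped at test time
    mask = (np.random.rand(*activations.shape) > p).astype(activations.dtype)
    return activations * mask / (1.0 - p)   # rescale so the expected activation is unchanged

a = np.ones((2, 4))
print(dropout(a))   # roughly half the entries zeroed, the rest scaled to 2.0
```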
7. What is the role of Class Activation Mapping (CAM) in CNNs?
CAM highlights image regions most relevant to a particular class prediction, providing interpretability by showing which parts influenced the model's decision.
PART B
8a. Explain in detail the process of image formation, capture, and representation in computer vision.
Image formation begins when light reflected from the scene is focused by the camera lens and projected onto the sensor plane, a perspective projection from the 3D world to a 2D image. Image capture converts this continuous light signal into a digital image through spatial sampling at each photosite and quantization of intensity by an analog-to-digital converter, recording brightness and color for every pixel. Image representation stores the result as arrays (a 2D matrix for grayscale, three stacked channels for RGB, more channels for multispectral data) for further processing.
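A minimal OpenCV sketch of image representation as arrays (the file name "scene.jpg" is a hypothetical path):

```python
import cv2

img = cv2.imread("scene.jpg")                 # BGR color image as a uint8 array
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # single-channel intensity image

print(img.shape)    # (height, width, 3)  -> three color channels per pixel
print(gray.shape)   # (height, width)     -> one intensity value per pixel
print(img.dtype)    # uint8: each sample quantized to 256 levels (0-255)
```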
OR
8b. Describe feature detection methods such as edges, blobs, and corners. How are they useful in computer vision applications?
Edges are detected via operators like Sobel or Canny; blobs through Laplacian of Gaussian or Difference of Gaussians; corners via Harris or Shi-Tomasi methods. These features help in segmentation, matching, tracking, and object detection by providing distinctive and repeatable reference points.
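A minimal OpenCV sketch of the detectors named above (the thresholds and the input file are illustrative):

```python
import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image

edges = cv2.Canny(gray, 100, 200)                          # binary edge map
corners = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)   # Harris corner response map
blobs = cv2.SimpleBlobDetector_create().detect(gray)       # blob keypoints
```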
9a. Explain the working of gradient descent and backpropagation with neat diagrams and mathematical expressions.
Gradient descent iteratively updates model weights using the negative gradient of the loss function. Backpropagation computes these gradients layer by layer, from the output layer back to the input layer, using the chain rule, so the weights can be adjusted to minimize prediction error:
$$
w_{new} = w_{old} - \alpha \frac{\partial L}{\partial w}
$$
where $$ L $$ is the loss, $$ w $$ the weight, and $$ \alpha $$ the learning rate.
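A minimal NumPy sketch of this update rule, fitting y = 2x by gradient descent on a squared-error loss (the data and learning rate are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # target relationship: w = 2

w, alpha = 0.0, 0.05
for _ in range(100):
    pred = w * x
    loss = np.mean((pred - y) ** 2)
    grad = np.mean(2 * (pred - y) * x)   # dL/dw via the chain rule
    w = w - alpha * grad                 # w_new = w_old - alpha * dL/dw
print(w)   # converges towards 2.0
```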
OR
9b. Discuss the vanishing gradient problem in deep neural networks. What techniques are used to overcome this issue?
The vanishing gradient problem arises due to repeated multiplications of small derivatives, shrinking gradient flow to early layers. Mitigation strategies include using ReLU activation, batch normalization, residual networks (skip connections), improved initializations, and normalized architectures.
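A minimal PyTorch sketch of one mitigation, a residual block with a skip connection and batch normalization (the channel count is illustrative):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)   # batch normalization also stabilizes gradients
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection keeps a direct gradient path
```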
10a. What is RANSAC? Explain its working principle with an example, and compare it with the Hough Transform for robust estimation.
RANSAC (Random Sample Consensus) is a robust method for fitting models to data contaminated with outliers. It repeatedly selects a minimal random subset of points, fits a candidate model, and counts how many points agree with it (inliers); the model with the most inliers is kept and often refined on its inlier set. Compared with the Hough Transform, RANSAC is more flexible for fitting arbitrary parametric models, while Hough is more efficient for low-parameter shapes such as lines and circles because it votes over a discretized parameter space.
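A minimal NumPy sketch of RANSAC fitting a 2D line y = mx + c to points with outliers (the iteration count and inlier threshold are illustrative):

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.5):
    """points: array of shape (N, 2) holding (x, y) coordinates."""
    best_model, best_inliers = None, 0
    for _ in range(n_iters):
        p1, p2 = points[np.random.choice(len(points), 2, replace=False)]
        if p2[0] == p1[0]:
            continue                            # skip vertical sample pairs
        m = (p2[1] - p1[1]) / (p2[0] - p1[0])
        c = p1[1] - m * p1[0]
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = np.sum(residuals < threshold)
        if inliers > best_inliers:              # keep the model with the most inliers
            best_model, best_inliers = (m, c), inliers
    return best_model
```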
OR
10b. Trace the evolution of CNN architectures AlexNet, ZFNet, VGG. Highlight their contributions to deep learning for vision.
AlexNet (2012) popularized deep CNNs by winning ImageNet with GPU training, ReLU activations, and dropout; ZFNet (2013) refined AlexNet's early layers (smaller first-layer filters and stride) guided by deconvolutional feature visualization, improving both accuracy and interpretability; VGG (2014) showed the value of depth with simple, uniform stacks of small 3x3 filters, further raising image classification accuracy.
PART C
11a. Describe different regularization techniques in deep learning (dropout, adversarial training, weight decay) with suitable examples.
- Dropout: Randomly disables units during training (e.g., setting dropout=0.5 in fully connected layers).
- Adversarial training: Trains on inputs perturbed to fool the model, strengthening robustness.
- Weight decay: Adds a penalty ($$\lambda||w||^2$$) to the loss, discouraging large weights and improving generalization (see the sketch after this list).
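A minimal PyTorch sketch combining two of these techniques, dropout inside the model and weight decay through the optimizer (all sizes and values are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Dropout(p=0.5),          # randomly zeroes 50% of activations in training
                      nn.Linear(256, 10))

# weight_decay applies an L2 penalty on the weights at every update step
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
```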
OR
11b. Explain how visualization techniques such as kernel visualization, deconvolution, and Grad-CAM help us understand CNNs.
Kernel (filter) visualization reveals what patterns each learned filter responds to; deconvolution (deconvnet) maps activations back to pixel space to show which input structures excite a given unit; Grad-CAM produces heatmaps over the input highlighting the regions that contributed most to a prediction, enhancing model interpretability.
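A minimal Grad-CAM sketch in PyTorch using forward/backward hooks on a ResNet-18 (the chosen layer, input size, and untrained weights are illustrative; a pretrained model and a real preprocessed image would be used in practice):

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18().eval()         # a pretrained model would be loaded in practice
feats, grads = {}, {}

def fwd_hook(module, inputs, output):
    feats["a"] = output                  # activations of the last convolutional stage

def bwd_hook(module, grad_input, grad_output):
    grads["a"] = grad_output[0]          # gradients w.r.t. those activations

model.layer4.register_forward_hook(fwd_hook)
model.layer4.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)          # placeholder for one preprocessed image
scores = model(x)
scores[0, scores.argmax()].backward()    # backpropagate the top class score (batch size 1)

weights = grads["a"].mean(dim=(2, 3), keepdim=True)            # global average of gradients
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))  # weighted sum of feature maps
cam = F.interpolate(cam, size=(224, 224), mode="bilinear",
                    align_corners=False)                       # heatmap over the input image
```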