AL3502: Deep Learning for Vision

 


Syllabus

Course Objectives

  • Introduce basic computer vision concepts.

  • Understand methods and terminologies used in deep neural networks.

  • Impart knowledge on Convolutional Neural Networks (CNNs).

  • Introduce Recurrent Neural Networks (RNNs) and Deep Generative Models.

  • Apply deep learning to real-world computer vision applications.

Unit I: Computer Vision Basics

Unit II: Introduction to Deep Learning

  • Deep Feed-Forward Neural Networks

  • Gradient Descent

  • Back-Propagation and Other Differentiation Algorithms

  • Vanishing Gradient Problem and Mitigation Strategies

  • Rectified Linear Unit (ReLU)

  • Heuristics for Avoiding Bad Local Minima

  • Heuristics for Faster Training

  • Nesterov Accelerated Gradient Descent

  • Regularization for Deep Learning: Dropout, Adversarial Training

  • Optimization for Training Deep Models
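
As a hedged illustration of the Nesterov Accelerated Gradient Descent topic listed in this unit, the following is a minimal sketch in plain Python. The objective f(x) = (x - 3)^2, the function names `nesterov_minimize` and `grad_f`, and the learning rate, momentum, and step count are all assumptions chosen for illustration; they are not prescribed by the syllabus.

```python
def nesterov_minimize(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a 1-D function with Nesterov accelerated gradient descent.

    grad: callable returning the derivative at a point (assumed, illustrative).
    """
    x, velocity = x0, 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * velocity
        velocity = momentum * velocity - lr * grad(lookahead)
        x += velocity
    return x


def grad_f(x):
    # Derivative of the illustrative objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


x_min = nesterov_minimize(grad_f, x0=0.0)
print(x_min)  # should be very close to the minimizer x = 3
```

The only difference from classical momentum is the look-ahead step: the gradient is taken at `x + momentum * velocity` rather than at `x`.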

Unit III: Visualization and Understanding of CNNs

  • Convolutional Neural Networks (CNNs): Introduction to CNNs

  • Evolution of CNN Architectures: AlexNet, ZFNet, VGG

  • Visualization of Kernels

  • Backprop-to-Image/Deconvolution Methods

  • Deep Dream, Hallucination, Neural Style Transfer

  • Class Activation Mapping (CAM), Grad-CAM

Unit IV: CNN and RNN for Image and Video Processing

  • CNNs for Recognition, Verification, Detection, Segmentation

    • CNNs for Recognition and Verification (Siamese Networks, Triplet Loss, Contrastive Loss, Ranking Loss)

    • CNNs for Detection: Object Detection Background, R-CNN, Fast R-CNN

    • CNNs for Segmentation: FCN, SegNet

  • Recurrent Neural Networks (RNNs): Review of RNNs

  • CNN + RNN Models for Video Understanding: Spatio-Temporal Models, Action/Activity Recognition

Unit V: Deep Generative Models

  • Review of Popular Deep Generative Models: GANs, VAEs

  • Variants and Applications of Generative Models in Vision

    • Applications: Image Editing, Inpainting, Super-Resolution, 3D Object Generation, Security

  • Recent Trends: Self-Supervised Learning, Reinforcement Learning in Vision

Practical Exercises (30 Periods)

  1. Implement basic image processing operations (Feature Representation and Extraction)

  2. Implement a simple neural network

  3. Study pretrained deep neural network models for images

  4. Apply CNN for image classification

  5. Apply CNN for image segmentation

  6. Apply RNN for video processing

  7. Implement deep generative models for image editing
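
As a possible starting point for Exercise 2, the sketch below trains a single sigmoid neuron with full-batch gradient descent to learn the logical AND function. This is only one hedged interpretation of "a simple neural network": the AND dataset, the function names (`sigmoid`, `train_neuron`, `predict`), and the hyperparameters are assumptions made for illustration, and a real submission would typically add hidden layers and use a framework chosen by the instructor.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=1.0, epochs=5000):
    """Train one sigmoid neuron with full-batch gradient descent.

    samples: list of ((x1, x2), y) pairs with y in {0, 1}.
    Returns the learned parameters (w1, w2, b).
    """
    w1 = w2 = b = 0.0
    n = len(samples)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for (x1, x2), y in samples:
            # For cross-entropy loss with a sigmoid output,
            # the gradient w.r.t. the pre-activation is (prediction - target).
            err = sigmoid(w1 * x1 + w2 * x2 + b) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        # Average the gradients over the batch and take one descent step.
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def predict(params, x1, x2):
    w1, w2, b = params
    return 1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0


# Illustrative dataset: truth table of logical AND.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

params = train_neuron(AND_DATA)
print([predict(params, x1, x2) for (x1, x2), _ in AND_DATA])
```

AND is linearly separable, so a single neuron suffices here; a function such as XOR would need at least one hidden layer.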

Course Outcomes

Upon successful completion, students will be able to:

  • Implement basic image processing operations

  • Understand deep learning concepts

  • Design and implement CNN and RNN models

  • Understand the role of deep learning in computer vision

  • Design and implement deep generative models

Textbooks

  • Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, 2017

  • Ragav Venkatesan, Baoxin Li, “Convolutional Neural Networks in Visual Computing”, CRC Press, 2018

References

  • Rajalingappaa Shanmugamani, Deep Learning for Computer Vision, Packt Publishing, 2018

  • David Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2002

  • V. Kishore Ayyadevara, Yeshwanth Reddy, Modern Computer Vision with PyTorch, Packt Publishing, 2020

  • Richard Szeliski, Computer Vision: Algorithms and Applications, 2010

  • Simon Prince, Computer Vision: Models, Learning, and Inference, 2012

  • NPTEL Course Materials

Structure

  • Lectures: 45 Periods

  • Practical Exercises: 30 Periods

  • Total: 75 Periods