AL3502: Deep Learning for Vision
Syllabus
Course Objectives
- Introduce basic computer vision concepts.
- Understand methods and terminologies used in deep neural networks.
- Impart knowledge on Convolutional Neural Networks (CNNs).
- Introduce Recurrent Neural Networks (RNNs) and Deep Generative Models.
- Apply deep learning to real-world computer vision applications.
Unit I: Computer Vision Basics
- Introduction to Image Formation, Capture, and Representation
- Visual Features and Representations: Edge, Blob, and Corner Detection
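As an illustration of the edge-detection topic in this unit, the following is a minimal sketch of a Sobel gradient-magnitude filter. It is not part of the official syllabus material. The function names (`correlate3x3`, `sobel_magnitude`) and the toy image are assumptions made for this example, and it uses pure Python so that it runs without extra libraries; a lab exercise would more likely use NumPy or OpenCV.

```python
# Minimal Sobel edge detector on a grayscale image stored as nested lists.
# Illustrative sketch only: all names and the toy image are hypothetical.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over the image (valid region only, no padding)."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for dr in range(3):
                for dc in range(3):
                    acc += kernel[dr][dc] * image[r + dr - 1][c + dc - 1]
            out_row.append(acc)
        out.append(out_row)
    return out


def sobel_magnitude(image):
    """Combine horizontal and vertical responses into a gradient magnitude."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    return [[(x * x + y * y) ** 0.5 for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Toy 5x5 image: dark left columns (0), bright right columns (10).
toy_image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(toy_image)
```

In this toy image the response is large where the window straddles the dark-to-bright boundary and zero where the window is uniformly bright.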
Unit II: Introduction to Deep Learning
- Deep Feed-Forward Neural Networks
- Gradient Descent
- Back-Propagation and Other Differentiation Algorithms
- Vanishing Gradient Problem and Mitigation Strategies
- Rectified Linear Unit (ReLU)
- Heuristics for Avoiding Bad Local Minima
- Heuristics for Faster Training
- Nesterov Accelerated Gradient Descent
- Regularization for Deep Learning: Dropout, Adversarial Training
- Optimization for Training Deep Models
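To make the gradient descent and Nesterov accelerated gradient topics in this unit concrete, here is a minimal sketch on a one-dimensional quadratic. It is not part of the official syllabus material. The objective f(w) = (w - 3)^2, the function names, and the hyperparameters are all assumptions chosen for illustration. The Nesterov update shown is one common formulation: evaluate the gradient at a look-ahead point, then update the velocity and the weight.

```python
# Plain gradient descent versus Nesterov accelerated gradient on
# f(w) = (w - 3)^2, whose minimum is at w = 3.
# Illustrative sketch only: objective and hyperparameters are hypothetical.


def grad(w):
    """Gradient of f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, lr, steps):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def nesterov(w0, lr, momentum, steps):
    w, v = w0, 0.0
    for _ in range(steps):
        lookahead = w + momentum * v              # peek ahead along the velocity
        v = momentum * v - lr * grad(lookahead)   # gradient taken at the look-ahead point
        w += v
    return w


w_gd = gradient_descent(w0=0.0, lr=0.01, steps=100)
w_nag = nesterov(w0=0.0, lr=0.01, momentum=0.9, steps=100)
print(f"gradient descent: {w_gd:.4f}, Nesterov: {w_nag:.4f}")
```

With this small learning rate, the Nesterov variant ends closer to the minimum after the same number of steps. That holds for this toy setting only; on other problems or with other hyperparameters the comparison can differ.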
Unit III: Visualization and Understanding CNN
- Convolutional Neural Networks (CNNs): Introduction to CNNs
- Evolution of CNN Architectures: AlexNet, ZFNet, VGG
- Visualization of Kernels
- Backprop-to-Image/Deconvolution Methods
- Deep Dream, Hallucination, Neural Style Transfer
- Class Activation Mapping (CAM), Grad-CAM
Unit IV: CNN and RNN for Image and Video Processing
- CNNs for Recognition, Verification, Detection, Segmentation
- CNNs for Recognition and Verification (Siamese Networks, Triplet Loss, Contrastive Loss, Ranking Loss)
- CNNs for Detection: Object Detection Background, R-CNN, Fast R-CNN
- CNNs for Segmentation: FCN, SegNet
- Recurrent Neural Networks (RNNs): Review of RNNs
- CNN + RNN Models for Video Understanding: Spatio-Temporal Models, Action/Activity Recognition
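The verification losses named in this unit can be sketched briefly. The following is a minimal illustration, not part of the official syllabus material, of one common form of the triplet loss and of the contrastive loss on plain Python lists standing in for embedding vectors. The function names and the default margin are assumptions; published variants differ in details such as squared distances or scaling factors.

```python
import math

# Illustrative sketch only: embeddings are plain lists of floats,
# and all names and margins are hypothetical.


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther than the positive."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def contrastive_loss(x1, x2, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs past `margin`."""
    d = euclidean(x1, x2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A well-separated triplet incurs no loss; swapping positive and negative does.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

A Siamese network would produce the embeddings by passing each image through the same CNN; here they are hard-coded to keep the sketch self-contained.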
Unit V: Deep Generative Models
- Review of Popular Deep Generative Models: GANs, VAEs
- Variants and Applications of Generative Models in Vision
- Applications: Image Editing, Inpainting, Super-Resolution, 3D Object Generation, Security
- Recent Trends: Self-Supervised Learning, Reinforcement Learning in Vision
Practical Exercises (30 Periods)
- Implement basic image processing operations (Feature Representation and Extraction)
- Implement a simple neural network
- Study pretrained deep neural network models for images
- Apply CNN for image classification
- Apply CNN for image segmentation
- Apply RNN for video processing
- Implement deep generative models for image editing
Course Outcomes
Upon successful completion, students will be able to:
- Implement basic image processing operations
- Understand deep learning concepts
- Design and implement CNN, RNN, and deep generative models
- Understand the role of deep learning in computer vision
- Design and implement deep generative models
Textbooks
- Ian Goodfellow, Yoshua Bengio, Aaron Courville, “Deep Learning”, MIT Press, 2017
- Ragav Venkatesan, Baoxin Li, “Convolutional Neural Networks in Visual Computing”, CRC Press, 2018
References
- Rajalingappaa Shanmugamani, Deep Learning for Computer Vision, Packt Publishing, 2018
- David Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2002
- V. Kishore Ayyadevara, Yeshwanth Reddy, Modern Computer Vision with PyTorch, Packt Publishing, 2020
- Richard Szeliski, Computer Vision: Algorithms and Applications, 2010
- Simon Prince, Computer Vision: Models, Learning, and Inference, 2012
- NPTEL Course Materials
Structure
- Lectures: 45 Periods
- Practical Exercises: 30 Periods
- Total: 75 Periods