Rectified Linear Unit (ReLU)

1. Introduction to ReLU

  • ReLU (Rectified Linear Unit) is a widely used activation function in deep neural networks.

  • It outputs the input directly if positive; otherwise, it outputs zero.

Mathematically:

f(x) = \max(0, x)

which means:

f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}

  • This simple non-linearity helps deep models learn complex patterns with efficient gradient flow.
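As a concrete reference, here is a minimal NumPy sketch of this definition (the function name `relu` and the sample inputs are chosen for illustration, not taken from any particular library):

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: returns x where x > 0, otherwise 0."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # outputs: 0, 0, 0, 1.5, 3
```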


2. Advantages of ReLU

  • Computationally efficient: ReLU involves simple thresholding, making it fast to compute.

  • Sparse activation: Outputs zero for negative inputs, leading to sparse representations which improve model efficiency and reduce overfitting.

  • Mitigates vanishing gradient problem: Unlike sigmoid/tanh, the gradient of ReLU is 1 for positive inputs, preserving gradient strength during backpropagation in deep models.

  • Promotes faster convergence: Networks using ReLU typically reach a given training loss in fewer epochs than equivalent sigmoid or tanh networks.
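A quick sketch of the sparsity claim, under the assumption of roughly zero-mean, symmetric pre-activations (the simulated values below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
pre_activations = rng.normal(size=10_000)      # simulated zero-mean pre-activations
activations = np.maximum(0, pre_activations)   # ReLU

# Roughly half of the outputs are exactly zero -> sparse representation
sparsity = np.mean(activations == 0)
print(f"fraction of zero activations: {sparsity:.2f}")  # ~0.50 for zero-mean inputs
```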


3. Derivative of ReLU

The derivative used in backpropagation is:

f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x \leq 0 \end{cases}

  • This piecewise derivative makes gradient computations straightforward (the derivative is technically undefined at x = 0, but implementations conventionally use 0 there, as above).
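A minimal sketch of how this derivative is applied during backpropagation, assuming the upstream gradient is already available (`relu_forward` and `relu_backward` are illustrative names, not a specific framework API):

```python
import numpy as np

def relu_forward(x):
    """Forward pass: element-wise max(0, x)."""
    return np.maximum(0, x)

def relu_backward(grad_output, x):
    """Backward pass: pass the upstream gradient where x > 0, block it elsewhere."""
    return grad_output * (x > 0)

x = np.array([-1.0, 0.5, 2.0])
grad_out = np.array([0.1, 0.2, 0.3])   # gradient arriving from the next layer
print(relu_backward(grad_out, x))      # [0.  0.2 0.3]
```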


4. Drawbacks

  • Dying ReLU problem: Neurons whose pre-activation is negative for every input output zero and receive zero gradient, so their weights stop updating and they can stay permanently inactive.

  • Not zero-centered: Outputs are always zero or positive, which can bias gradient directions and slow weight updates in some cases.

  • Unbounded output: Large values can cause exploding activations if not controlled.
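A toy illustration of the dying ReLU problem; the weights and inputs below are contrived so the neuron is already "dead" (this is a sketch, not a training loop):

```python
import numpy as np

# A "dead" neuron: its pre-activation w*x + b is negative for every input,
# so its ReLU output and its gradient are both zero, and gradient descent
# can never move w or b again.
x = np.array([0.5, 1.0, 2.0])   # all inputs the neuron will ever see
w, b = -3.0, -1.0               # parameters already pushed into the dead region

pre_activation = w * x + b              # [-2.5, -4.0, -7.0]: always negative
output = np.maximum(0, pre_activation)  # always zero
grad_w = (pre_activation > 0) * x       # d(output)/dw is zero everywhere

print(output)   # [0. 0. 0.]
print(grad_w)   # [0. 0. 0.]
```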


5. Variants of ReLU

5.1 Leaky ReLU

f(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha x & \text{if } x \leq 0 \end{cases}

  • Allows a small, non-zero gradient α (commonly 0.01) for negative inputs, reducing the dying ReLU problem.

5.2 Parametric ReLU (PReLU)

  • Similar to Leaky ReLU, but α is learned during training instead of fixed.

5.3 Exponential Linear Unit (ELU)

  • Smooths the negative part, mapping x ≤ 0 to α(e^x − 1), which reduces bias shift and can improve learning speed.
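A minimal NumPy sketch of Leaky ReLU and ELU, assuming the standard formulas above; the defaults α = 0.01 and α = 1.0 are common choices, not requirements:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small slope alpha for negative inputs instead of a hard zero."""
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    """ELU: smooth exponential saturation toward -alpha for negative inputs."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

x = np.array([-2.0, -0.1, 0.0, 1.0])
print(leaky_relu(x))  # [-0.02  -0.001  0.     1.   ]
print(elu(x))         # approximately [-0.865 -0.095  0.     1.   ]
```

PReLU uses the same form as Leaky ReLU but stores α as a trainable parameter updated by the optimizer (PyTorch exposes this as torch.nn.PReLU).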


6. ReLU vs Other Activation Functions

| Activation Function | Output Range | Pros | Cons | Typical Use |
| --- | --- | --- | --- | --- |
| ReLU | [0, ∞) | Simple, sparse, mitigates vanishing gradient | Dying neurons, unbounded output | Hidden layers in deep nets |
| Sigmoid | (0, 1) | Smooth, probabilistic interpretation | Vanishing gradient, slow training | Output layers in binary classification |
| Tanh | (-1, 1) | Zero-centered | Vanishing gradient | Hidden layers when zero-centered output needed |
| Leaky ReLU | (-∞, ∞) | Fixes dying ReLU | α needs tuning | Hidden-layer alternative to ReLU |
| ELU | (-α, ∞) | Smooth negative values, faster convergence | More computation | Deeper nets for improved learning |

7. Applications

  • Used as the default activation function in convolutional neural networks (CNNs) for image classification, object detection, and segmentation.

  • Useful in deep feedforward networks and reinforcement learning.

  • Helps build deeper, more expressive models while retaining trainability.


8. Summary

  • ReLU's simplicity and effectiveness have made it the go-to activation function in deep learning.

  • While it has drawbacks, simple variants like Leaky ReLU and parametric forms address them.

  • Understanding ReLU is critical to grasping modern deep learning model design and training dynamics.