Gradient Descent
Gradient descent is a fundamental optimization technique widely used in deep learning and machine learning to minimize functions, typically the loss or error function, by iteratively moving towards the minimum value.
1. Concept and Principle
- Gradient descent solves optimization problems of the form:
  $$ \min_{w} f(w) $$
  where $f$ is a differentiable function (often a loss or cost function in machine learning).
- The algorithm proceeds by moving in the direction of the negative gradient, which points towards the fastest local decrease of the function (made precise by the expansion below).
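One way to see why the negative gradient is the direction of fastest local decrease is the first-order Taylor expansion (a standard argument, added here for completeness rather than taken from the original text):

$$ f(w + \Delta) \approx f(w) + \nabla f(w)^\top \Delta , $$

so among all small steps $\Delta$ of a fixed length, the change $\nabla f(w)^\top \Delta$ is most negative when $\Delta$ points along $-\nabla f(w)$.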
2. Update Rule
- At each iteration, the parameters are updated as (see the one-step sketch after this list):
  $$ w \leftarrow w - \eta \, \nabla f(w) $$
  where:
  - $w$ denotes the parameters being optimized,
  - $\eta$ is the learning rate (step size),
  - $\nabla f(w)$ is the gradient of the loss with respect to the parameters.
- If the gradient is zero, the update no longer changes the parameters: the algorithm has reached a stationary point, typically a local or global minimum.
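As a minimal sketch of a single update (an illustrative addition, not part of the original text), the snippet below applies the rule once to the simple convex function $f(w) = w_1^2 + w_2^2$; the starting point and learning rate are arbitrary choices.

```python
import numpy as np

def grad_f(w):
    # gradient of the illustrative function f(w) = w1^2 + w2^2
    return 2 * w

w = np.array([3.0, -1.5])      # arbitrary initial parameters
eta = 0.1                      # learning rate (step size)

w = w - eta * grad_f(w)        # one update: w <- w - eta * grad f(w)
print(w)                       # [2.4, -1.2], a step towards the minimum at (0, 0)
```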
3. Algorithm Steps
1. Initialize the parameters randomly.
2. Compute the gradient of the loss with respect to the parameters.
3. Update the parameters using the rule above.
4. Repeat steps 2-3 until convergence or for a set number of iterations.
Pseudo-code
```python
def gradient_descent(w, gradient_of_loss_function, learning_rate, num_iterations):
    # w: the initialized parameters, refined iteratively below
    for i in range(num_iterations):
        grad = gradient_of_loss_function(w)   # gradient of the loss at the current parameters
        w = w - learning_rate * grad          # step in the direction of the negative gradient
    return w
```
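As a quick usage check (an illustrative addition), the routine above can be run on a hypothetical one-dimensional loss $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$; the starting point and hyperparameters are arbitrary.

```python
def grad(w):
    return 2 * (w - 3)   # gradient of the illustrative loss f(w) = (w - 3)^2

w_final = gradient_descent(w=0.0, gradient_of_loss_function=grad,
                           learning_rate=0.1, num_iterations=100)
print(w_final)           # approximately 3.0, the minimizer of f
```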
4. Variants
- Batch Gradient Descent: The gradient is computed using the whole dataset at each iteration.
- Stochastic Gradient Descent (SGD): The gradient is computed using one randomly chosen example at a time.
- Mini-batch Gradient Descent: The gradient is computed using a small subset (batch) of the data per iteration, as in the sketch after this list.
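The following is a minimal sketch of mini-batch gradient descent for a linear model with squared-error loss; the synthetic data, batch size, and learning rate are illustrative assumptions rather than details from the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # synthetic features (illustrative)
y = X @ np.array([1.0, -2.0, 0.5])         # synthetic targets from a known linear model

w = np.zeros(3)                            # parameters to learn
learning_rate, batch_size = 0.1, 16

for epoch in range(50):
    indices = rng.permutation(len(X))      # reshuffle the data each epoch
    for start in range(0, len(X), batch_size):
        batch = indices[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        # gradient of the mean squared error over this mini-batch
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)
        w = w - learning_rate * grad       # same update rule, smaller data slice

print(w)                                   # close to [1.0, -2.0, 0.5]
```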
5. Example
To minimize the simple quadratic function:
$$ f(w) = w^2 $$
The gradient is:
$$ f'(w) = 2w $$
The update steps are:
$$ w \leftarrow w - \eta \cdot 2w = (1 - 2\eta)\, w $$
By repeating the above updates, $w$ converges to the minimizer $w = 0$ (provided the learning rate satisfies $0 < \eta < 1$).
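Using the same illustrative quadratic $f(w) = w^2$, the short loop below shows the iterates shrinking towards 0; the starting point and learning rate are arbitrary demonstration values.

```python
w = 5.0            # arbitrary starting point
eta = 0.1          # learning rate

for step in range(5):
    grad = 2 * w                 # gradient of f(w) = w^2
    w = w - eta * grad           # update: w <- (1 - 2*eta) * w
    print(step, w)               # 4.0, 3.2, 2.56, 2.048, 1.6384 -> towards 0
```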
6. Applications
- Training machine learning models: Linear regression, logistic regression, neural networks (a logistic-regression sketch follows this list).
- Minimizing error/loss: Fitting the best solution to the data.
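As one concrete instance of model training (an illustrative sketch, with synthetic data and hyperparameters chosen only for demonstration), logistic regression can be fit with exactly the update rule from Section 2.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                        # synthetic features (illustrative)
y = (X[:, 0] - X[:, 1] > 0).astype(float)           # synthetic binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)                                      # logistic-regression weights
learning_rate = 0.5

for step in range(500):
    p = sigmoid(X @ w)                               # predicted probabilities
    grad = X.T @ (p - y) / len(y)                    # gradient of the mean log loss
    w = w - learning_rate * grad                     # gradient-descent update

accuracy = np.mean((sigmoid(X @ w) > 0.5) == y)
print(w, accuracy)                                   # weights align with [1, -1]; accuracy near 1.0
```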
7. Choosing Learning Rate
- A too-small learning rate leads to slow convergence.
- A too-large learning rate may cause divergence or oscillations, as the sketch after this list illustrates.
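Continuing the illustrative quadratic $f(w) = w^2$, the comparison below shows how the learning rate changes the behaviour of the same update; the specific values are assumptions for demonstration only.

```python
def run(eta, steps=10, w=1.0):
    # repeatedly apply the gradient-descent update for f(w) = w^2
    for _ in range(steps):
        w = w - eta * 2 * w
    return w

print(run(eta=0.01))   # ~0.82: too small, convergence is slow
print(run(eta=0.4))    # ~1e-7: a well-chosen rate converges quickly
print(run(eta=1.1))    # ~6.2: too large, the iterates oscillate and diverge
```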
8. Summary
- Gradient descent iteratively moves towards the minimum by following the negative gradient.
- It forms the backbone of model training in deep learning and machine learning.