> For the complete documentation index, see [llms.txt](https://notes.adarshdubey.com/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://notes.adarshdubey.com/deep-learning/topic-wise-notes/back-propagation/what-is-back-propagation.md).

# What is back propagation?

## Overview

> In machine learning, back-propagation is a gradient estimation method used to train neural network models. The gradient estimate is used by the [optimization algorithm](https://en.wikipedia.org/wiki/Optimization_algorithm) to compute the network [parameter](https://en.wikipedia.org/wiki/Parameter_\(machine_learning\)) updates.
>
> *- Wikipedia*

Basically, used to train a neural network.

<img src="/files/ByxB9WyyTmcIKZt3SQNd" alt="Simple neural network" class="gitbook-drawing">

*We'll be referencing the above diagram throughout the notes*

So in above neural network, each arrow represents a weight and each perceptron has a bias. So basically, <mark style="color:yellow;">back propagation algorithm is used to find best values of all the weights and biases so that the neural network performs as good as it can</mark>.&#x20;

For this setting, we'll assume all the activation functions to be *linear*.

## Steps in back propagation

<table data-full-width="false"><thead><tr><th>IQ</th><th>CGPA</th><th>Package (LPA)</th></tr></thead><tbody><tr><td>80</td><td>8</td><td>5</td></tr><tr><td>60</td><td>5</td><td>3</td></tr><tr><td>110</td><td>7</td><td>8</td></tr></tbody></table>

Consider the above table as an example

1. Initialize all the weights and biases\
   Either make them random\
   Or initialize all the weights to be 1 and biases to be 0 (we'll do this)

2. Select a random student (row)\
   We're selecting the first row, whose package is 5

3. Predict the Package (target column) using dot product\
   As the initial weights are very random, the result will be incorrect\
   Suppose the answer we got is 17

4. Choose a loss function\
   We're choosing MSE[^1] -> $$L = (y-\hat{y})^2$$\
   In our case, the answer will be $$L = (5 - 17)^2 = 144$$

5. Update the weights and biases using gradient descent using the following formula:

$$
w\_{new} = w\_{old}  -  \eta\frac{{\partial L}}{{\partial w\_{old}}}
$$

$$
b\_{new} = b\_{old}  -  \eta\frac{{\partial L}}{{\partial b\_{old}}}
$$

&#x20;        Where $$\eta$$ = Learning rate

## Hierarchy

So now the final output $$\hat{y}$$ or $$O\_{21}$$ depends on numerous variables.\
$$O\_{21} = w^2\_{11} \cdot O\_{11} + w^2\_{21} \cdot O\_{12} + b\_{21}$$

Here, $$O\_{21}$$ depends upon 5 variables, i.e. $$w^2\_{11}, w^2\_{21}, O\_{11}, O\_{12}, b\_{21}$$. Now the weights $$w^2\_{11}$$ & $$w^2\_{21}$$ and bias $$b\_{21}$$ are simple variables, however, the results $$O\_{11}$$ & $$O\_{12}$$ depend on their weights and biases. In this way, a hierarchy is formed.

During the whole algorithm, we select a random student (row) and then we calculate it's loss and using the loss we update the weights and biases.

## Calculation

### Overview

Suppose we have to calculate and update the weight $$w^2\_{11}$$, we will have to perform the following calculations:

$$
w^2\_{11new} = w^2\_{11old} - \eta \frac{\partial L}{\partial w^2\_{11}}
$$

Now as we already have the value of $$w^2\_{11old}$$ & $$\eta$$, we only have to calculate the value of $$\frac{\partial L}{\partial w^2\_{11}}$$. This value is also known as the gradient and this is why it's known as that. Now in the algorithm, each weight and bias is updated, so basically in our neural network, we'll calculate such gradient 9 times as we have 6 weights and 3 biases.

### Intuition

Now what does $$\frac{\partial L}{\partial w^2\_{11}}$$ actually mean? This term signifies the *amount* of change occurs in $$L$$ when we change $$w^2\_{11}$$.&#x20;

But you might notice, $$L$$ doesn't directly depends upon $$w^2\_{11}$$ but rather depends upon $$\hat{y}$$ and then as we already know, $$\hat{y}$$ directly depends upon $$w^2\_{11}$$. So basically it means that $$L$$ indirectly depends upon $$w^2\_{11}$$ (and that is why we're using $$\partial$$ derivatives and not direct derivatives)

$$
\frac{\partial L}{\partial w^2\_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w^2\_{11}}
$$

### Raw calculation

$$
\frac{\partial L}{\partial \hat{y}} = \frac{\partial (y - \hat{y})^2}{\partial \hat{y}} = -2(y - \hat{y})
$$

$$
\frac{\partial \hat{y}}{\partial w^2\_{11}} = \frac{\partial (O\_{11}\cdot w^2\_{11} + O\_{12}\cdot w^2\_{21} + b\_{21})}{w^2\_{11}}= O\_{11}
$$

$$
\frac{\partial L}{\partial w^2\_{11}} = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w^2\_{11}}=-2(y-\hat{y})\cdot O\_{11}
$$

## Summary

So basically, update all the weights and biases for each row and after we're done with the first iteration, the weights and biases will predict answer a little better than previous turn.

However, it won't be as correct as we want, so for that, we'll let this happen numerous times on our dataset for it to get better each time.

Number of times the algorithm will go through the whole dataset is known as epoch. With each epoch, the loss will be reduced and the model gets better with time.

[^1]: Mean squared error


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://notes.adarshdubey.com/deep-learning/topic-wise-notes/back-propagation/what-is-back-propagation.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
