How does it work?

In this part, we'll write our own backpropagation algorithm to train two different neural networks on two different datasets.

The video I followed while making this part is here: https://youtu.be/ma6hWrU-LaI?si=cgahIMbJrmiUl505

Regression problem

Dataset & neural network

We'll consider the following dataset and we'll train a model by writing our own algorithm.

| CGPA | Resume score | Package |
| --- | --- | --- |
| 8 | 8 | 4 |
| 7 | 9 | 5 |
| 6 | 10 | 6 |
| 5 | 12 | 7 |

Below is the simple neural network we'll be training.

Algorithm

  1. for i in range(epochs):

    1. for j in range(X.shape[0]):

      1. Select random row

      2. Predict using forward propagation

      3. Calculate the loss (the loss function is MSE: $L = (y - \hat{y})^2$)

      4. Update weights and biases using GD

  2. Calculate average loss for each epoch
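To see this loop structure in action before we build the actual network, here's a minimal runnable sketch in Python. It fits a single linear neuron to just the CGPA → Package columns of the table above, purely to show the same epoch / random-row / update pattern (the learning rate and epoch count are arbitrary illustrative choices):

```python
import numpy as np

# toy data: CGPA -> package, taken from the two columns of the table above
X = np.array([[8.0], [7.0], [6.0], [5.0]])
y = np.array([4.0, 5.0, 6.0, 7.0])

w, b, lr = 0.1, 0.0, 0.01

for epoch in range(10):                      # 1. loop over epochs
    losses = []
    for _ in range(X.shape[0]):              # 1.1 one step per row
        idx = np.random.randint(X.shape[0])  # 1.1.1 select a random row
        x_i, y_i = X[idx, 0], y[idx]
        y_hat = w * x_i + b                  # 1.1.2 forward propagation
        losses.append((y_i - y_hat) ** 2)    # 1.1.3 loss (MSE)
        # 1.1.4 gradient-descent update: w_new = w_old - lr * dL/dw
        w -= lr * (-2 * (y_i - y_hat) * x_i)
        b -= lr * (-2 * (y_i - y_hat))
    # 2. average loss for the epoch
    print(f"epoch {epoch + 1}: avg loss = {np.mean(losses):.4f}")
```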

Formula

Now, in the last part we derived the formula for updating each weight and bias:

$w_{n} = w_{o} - \eta\frac{\partial L}{\partial w_{o}}$
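For example, with purely illustrative numbers $w_{o} = 0.1$, $\eta = 0.001$ and $\frac{\partial L}{\partial w_{o}} = -14$, one update step gives $w_{n} = 0.1 - 0.001 \times (-14) = 0.114$.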

According to the above neural network, here are all the formulas:

$\frac{\partial L}{\partial w^2_{11}} = -2(y-\hat{y})\cdot O_{11}$
$\frac{\partial L}{\partial w^2_{21}} = -2(y-\hat{y})\cdot O_{12}$
$\frac{\partial L}{\partial b_{21}} = -2(y-\hat{y})$
$\frac{\partial L}{\partial w^1_{11}} = -2(y-\hat{y})\cdot w^2_{11}\cdot x_{11}$
$\frac{\partial L}{\partial w^1_{21}} = -2(y-\hat{y})\cdot w^2_{11}\cdot x_{12}$
$\frac{\partial L}{\partial b_{11}} = -2(y-\hat{y})\cdot w^2_{11}$
$\frac{\partial L}{\partial w^1_{12}} = -2(y-\hat{y})\cdot w^2_{21}\cdot x_{11}$
$\frac{\partial L}{\partial w^1_{22}} = -2(y-\hat{y})\cdot w^2_{21}\cdot x_{12}$
$\frac{\partial L}{\partial b_{12}} = -2(y-\hat{y})\cdot w^2_{21}$

Ufff... that was quite a few formulas, but it's important to understand how they were derived. If you already know the derivation, feel free to skip that part. You don't need to memorize them!

Implementation

Now we'll implement this in a Jupyter notebook. I have done this in a Google Colab, and here's the link to the Colab:
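Since the notebook itself can't be embedded here, below is a rough, self-contained sketch of what that Colab does. The initial weight values, learning rate, and number of epochs are placeholder choices for illustration; the actual notebook may differ in those details:

```python
import numpy as np

# dataset from the table above: [CGPA, resume score] -> package
X = np.array([[8, 8], [7, 9], [6, 10], [5, 12]], dtype=float)
y = np.array([4, 5, 6, 7], dtype=float)

# key "w<layer><i><j>": e.g. "w112" is w^1_12 (layer 1, input 1 -> hidden node 2)
params = {
    "w111": 0.1, "w121": 0.1, "b11": 0.0,   # inputs -> hidden node 1 (O11)
    "w112": 0.1, "w122": 0.1, "b12": 0.0,   # inputs -> hidden node 2 (O12)
    "w211": 0.1, "w221": 0.1, "b21": 0.0,   # hidden nodes -> output
}

def forward(x, p):
    o11 = p["w111"] * x[0] + p["w121"] * x[1] + p["b11"]   # hidden node 1 (linear)
    o12 = p["w112"] * x[0] + p["w122"] * x[1] + p["b12"]   # hidden node 2 (linear)
    y_hat = p["w211"] * o11 + p["w221"] * o12 + p["b21"]   # output node
    return y_hat, o11, o12

def update(p, x, y_i, y_hat, o11, o12, lr):
    err = -2 * (y_i - y_hat)                   # common factor in every formula above
    w211_old, w221_old = p["w211"], p["w221"]  # hidden-layer gradients use pre-update values
    # output layer
    p["w211"] -= lr * err * o11
    p["w221"] -= lr * err * o12
    p["b21"]  -= lr * err
    # hidden node 1
    p["w111"] -= lr * err * w211_old * x[0]
    p["w121"] -= lr * err * w211_old * x[1]
    p["b11"]  -= lr * err * w211_old
    # hidden node 2
    p["w112"] -= lr * err * w221_old * x[0]
    p["w122"] -= lr * err * w221_old * x[1]
    p["b12"]  -= lr * err * w221_old

epochs, lr = 75, 0.001
for epoch in range(epochs):
    losses = []
    for _ in range(X.shape[0]):
        idx = np.random.randint(X.shape[0])                   # select a random row
        y_hat, o11, o12 = forward(X[idx], params)             # forward propagation
        losses.append((y[idx] - y_hat) ** 2)                  # MSE loss for this row
        update(params, X[idx], y[idx], y_hat, o11, o12, lr)   # GD step
    # average loss for this epoch (could be printed or stored here)
print("final avg loss:", np.mean(losses))
```

Each update line is a direct translation of one of the nine formulas above; since the network is purely linear, no activation derivative appears in any of the gradients.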

As you can see, we have implemented an algorithm for training a neural network in plain Python. Let's move on to the classification problem.


Classification problem

Dataset and neural network

For the classification problem, we're going to consider a similar dataset.

| CGPA | Resume score | Is placed? |
| --- | --- | --- |
| 8 | 8 | 1 |
| 7 | 9 | 1 |
| 6 | 10 | 0 |
| 5 | 5 | 0 |

But this time, there are two differences:

  1. The activation function we're going to use is the sigmoid function

  2. Rather than using the MSE loss function, this time we're going to use binary cross-entropy:

$LOSS = -y\cdot\log{(\hat{y})}-(1-y)\cdot \log{(1-\hat{y})}$
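To make these two changes concrete, here's a tiny Python sketch of the sigmoid and the binary cross-entropy loss (the `eps` clipping is my own addition to avoid `log(0)`; it isn't part of the formula above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(y, y_hat, eps=1e-12):
    # clip predictions so log() never sees exactly 0 or 1
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

print(bce_loss(1, sigmoid(2.0)))   # small loss: confident, correct prediction
print(bce_loss(0, sigmoid(2.0)))   # large loss: confident, wrong prediction
```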

Algorithm

The backpropagation algorithm for the classification problem is the same as the algorithm for the regression problem.

Formula

For the regression problem, we already derived the formulas in the What is back propagation? part, but we didn't do that for the classification problem.

However, you can follow a similar approach to derive all the formulas for the classification problem using the binary cross-entropy loss function.

If you want to see how these are derived, I strongly suggest checking out the video.

For now, I'll simply note them down:

$\frac{\partial L}{\partial w^2_{11}} = -(y-\hat{y})\cdot O_{11}$
$\frac{\partial L}{\partial w^2_{21}} = -(y-\hat{y})\cdot O_{12}$
$\frac{\partial L}{\partial b_{21}} = -(y-\hat{y})$
$\frac{\partial L}{\partial w^1_{11}} = -(y-\hat{y})\cdot w^2_{11}\cdot O_{11}\cdot(1-O_{11})\cdot x_{i1}$
$\frac{\partial L}{\partial w^1_{21}} = -(y-\hat{y})\cdot w^2_{11}\cdot O_{11}\cdot(1-O_{11})\cdot x_{i2}$
$\frac{\partial L}{\partial b_{11}} = -(y-\hat{y})\cdot w^2_{11}\cdot O_{11}\cdot(1-O_{11})$
$\frac{\partial L}{\partial w^1_{12}} = -(y-\hat{y})\cdot w^2_{21}\cdot O_{12}\cdot(1-O_{12})\cdot x_{i1}$
$\frac{\partial L}{\partial w^1_{22}} = -(y-\hat{y})\cdot w^2_{21}\cdot O_{12}\cdot(1-O_{12})\cdot x_{i2}$
$\frac{\partial L}{\partial b_{12}} = -(y-\hat{y})\cdot w^2_{21}\cdot O_{12}\cdot(1-O_{12})$

Implementation

I have done the implementation in another Google Colab. Here's the link:
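As with the regression part, the notebook can't be embedded here, so below is a rough, self-contained sketch of what that Colab does, using the gradient formulas above. The initial weights, learning rate, and epoch count are placeholder choices for illustration, not necessarily what the notebook uses:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# dataset from the table above: [CGPA, resume score] -> placed (1) / not placed (0)
X = np.array([[8, 8], [7, 9], [6, 10], [5, 5]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)

# same 2-2-1 network as before, but every node now uses a sigmoid activation
p = {"w111": 0.1, "w121": 0.1, "b11": 0.0,
     "w112": 0.1, "w122": 0.1, "b12": 0.0,
     "w211": 0.1, "w221": 0.1, "b21": 0.0}

epochs, lr = 500, 0.1
for epoch in range(epochs):
    losses = []
    for _ in range(X.shape[0]):
        i = np.random.randint(X.shape[0])     # select a random row
        x = X[i]
        # forward propagation (sigmoid on hidden and output nodes)
        o11 = sigmoid(p["w111"] * x[0] + p["w121"] * x[1] + p["b11"])
        o12 = sigmoid(p["w112"] * x[0] + p["w122"] * x[1] + p["b12"])
        y_hat = sigmoid(p["w211"] * o11 + p["w221"] * o12 + p["b21"])
        # binary cross-entropy loss for this row (clipped to avoid log(0))
        y_hat_c = np.clip(y_hat, 1e-12, 1 - 1e-12)
        losses.append(-y[i] * np.log(y_hat_c) - (1 - y[i]) * np.log(1 - y_hat_c))
        # gradient-descent updates, one line per formula above
        err = -(y[i] - y_hat)
        w211_old, w221_old = p["w211"], p["w221"]   # hidden gradients use pre-update values
        p["w211"] -= lr * err * o11
        p["w221"] -= lr * err * o12
        p["b21"]  -= lr * err
        p["w111"] -= lr * err * w211_old * o11 * (1 - o11) * x[0]
        p["w121"] -= lr * err * w211_old * o11 * (1 - o11) * x[1]
        p["b11"]  -= lr * err * w211_old * o11 * (1 - o11)
        p["w112"] -= lr * err * w221_old * o12 * (1 - o12) * x[0]
        p["w122"] -= lr * err * w221_old * o12 * (1 - o12) * x[1]
        p["b12"]  -= lr * err * w221_old * o12 * (1 - o12)
print("final avg loss:", np.mean(losses))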

The results we get are very similar to what Keras produces, so we can conclude that we have successfully implemented backpropagation for a classification problem with binary cross-entropy as our loss function.


That's it for the backpropagation algorithm. We have seen backpropagation in detail. If you want to develop a better intuitive understanding of backpropagation, I recommend watching CampusX's "The Why of Back propagation" video.

Otherwise, if you feel confident about this topic, feel free to move on to the next one. Thanks a lot for reading! I'd really appreciate any feedback.

Byee folks 👋
