Saturday, May 16, 2020

Linear Regression Complete Derivation (3/5)


In the last article we saw how we can find the regression line using brute force. But that approach isn't fruitful for real data, which often runs into millions of points. To tackle such datasets we use Python libraries, but those libraries are built on some logical theory, right? So let's find out the logic behind those creepy-looking formulas. Believe me, the math behind them is even sexier!

Before we begin, the knowledge of following topics might be helpful!

  • Partial Derivatives
  • Summations

Are you excited to find the line of best fit?


Let’s start by defining a few things

1) Given n inputs and outputs.

2) We define the line of best fit as…

3) Now we need to minimize the error function we named S…

4) Put the value of equation 2 into equation 3.
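Written out (a sketch, assuming the line of best fit follows the convention $\hat{y} = a + bx$, which matches how $a$ is used as the intercept later in the article), steps 1–4 amount to:

```latex
\hat{y}_i = a + b x_i \qquad \text{(equation 2, the line of best fit)}

S = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
  = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2
  \qquad \text{(equations 3 and 4)}
```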

To minimize our error function S, we must find where its first partial derivatives with respect to a and b are equal to 0. Where both partial derivatives are zero, the total error over all points is at its minimum. Let's find the partial derivative with respect to a first.

Finding a :

1 ) Find the derivative of S with respect to a..

2 ) Using chain rule.. Let’s say ..

3) Using partial derivative..

4) Expanding …

5) Simplifying…

6) To find extreme values we put it to zero…

7) Dividing both sides by -2…..

8) Now let’s break the summation in 3 parts..

9) Now the summation of a will be an….

10) Substituting it back in the equation…

11) Now we need to solve for a..

12) The summation of y (and likewise of x) divided by n is simply its mean..
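The steps above can be sketched in symbols (assuming $\hat{y} = a + bx$ as before):

```latex
\frac{\partial S}{\partial a}
  = \sum_{i=1}^{n} 2\,(y_i - a - b x_i)(-1)
  = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0

\sum_{i=1}^{n} y_i - n a - b \sum_{i=1}^{n} x_i = 0
\;\Longrightarrow\;
a = \frac{\sum y_i}{n} - b\,\frac{\sum x_i}{n} = \bar{y} - b\,\bar{x}
```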



We’ve minimized the cost function with respect to a. Now let’s find the last part, the derivative of S with respect to b.


Finding B :

1 ) Same as we did with a..

2) Finding the partial derivative…

3) Expanding it a bit..

4) Putting it back in the equation..

5) Let’s divide by -2 both sides..

6) Let’s distribute x for ease of viewing …
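Sketching steps 1–6 in symbols (same $\hat{y} = a + bx$ convention as above):

```latex
\frac{\partial S}{\partial b}
  = \sum_{i=1}^{n} 2\,(y_i - a - b x_i)(-x_i)
  = -2 \sum_{i=1}^{n} x_i\,(y_i - a - b x_i) = 0

\sum_{i=1}^{n} x_i y_i \;-\; a \sum_{i=1}^{n} x_i \;-\; b \sum_{i=1}^{n} x_i^2 = 0
```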

Now let’s do something fun!! Remember we found the value of a earlier in this article? Why don’t we substitute it? Well, let’s see what happens!!

7) Substituting value of a…

8) Let’s distribute the minus sign and x…

Well, you don’t like it? Let’s split up the sum into two sums…

9) Splitting the sum..

10) Simplifying…

11) Finding B from it..

Great!! We did it!! We have isolated a and b in terms of x and y. It wasn’t that hard, was it?

Still have some energy and want to explore it a bit more?

12 ) Simplifying the formula…

13) Multiplying numerator and denominator by n in equation 11…

14) Now if we simplify the value of a using equation 13 we get…
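Putting steps 7–14 together (a sketch): substituting $a = \bar{y} - b\bar{x}$ into the equation above and solving for b gives the familiar closed forms, and equation 13's trick of multiplying numerator and denominator by n gives the last version:

```latex
\sum x_i y_i - (\bar{y} - b\bar{x}) \sum x_i - b \sum x_i^2 = 0

b = \frac{\sum x_i y_i - \bar{y} \sum x_i}{\sum x_i^2 - \bar{x} \sum x_i}
  = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\,n \sum x_i^2 - \left(\sum x_i\right)^2}

a = \bar{y} - b\,\bar{x}
```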

Summing it up :)

If you have a dataset with one independent variable, you can find the line of best fit by first calculating B.

Then substituting B into a…

And finally substituting B and a into line of best fit…
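The recipe above can be sketched in a few lines of plain Python (a minimal illustration of the formulas we derived; the function name and the toy data are my own, not from the article):

```python
def fit_line(x, y):
    """Return (a, b) for the line of best fit y = a + b*x,
    using the closed-form least-squares solution."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # b = sum((x_i - x̄)(y_i - ȳ)) / sum((x_i - x̄)^2)
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sum((xi - mean_x) ** 2 for xi in x)
    b = num / den
    # a = ȳ - b * x̄
    a = mean_y - b * mean_x
    return a, b

# Toy data lying exactly on y = 2x, so we expect a = 0 and b = 2.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
a, b = fit_line(x, y)
print(a, b)  # → 0.0 2.0
```

In the next article we'll build this out properly, but even this sketch shows that the "creepy" formulas boil down to two means, one covariance-like sum, and one variance-like sum.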

Moving Onwards,

In the next article we’ll see how we can implement simple linear regression from scratch (without sklearn) in Python.

And please let me know whether you liked this article or not! I bet you liked it!!

You can download the code and some handwritten notes on the derivation from here : https://drive.google.com/open?id=1_stSoY4JaKjiSZqDdVyW8VupATdcVr67

If you have any additional questions, feel free to contact me : shuklapratik22@gmail.com
