
Saturday, May 16, 2020

Simple Linear Regression Implementation From Scratch (4/5)

In the last article we derived a formula to calculate the “best fit” regression line. Now it’s time to implement it in Python. Keep in mind that in this article we are not going to use libraries such as scikit-learn to find parameters like the slope and intercept of the line; instead we will implement simple linear regression with basic Python code and user-defined functions, using the formula we derived in the last article. We’ll use matplotlib to visualize the data and the regression line, but if you only want to find the regression line, you can skip the visualization code.

Are you excited??




Goal : To predict the CO2-emission of a new car.

(1) Importing the required libraries :
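A typical import block for this walkthrough might look like the following (only pandas, numpy, and matplotlib are needed, and matplotlib is optional if you skip the plots):

```python
import pandas as pd              # reading and exploring the CSV data
import numpy as np               # random shuffling for the train/test split
import matplotlib.pyplot as plt  # plotting (optional, visualization only)
```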

(2) Read the csv file :
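Loading the file is one call. Since I don’t have the original CSV here, the snippet below reads a small inline stand-in of the same shape; the filename and the column names are assumptions, so substitute your own:

```python
import io
import pandas as pd

# For a real file: df = pd.read_csv("FuelConsumption.csv")
csv_text = """ENGINESIZE,CYLINDERS,CO2EMISSIONS
2.0,4,196
2.4,4,221
3.5,6,255
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())   # preview the first rows
```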

There are more columns in our data but due to limited space I can only show a few here.

(3) Find out the columns in our data :

(4) Find additional information about our data :

(5) Print various statistical data of our dataset :
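Steps (3)–(5) map directly onto three pandas calls, shown here on the same small stand-in data (column names are assumptions):

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO(
    "ENGINESIZE,CYLINDERS,CO2EMISSIONS\n2.0,4,196\n2.4,4,221\n3.5,6,255\n"
))  # stand-in for the real file

print(df.columns)      # (3) the column names
df.info()              # (4) dtypes, non-null counts, memory usage
print(df.describe())   # (5) count, mean, std, min, quartiles, max
```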

(6) Select useful features from our dataset :
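Selecting the useful features is plain column indexing (again, the column names are assumptions about the file):

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO(
    "MODELYEAR,ENGINESIZE,CYLINDERS,CO2EMISSIONS\n"
    "2014,2.0,4,196\n2014,2.4,4,221\n2014,3.5,6,255\n"
))  # stand-in for the real file

# Keep only the columns that could plausibly drive CO2 emission
cdf = df[["ENGINESIZE", "CYLINDERS", "CO2EMISSIONS"]]
print(cdf.head())
```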

(7) Plot the data with its value counts :
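A histogram per column is one way to see the value counts; this sketch uses the non-interactive Agg backend so it runs headlessly (drop that line and call `plt.show()` to view the plots):

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")            # headless backend; remove to display interactively
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO(
    "ENGINESIZE,CYLINDERS,CO2EMISSIONS\n"
    "2.0,4,196\n2.4,4,221\n3.5,6,255\n3.7,6,260\n"
))  # stand-in for the real file

axes = df.hist()                 # one histogram (binned value counts) per column
plt.savefig("histograms.png")    # plt.show() in an interactive session
```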

(8) Plot the data on scatter plot to find out which feature can be used to make the predictions.

Here we can see that we can easily fit a regression line in the ENGINE SIZE VS CO2 EMISSION plot.
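The scatter plot itself is a few lines (the sample values are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")            # headless backend; remove to display interactively
import matplotlib.pyplot as plt

engine_size = [1.6, 2.0, 2.4, 3.5, 3.7, 4.7]   # made-up sample values
co2 = [160, 196, 221, 255, 260, 300]

plt.scatter(engine_size, co2)
plt.xlabel("ENGINE SIZE")
plt.ylabel("CO2 EMISSION")
plt.savefig("scatter.png")       # plt.show() in an interactive session
```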

(9) Now we will divide our dataset into 2 parts. One for training data and another for testing data. We’ll use 80% of the data for training and 20% of data to test our predictions.
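One dependency-light way to do an 80/20 split is to shuffle the row indices and slice (the sample values are made up):

```python
import numpy as np

x = [1.6, 2.0, 2.4, 3.0, 3.5, 3.7, 4.7, 5.4, 6.2, 6.8]  # engine sizes (made up)
y = [160, 196, 221, 240, 255, 260, 300, 330, 380, 400]   # CO2 values (made up)

rng = np.random.default_rng(42)
idx = rng.permutation(len(x))       # shuffled row indices
cut = int(0.8 * len(x))             # 80% boundary
x_train = [x[i] for i in idx[:cut]]
y_train = [y[i] for i in idx[:cut]]
x_test = [x[i] for i in idx[cut:]]
y_test = [y[i] for i in idx[cut:]]

print(len(x_train), len(x_test))    # 8 2
```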

(10) Finding the mean of CO2-EMISSION :
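Since the point is doing everything without libraries, the mean can be a one-line user-defined function:

```python
def mean(values):
    """Arithmetic mean computed from basic Python only."""
    return sum(values) / len(values)

print(mean([196, 221, 255]))   # 224.0
```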

(11) Main function to find the slope and intercept. Check out my last article for the derivation of the formula used here.
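In the notation of the derivation article (intercept a, slope b), the closed-form answers are b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. A direct translation might look like this (function and variable names are my own):

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Return (slope b, intercept a) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    # b = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    b = numerator / denominator
    a = y_bar - b * x_bar            # a = ȳ - b·x̄
    return b, a
```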

(12) Testing our function with basic data :
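A quick sanity check is to feed in points that lie exactly on a known line; the fit is repeated inline here so the snippet runs on its own:

```python
def fit_line(x, y):
    # least-squares slope/intercept, as derived in the previous article
    xb, yb = sum(x) / len(x), sum(y) / len(y)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
        / sum((xi - xb) ** 2 for xi in x)
    return b, yb - b * xb

# Points exactly on y = 2x: expect slope 2, intercept 0
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept)   # 2.0 0.0
```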

Voila! It works perfectly!!

(13) Finding the Slope and Intercept for our actual data :

(14) Now that we have our Slope and Intercept with us we can make our regression line :

(15) Plot the regression line to visualize it :
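Steps (13)–(15) together: fit the line on a (made-up) training sample, build the fitted values, and draw them over the scatter:

```python
import matplotlib
matplotlib.use("Agg")            # headless backend; remove to display interactively
import matplotlib.pyplot as plt

# Made-up training sample (engine size -> CO2 emission)
x_train = [1.6, 2.0, 2.4, 3.0, 3.5, 3.7, 4.7, 5.4]
y_train = [160, 196, 221, 240, 255, 260, 300, 330]

# Least-squares fit, as derived in the previous article
x_bar = sum(x_train) / len(x_train)
y_bar = sum(y_train) / len(y_train)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_train, y_train)) \
    / sum((x - x_bar) ** 2 for x in x_train)
a = y_bar - b * x_bar

line = [a + b * x for x in x_train]      # regression-line values

plt.scatter(x_train, y_train)
plt.plot(x_train, line, color="red")     # the fitted line
plt.xlabel("ENGINE SIZE")
plt.ylabel("CO2 EMISSION")
plt.savefig("regression_line.png")       # plt.show() in an interactive session
```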

(16) Now we’ll predict the values with our model. But first we need to make a function for that :
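The prediction function just evaluates the line at each input (names are my own):

```python
def predict(x_values, slope, intercept):
    """Apply y = intercept + slope*x to every input value."""
    return [intercept + slope * xi for xi in x_values]

print(predict([2.0, 3.0], 40.0, 120.0))   # [200.0, 240.0]
```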

(17) Can we predict the engine size from the CO2 emission? Of course!! Here’s how to do it…
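Because the model is just y = a + b·x, it can be inverted algebraically, x = (y − a)/b, to go from CO2 emission back to engine size (valid as long as b ≠ 0; names are my own):

```python
def predict_inverse(y_values, slope, intercept):
    """Solve y = intercept + slope*x for x; requires slope != 0."""
    return [(yi - intercept) / slope for yi in y_values]

print(predict_inverse([200.0, 240.0], 40.0, 120.0))   # [2.0, 3.0]
```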

Now it’s time to check how well our model performed in predicting the testing values. There are many methods to calculate the error/accuracy of a model. Here we’ll cover a few of them.

(1) Residual_Sum_of_Squares :

(2) R-Squared :

(3) Mean_Absolute_Error (MAE):

(4) Mean_Squared_Error(MSE):

(5) Mean_Absolute_Percentage_Error(MAPE):

(6) Mean_Percentage_Error(MPE):
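All six metrics above come down to a few lines each. Below, `y` holds the true test values and `y_hat` the model’s predictions (function names are mine):

```python
def mean(v):
    return sum(v) / len(v)

def rss(y, y_hat):
    """(1) Residual sum of squares: sum((y - ŷ)²)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def r_squared(y, y_hat):
    """(2) R²: 1 - RSS / total sum of squares around the mean."""
    y_bar = mean(y)
    tss = sum((a - y_bar) ** 2 for a in y)
    return 1 - rss(y, y_hat) / tss

def mae(y, y_hat):
    """(3) Mean absolute error."""
    return mean([abs(a - b) for a, b in zip(y, y_hat)])

def mse(y, y_hat):
    """(4) Mean squared error."""
    return mean([(a - b) ** 2 for a, b in zip(y, y_hat)])

def mape(y, y_hat):
    """(5) Mean absolute percentage error (undefined if any y is 0)."""
    return mean([abs((a - b) / a) for a, b in zip(y, y_hat)]) * 100

def mpe(y, y_hat):
    """(6) Mean percentage error; its sign shows over/under-prediction bias."""
    return mean([(a - b) / a for a, b in zip(y, y_hat)]) * 100
```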

In summary, in this article we saw how to implement simple linear regression without scikit-learn. It’s a lot of work, right? But wait..!! There is an easier way to perform the same calculations, with the same output, using Python libraries. In the next article we’ll see how scikit-learn can do these complex calculations in minutes.

In my future articles I will try to show which accuracy metric is best for different kinds of datasets.

*******

You can download the code and some handwritten notes on the derivation from here : https://drive.google.com/open?id=1_stSoY4JaKjiSZqDdVyW8VupATdcVr67

If you have any additional questions, feel free to contact me : shuklapratik22@gmail.com

Linear Regression Complete Derivation (3/5)


In the last article we saw how we can find the regression line using brute force, but that approach isn’t fruitful for real data, which often runs into millions of rows. To tackle such datasets we use Python libraries, but such libraries are built on some logical theory, right? So let’s find out the logic behind some creepy-looking formulas. Believe me, the math behind them is even better!

Before we begin, the knowledge of following topics might be helpful!

  • Partial Derivatives
  • Summations

Are you excited to find the line of best fit?


Let’s start by defining a few things

1) Given n inputs and outputs.

2) We define the line of best fit as…

3) Now we need to minimize the error function we named S…

4) Put the value of equation 2 into equation 3.
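Written out, the pieces defined in steps (1)–(4) are the model line and the squared-error cost:

```latex
\hat{y}_i = a + b x_i , \qquad
S(a,b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2
```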

To minimize our error function, S, we must find where its first partial derivatives with respect to a and b are equal to 0. Where those derivatives vanish, the total error over all points is at a minimum. Let’s find the partial derivative with respect to a first.

Finding a :

1 ) Find the derivative of S with respect to a..

2 ) Using chain rule.. Let’s say ..

3) Using partial derivative..

4) Expanding …

5) Simplifying…

6) To find extreme values we put it to zero…

7) Dividing the left side with -2…..

8) Now let’s break the summation in 3 parts..

9) Now the summation of a will be an….

10) Substituting it back in the equation…

11) Now we need to solve for a..

12) The summation of y (or x) divided by n is simply its mean..
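Condensing steps (6)–(12): setting the derivative to zero and solving for a gives

```latex
-2 \sum_{i=1}^{n} \bigl( y_i - a - b x_i \bigr) = 0
\;\Longrightarrow\;
\sum y_i = a n + b \sum x_i
\;\Longrightarrow\;
a = \bar{y} - b \, \bar{x}
```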



We’ve minimized the cost function with respect to a. Now let’s do the same for S with respect to b.


Finding b :

1 ) Same as we did with a..

2) Finding the partial derivative…

3) Expanding it a bit..

4) Putting it back in the equation..

5) Let’s divide by -2 both sides..

6) Let’s distribute x for ease of viewing …

Now let’s do something fun!! Remember we found the value of a earlier in this article? Why don’t we substitute it? Well, let’s see what happens!!

7) Substituting value of a…

8) Let’s distribute the minus sign and x…

Well, you don’t like it? Let’s split up the sum into two sums…

9) Splitting the sum..

10) Simplifying…

11) Finding B from it..

Great!! We did it!! We have isolated a and b in terms of x and y. It wasn’t that hard, was it?

Still have some energy and want to explore it a bit more?

12 ) Simplifying the formula…

13) Multiplying numerator and denominator by n in equation 11…
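For reference, the expression for b from step (11), and the equivalent form after multiplying numerator and denominator by n:

```latex
b = \frac{\sum x_i y_i - \bar{y} \sum x_i}{\sum x_i^2 - \bar{x} \sum x_i}
  = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \bigl( \sum x_i \bigr)^2}
```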

14) Now if we simplify the value of a using equation 13 we get…

Summing it up :)

If you have a dataset with one independent variable, you can find the line of best fit by first calculating b.

Then substituting B into a…

And finally substituting B and a into line of best fit…
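In symbols, the whole recipe is:

```latex
b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \bigl(\sum x_i\bigr)^2},
\qquad
a = \bar{y} - b \, \bar{x},
\qquad
\hat{y} = a + b x
```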

Moving Onwards,

In the next article we’ll see how we can implement simple linear regression from scratch (without sklearn) in Python.

And please let me know whether you liked this article or not! I bet you liked it!!

You can download the code and some handwritten notes on the derivation from here : https://drive.google.com/open?id=1_stSoY4JaKjiSZqDdVyW8VupATdcVr67

If you have any additional questions, feel free to contact me : shuklapratik22@gmail.com