Introduction to Linear Regression

Before we venture into linear regression, let's first try to understand what regression analysis is.

What is Regression?

Regression is a statistical approach used for predicting continuous, real-valued quantities such as age, weight, or salary.

In regression, we have a dependent variable which we want to predict using some independent variables.

The goal of regression is to find the relationship between the independent variables and the dependent variable, which can then be used to predict the outcome of an event.

In this post, we’ll see one type of regression technique called linear regression.

LINEAR REGRESSION:

Linear regression is one of the simplest algorithms in machine learning.

The main objective of this algorithm is to find the straight line which best fits the data.


The best-fit line is chosen such that the overall distance from the line to all the points is as small as possible.

The linear regression model makes an assumption that the dependent variable is linearly related to the independent variable.

There are two types of linear regression,

Simple linear regression: If we have a single independent variable, then it is called simple linear regression.

Multiple linear regression: If we have more than one independent variable, then it is called multiple linear regression.
In this post, we will concentrate on simple linear regression and implement it from scratch.

SIMPLE LINEAR REGRESSION:

If we have an independent variable X and a dependent variable Y, then the linear relationship between the two variables can be given by the equation

Y = b0 + b1*X

Where b0 is the y-intercept and b1 is the slope.

There are many techniques to estimate these parameters. The most popular is Ordinary Least Squares (OLS).

Ordinary least squares finds the parameters of the equation in such a way that the sum of squared errors is minimized.

The squared error is calculated between the actual and predicted values.

The sum of squared errors is given by,

SSE = Σ (y − ŷ)²

where y is the actual value, and ŷ is the predicted value.

We can calculate b0 using the formula,

b0 = ȳ − b1 * x̄

and b1 is defined as,

b1 = Σ (x − x̄)(y − ȳ) / Σ (x − x̄)²

where x̄ is the mean of the x values and ȳ is the mean of the y values.

Now, let's implement simple linear regression using Python.

You can download the dataset here.

Let’s take a quick look at the dataset.


We have to predict salaries based on years of experience.

Let’s import the data using pandas
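The import step might look like the following sketch. The column names "YearsExperience" and "Salary" and the inline sample rows are assumptions, since the exact file isn't reproduced here:

```python
import io

import pandas as pd

# A few sample rows standing in for the downloaded CSV; the column names
# "YearsExperience" and "Salary" are assumptions about the file's layout.
csv_text = io.StringIO(
    "YearsExperience,Salary\n"
    "1.1,39343\n"
    "2.0,43525\n"
    "3.2,54445\n"
    "4.0,56957\n"
    "5.1,66029\n"
)
df = pd.read_csv(csv_text)  # with the real file: pd.read_csv("salary_data.csv")

x = df["YearsExperience"].tolist()  # independent variable
y = df["Salary"].tolist()           # dependent variable
print(df.head())
```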

Now let us calculate the coefficients b0 and b1
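A from-scratch computation of b0 and b1 using the OLS formulas above might look like this (the x and y values are a small illustrative sample, not the full dataset):

```python
# Sample data: years of experience (x) and salary (y).
x = [1.1, 2.0, 3.2, 4.0, 5.1]
y = [39343, 43525, 54445, 56957, 66029]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# b1 = Σ (x − x̄)(y − ȳ) / Σ (x − x̄)²
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
# b0 = ȳ − b1 * x̄
b0 = y_mean - b1 * x_mean

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```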

Next, we can use the coefficients to predict the y values
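With fitted coefficients in hand, a prediction is just the line evaluated at each x. The coefficient values below are rounded placeholders for illustration:

```python
x = [1.1, 2.0, 3.2, 4.0, 5.1]
b0, b1 = 31356.7, 6721.8  # illustrative coefficient values, not the true fit

# Predicted value for each x: ŷ = b0 + b1 * x
y_pred = [b0 + b1 * xi for xi in x]
print(y_pred)
```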

Now we can use this list of predicted values to draw the regression line.
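One way to draw the line with matplotlib (a sketch; the data and coefficients are the illustrative values used above, and the output filename is arbitrary):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

x = [1.1, 2.0, 3.2, 4.0, 5.1]
y = [39343, 43525, 54445, 56957, 66029]
b0, b1 = 31356.7, 6721.8  # illustrative coefficients
y_pred = [b0 + b1 * xi for xi in x]

plt.scatter(x, y, color="green", label="data")             # actual points
plt.plot(x, y_pred, color="red", label="regression line")  # model predictions
plt.xlabel("Years of experience")
plt.ylabel("Salary")
plt.legend()
plt.savefig("regression_line.png")
```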


The red line represents our model predictions, and green points represent our data.

GOODNESS OF FIT:

There are various regression metrics to measure the goodness of the regression line.

I have written an article discussing several important evaluation metrics for regression. If you want to learn more about these metrics, you can read the article regression evaluation metrics.

In this post, we’ll discuss a metric called R-squared, which is one of the most commonly used regression evaluation metrics.

R-squared measures the proportion of variance explained by the model. It ranges from 0 to 1, where 0 indicates a poor fit and 1 indicates a perfect fit.

If the R-squared value is 0.90, then we can say that the independent variables have explained 90% of the variance in the dependent variable.

It is determined by,

R² = 1 − (SSE / SST)

where SSE is the sum of squared errors and is given by

SSE = Σ (y − ŷ)²

and the total sum of squares(SST) is determined as,

SST = Σ (y − ȳ)²
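Putting these formulas together, R-squared takes only a few lines to compute. The actual and predicted values below are toy data, for illustration only:

```python
y      = [3.0, 4.5, 6.1, 7.9]  # actual values (toy data)
y_pred = [3.2, 4.4, 6.0, 8.0]  # model predictions (toy data)

y_mean = sum(y) / len(y)
sse = sum((yi - ypi) ** 2 for yi, ypi in zip(y, y_pred))  # Σ (y − ŷ)²
sst = sum((yi - y_mean) ** 2 for yi in y)                 # Σ (y − ȳ)²
r_squared = 1 - sse / sst
print(round(r_squared, 4))
```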

SUMMARY:

To sum up, in this post we have discussed linear regression and one of its types, simple linear regression.

The equation Y = b0 + b1*X gives the linear relationship between an independent and a dependent variable.

We also discovered how to implement simple linear regression using Ordinary Least Squares.

Finally, we use R-squared to assess the goodness of fit of our model.

In the next part, we'll discuss multiple linear regression, an extension of simple linear regression that enables us to assess the association between the dependent variable and two or more independent variables.
