
Introduction to Logistic Regression

In this article, we’ll discuss a supervised machine learning algorithm: logistic regression.

Logistic regression is a probabilistic classifier. It outputs the probability that a point belongs to a specific class, so the output lies in the range [0, 1].

Logistic regression is similar to linear regression. The difference is that linear regression models a continuous response variable and is not suitable when the response variable is dichotomous (for example, whether a customer will churn or not, or whether a tumor is malignant or benign).

In such cases, rather than predicting the output value directly, logistic regression outputs the probability that a point belongs to a certain class.

Now let’s look at an example of why we can’t use linear regression for a dichotomous response variable. Consider the plot below:

[Figure: dichotomous response variable, taking only the values 0 and 1, plotted against a predictor]

The response variable here is dichotomous. It takes only two values, either 0 or 1.

Now, if we fit a linear regression line to the above data, it looks like the following.

As expected, a linear function is not appropriate for representing the relationship between the independent variable and a dichotomous dependent variable.

The main problem with fitting a linear function to dichotomous values is that the predicted values do not always fall within the valid range of a probability, [0, 1].

Looking at the fitted line, we can see that for some values of the predictor the predicted result is negative, and as the predictor increases, it becomes higher than one.

Logistic regression solves this problem by using an S-shaped (sigmoidal) curve.


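[Figure: S-shaped (sigmoid) logistic regression curve fitted to the dichotomous data, with a horizontal threshold line at 0.5]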

As seen, the S-shaped curve represents the probabilities of a binary outcome much better.

We can see from the above plot that the predicted value approaches 1 as the predictor goes towards +∞. Similarly, the predicted value approaches 0 as the predictor goes towards -∞ (for a curve that increases with the predictor, as here).

The horizontal line at 0.5 is the threshold. Points whose predicted probability is above this value are assigned to class 1, and points below it are assigned to class 0.
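A minimal Python sketch of the S-shaped curve and the 0.5 threshold rule might look like the following. The function names `sigmoid` and `classify` are our own, and assigning a probability of exactly 0.5 to class 0 is an arbitrary tie-breaking choice.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number z to a probability between 0 and 1."""
    # Two algebraically equivalent forms; each avoids overflow in exp()
    # for the sign of z it handles.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(probability: float, threshold: float = 0.5) -> int:
    """Class 1 if the probability is above the threshold, otherwise class 0."""
    return 1 if probability > threshold else 0


# As z moves from very negative to very positive, P climbs from near 0 to near 1.
for z in (-6.0, -2.0, 0.0, 2.0, 6.0):
    p = sigmoid(z)
    print(f"z = {z:+.1f}   P = {p:.4f}   class = {classify(p)}")
```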

Probability, Odds and Log-Odds:

Logistic regression is based on concepts like probability and odds, so before proceeding further, let’s first discuss them.


Probability is defined as the number of outcomes of interest divided by the number of all possible outcomes (assuming all outcomes are equally likely).

P=\frac{\text { outcomes of interest }}{\text { all possible outcomes }}

For example, let’s say we flip a fair coin. The probability of it landing heads is 0.5:

P(\text {heads})=\frac{1}{2}=0.5


Odds are defined as the probability of something happening divided by the probability of it not happening.

\text{odds}=\frac{P}{1-P}

For example, the odds of landing heads when flipping a fair coin are 1:

\text{odds}(\text{heads})=\frac{0.5}{0.5}=1
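The odds formula translates directly into code. The helper name `odds` is our own; the second call shows that a probability of 0.75 corresponds to odds of 3 ("three to one").

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1 (odds are infinite at p = 1)")
    return p / (1.0 - p)


print(odds(0.5))   # fair coin, heads: prints 1.0
print(odds(0.75))  # "three to one": prints 3.0
```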


Applying the natural logarithm to the odds gives the log-odds:

\text{log-odds}=\ln \left(\frac{P}{1-P}\right)
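A small sketch of the log-odds (the name `log_odds` is our own). A probability of 0.5 maps to a log-odds of 0, and probabilities equally far above and below 0.5 map to log-odds of equal size and opposite sign.

```python
import math


def log_odds(p: float) -> float:
    """Natural logarithm of the odds: ln(P / (1 - P))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must satisfy 0 < p < 1")
    return math.log(p / (1.0 - p))


print(log_odds(0.5))                 # fair coin: ln(1) = 0.0
print(log_odds(0.9), log_odds(0.1))  # same magnitude, opposite signs (up to rounding)
```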

Logistic Regression:

Recall that in linear regression, the predicted output is given by:

Y=a+b \cdot X

where X is the predictor (the data), Y is the response variable, a is the intercept, and b is the coefficient.

Similarly, for logistic regression, a first attempt would be to write:

P=a+b \cdot X

where P is the probability that a point belongs to a specific class.

In linear regression, Y ranges from -∞ to +∞, and the right-hand side, a + b·X, also lives in the range -∞ to +∞.

However, the problem with this equation for logistic regression is that the probability P on the left-hand side is restricted to [0, 1], while the right-hand side can take any real value.

To make the ranges on both sides match, we first transform the probability into odds:

\text{odds}=\frac{P}{1-P}

\text{odds}=\frac{P}{1-P}=a+b \cdot X

Like probability, odds have a lower bound of 0, but unlike probability they have no upper bound: the odds range from 0 to +∞.

To remove this lower bound, we take the logarithm of the odds, which ranges from -∞ to +∞.
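To see how each transformation widens the range, we can push a few probabilities through both steps. This is an illustrative sketch; the helper name and the probabilities chosen are arbitrary.

```python
import math
from typing import Tuple


def to_odds_and_log_odds(p: float) -> Tuple[float, float]:
    """Map a probability in (0, 1) to its odds in (0, +inf) and log-odds in (-inf, +inf)."""
    o = p / (1.0 - p)
    return o, math.log(o)


for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    o, lo = to_odds_and_log_odds(p)
    # Odds stay positive but grow without bound; log-odds are symmetric around 0.
    print(f"P = {p:<6}  odds = {o:10.4f}  log-odds = {lo:+.4f}")
```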

Now the equation becomes

\ln \left(\frac{P}{1-P}\right)=a+b \cdot X

The log(odds) is called the logit function.

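Now we can solve the logit equation for P. Exponentiating both sides gives

\frac{P}{1-P}=e^{a+b \cdot X}

Multiplying both sides by 1 - P and collecting the terms in P gives

P\left(1+e^{a+b \cdot X}\right)=e^{a+b \cdot X}

P=\frac{e^{a+b \cdot X}}{1+e^{a+b \cdot X}}

Dividing the numerator and the denominator by e^{a+b·X} gives the equivalent form

P=\frac{1}{1+e^{-(a+b \cdot X)}}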

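This is the logistic (sigmoid) function. Since e^{-(a+b·X)} is always positive, the denominator is always greater than 1, so P always lies strictly between 0 and 1.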
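The algebra can be double-checked numerically: the two expressions for P should agree, and applying the logit to P should give back a + b·X. The intercept and coefficient below are hypothetical values chosen only for this check, and the function names are our own.

```python
import math


def p_exp_ratio(z: float) -> float:
    """P = e^z / (1 + e^z)."""
    return math.exp(z) / (1.0 + math.exp(z))


def p_sigmoid(z: float) -> float:
    """P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """ln(P / (1 - P)), the inverse of the sigmoid."""
    return math.log(p / (1.0 - p))


a, b = -1.5, 0.8  # hypothetical intercept and coefficient
for x in (-5.0, 0.0, 2.0, 8.0):
    z = a + b * x
    p1, p2 = p_exp_ratio(z), p_sigmoid(z)
    # p1 and p2 agree up to rounding, and logit(P) recovers z = a + b*x.
    print(f"x = {x:+.1f}  z = {z:+.2f}  P = {p2:.6f}  logit(P) = {logit(p2):+.6f}")
```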

This equation extends naturally to n predictors X_1, ..., X_n:

P=\frac{1}{1+e^{-\left(a+b_{1} X_{1}+b_{2} X_{2}+b_{3} X_{3}+\cdots+b_{n} X_{n}\right)}}

P=\frac{1}{1+e^{-\left(a+\sum_{i=1}^{n} b_{i} X_{i}\right)}}
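With several predictors, the only change is that the exponent becomes a weighted sum. A minimal standard-library sketch; the function name, intercept and coefficients are hypothetical.

```python
import math
from typing import Sequence


def predict_proba(x: Sequence[float], a: float, b: Sequence[float]) -> float:
    """P = 1 / (1 + e^-(a + b_1*x_1 + ... + b_n*x_n)) for one observation x."""
    if len(x) != len(b):
        raise ValueError("x and b must have the same number of entries")
    z = a + sum(b_i * x_i for b_i, x_i in zip(b, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical intercept and coefficients for three predictors.
a = 0.5
b = [1.2, -0.7, 0.05]
print(predict_proba([1.0, 2.0, 10.0], a, b))
```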

Next, we need to estimate the parameters a and b (or a and b_1, ..., b_n with several predictors). The most common estimation method is maximum likelihood estimation (MLE).

Let’s formulate the likelihood function.

The probability that y = 1 given x is denoted by P(x).

\operatorname{Pr}(\mathrm{y}=1 | \mathrm{x})=\mathrm{P}(\mathrm{x})

Similarly, the probability that y = 0 given x is denoted by 1 - P(x).

\operatorname{Pr}(\mathrm{y}=0 | \mathrm{x})=1-\mathrm{P}(\mathrm{x})

Since y can only be 0 or 1, we can combine these two cases into a single expression:

\operatorname{Pr}(\mathrm{y} | \mathrm{x})=\mathrm{P}(\mathrm{x})^{\mathrm{y}}\left(1-\mathrm{P}(\mathrm{x})\right)^{1-\mathrm{y}}
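The combined expression works because the exponents switch one factor off: when y = 1 the second factor is raised to the power 0, and when y = 0 the first one is. A small sketch, where the function name and the value of P(x) are hypothetical:

```python
def bernoulli_prob(y: int, p: float) -> float:
    """Pr(y | x) = P(x)^y * (1 - P(x))^(1 - y) for a label y in {0, 1}."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    return p ** y * (1.0 - p) ** (1 - y)


p_x = 0.8  # hypothetical model output P(x) for one point
print(bernoulli_prob(1, p_x))  # equals P(x)
print(bernoulli_prob(0, p_x))  # equals 1 - P(x), up to rounding
```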

Because the observations are assumed to be independent, the likelihood function for N observations is the product of these terms:

L=\prod_{i=1}^{N} P\left(x_{i}\right)^{y_{i}}\left(1-P\left(x_{i}\right)\right)^{1-y_{i}}

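Next, we take the logarithm of the likelihood function. This turns the product into a sum, which is easier to work with, and because the logarithm is monotonically increasing, the log-likelihood is maximized by the same parameters: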

\log L=\sum_{i=1}^{N}\left[y_{i} \log P\left(x_{i}\right)+\left(1-y_{i}\right) \log \left(1-P\left(x_{i}\right)\right)\right]

This is the log-likelihood function for logistic regression. To estimate the parameters, we need to maximize the log-likelihood.
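The log-likelihood is straightforward to compute for a given set of labels and predicted probabilities. In this sketch the labels and probabilities are made-up values. Branching on y_i is equivalent to the formula for labels in {0, 1} and avoids evaluating log(0) in a term that would be multiplied by zero. The last lines confirm that the result matches the log of the product-form likelihood and that probabilities closer to the labels score higher.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float]) -> float:
    """Sum over i of y_i*log P(x_i) + (1 - y_i)*log(1 - P(x_i)), with y_i in {0, 1}."""
    total = 0.0
    for y_i, p_i in zip(y, p):
        # Only one of the two terms is non-zero for a 0/1 label.
        total += math.log(p_i) if y_i == 1 else math.log(1.0 - p_i)
    return total


y = [1, 0, 1, 1, 0]                 # hypothetical labels
p_good = [0.9, 0.2, 0.7, 0.8, 0.1]  # probabilities close to the labels
p_poor = [0.6, 0.5, 0.4, 0.5, 0.6]  # probabilities far from the labels

# Product form of the likelihood, for comparison.
likelihood = math.prod(p_i if y_i == 1 else 1.0 - p_i for y_i, p_i in zip(y, p_good))

print(log_likelihood(y, p_good), math.log(likelihood))  # the same value
print(log_likelihood(y, p_poor))                         # lower (more negative)
```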

The log-likelihood has no closed-form maximizer, so the maximum likelihood estimates are found numerically, for example with the Newton-Raphson method.
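As an illustration of the idea, here is a minimal Newton-Raphson sketch for the single-predictor model P = 1/(1 + e^{-(a+b·X)}). Each iteration uses the gradient of the log-likelihood, the sums of (y_i - P(x_i)) and (y_i - P(x_i))·x_i, and the Hessian built from the weights P(x_i)(1 - P(x_i)). The function name, tolerance, iteration cap and toy data are our own assumptions, and there is no step-size control. If the two classes are perfectly separable, the maximum likelihood estimate does not exist (b grows without bound), which is why the toy data deliberately overlap.

```python
import math
from typing import List, Tuple


def fit_logistic_newton(x: List[float], y: List[int],
                        tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float]:
    """Estimate a and b in P = 1 / (1 + e^-(a + b*x)) by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = 0.0        # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negated Hessian
        for x_i, y_i in zip(x, y):
            p_i = 1.0 / (1.0 + math.exp(-(a + b * x_i)))
            w_i = p_i * (1.0 - p_i)
            g_a += y_i - p_i
            g_b += (y_i - p_i) * x_i
            h00 += w_i
            h01 += w_i * x_i
            h11 += w_i * x_i * x_i
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            raise ArithmeticError("Hessian is singular; check the data")
        # Newton step: solve [[h00, h01], [h01, h11]] @ [d_a, d_b] = [g_a, g_b].
        d_a = (h11 * g_a - h01 * g_b) / det
        d_b = (h00 * g_b - h01 * g_a) / det
        a, b = a + d_a, b + d_b
        if max(abs(d_a), abs(d_b)) < tol:
            break
    return a, b


# Made-up, overlapping (non-separable) data so that the MLE exists.
x_data = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y_data = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a_hat, b_hat = fit_logistic_newton(x_data, y_data)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

At the returned estimates the gradient of the log-likelihood is (numerically) zero, which is the condition that defines the maximum.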

Now that we have covered what logistic regression is, let’s do some coding.

We’ll apply logistic regression on the breast cancer data set.
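One possible way to do this, assuming scikit-learn is installed, is sketched below using its bundled copy of the Wisconsin breast cancer data set (`load_breast_cancer`). The split ratio, random seed and feature scaling are our own choices. Note that scikit-learn's `LogisticRegression` applies L2 regularization by default (`C=1.0`) and uses the `lbfgs` solver rather than plain Newton-Raphson, so its estimates differ slightly from the unpenalized maximum likelihood solution.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 30 numeric measurements per tumor; target: 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Standardizing the features helps the solver converge.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")

# predict_proba returns [P(class 0), P(class 1)] for each row.
print(model.predict_proba(X_test[:3]))
```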

