About Logistic Regression

- October 09, 2015

Logistic regression is one of those non-intuitive terms. For normal people (as opposed to data scientists), logistics is a familiar word: in general, people understand it to mean the movement of goods. For those with a software engineering background, regression means rerunning previous test cases against new code to check that none of them fail.

Turns out logistic regression has nothing to do with these notions of logistics or regression.

Image courtesy: mssqltips.com (https://www.mssqltips.com/tipimages2/3471_Example.JPG)

In the world of statistics, the word regression is used to mean prediction when applied to practical problems. If you have several input variables and combine them in a mathematical equation to calculate the value of an output variable, you are doing regression. Once you have the equation, you can plug in any values of the input variables and derive the value of the output variable. That means you can predict. Thus, regression = prediction.
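To make "plug in the inputs, get a prediction" concrete, here is a minimal sketch in Python. The function name `predict_price`, the two inputs, and all coefficient values are hypothetical, chosen only for illustration; in practice the coefficients would be estimated from data.

```python
def predict_price(area_sqft, bedrooms):
    """Predict a numeric output from two inputs using a fixed linear equation."""
    # Hypothetical coefficients, made up for this example (not fitted to real data).
    intercept = 50_000.0
    per_sqft = 100.0
    per_bedroom = 10_000.0
    return intercept + per_sqft * area_sqft + per_bedroom * bedrooms


# Plug in any input values to get a predicted output.
print(predict_price(1200, 3))  # 200000.0
```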

If the prediction is to decide whether a given entity belongs to one of two classes, then the method is binary classification. Typically, you would have an entity and have to decide whether it belongs to a "type" or its mutually exclusive "not-type".

In linear regression, the output is numeric. But in binary classification, you can't use that mathematical formula directly because the output values are categories (text) like "spam" / "not-spam" or "sick" / "not-sick". So, what do we do? We take the help of probability theory -- we calculate the odds.

Odds is the ratio whose numerator is the probability of the event of interest and whose denominator is 1 minus that probability. Statistics is not a happy place with just ratios; we need logarithms to simplify things. Enter the logit, which is the natural logarithm of the odds. This can be any real number, positive or negative, which is exactly the kind of value a linear equation of the inputs produces, so the regression equation is used to estimate the log-odds. Once more we are not happy, so we transform again, this time into the range 0 - 1.
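As a small illustration of the two definitions above, the following Python sketch computes the odds and the logit for a given probability. The function names `odds` and `logit` are my own choices for this example, and the code assumes the probability lies strictly between 0 and 1.

```python
import math


def odds(p):
    """Odds of an event with probability p, assuming 0 < p < 1."""
    return p / (1.0 - p)


def logit(p):
    """Natural logarithm of the odds (the log-odds)."""
    return math.log(odds(p))


for p in (0.2, 0.5, 0.8):
    print(p, odds(p), logit(p))
```

A probability of 0.5 gives odds of 1 and a logit of 0; probabilities below 0.5 give negative logits and probabilities above 0.5 give positive ones.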

To do the transformation, the number is passed through a function called the inverse logit function, also known as the logistic function. The inverse logit function takes a real number and transforms it to a value in the range 0 - 1. So you input the log-odds to the inverse logit function and you get back a probability in the range 0 - 1.
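The inverse logit can be sketched in a few lines of Python. The name `inverse_logit` is an assumption for this example; the two branches are only there to avoid floating-point overflow for large negative inputs, and both compute the same formula, 1 / (1 + e^(-z)).

```python
import math


def inverse_logit(z):
    """Map any real number z (a log-odds value) to a probability between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equivalent form so that
    # math.exp never receives a large positive argument.
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-4.0, 0.0, 4.0):
    print(z, inverse_logit(z))
```

Feeding in a log-odds of 0 returns 0.5, which matches the earlier observation that odds of 1 correspond to an even chance.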

Our classification problem is binary, so you pick a threshold, say 0.5. If the output probability is 0.5 or above, the entity is classified as belonging to the "type". Otherwise, it is classified as "not-type".
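The thresholding step might look like the sketch below. The function name `classify` and the label strings are illustrative assumptions; 0.5 is simply the example threshold used above, and a different problem may call for a different cut-off.

```python
def classify(probability, threshold=0.5):
    """Return "type" if the probability reaches the threshold, else "not-type"."""
    return "type" if probability >= threshold else "not-type"


print(classify(0.73))  # type
print(classify(0.50))  # type (the threshold itself counts as "type")
print(classify(0.12))  # not-type
```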

So there we are. The process is a logarithmic transformation of a ratio, applied to the odds, yet the word used is logistic. In fact, we should be calling "logistic regression" something like "binary category log-odds classification". To my mind, this is more intuitive.
