Machine Learning Lesson of the Day – Supervised Learning: Classification and Regression

Supervised learning has 2 categories:

  • In classification, the target variable is categorical.
  • In regression, the target variable is continuous.

Thus, regression in statistics is different from regression in supervised learning.

In statistics,

  • regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical.  
  • a regression model usually includes 2 components to describe such relationships:
    • a systematic component
    • a random component.  The random component of this relationship is mathematically described by some probability distribution.  
  • most regression models in statistics also have assumptions about the statistical independence or dependence between the predictors and/or between the observations.  
  • many statistical models also aim to provide interpretable relationships between the predictors and targets.  
    • For example, in simple linear regression, the slope parameter, \beta_1, predicts the change in the target, Y, for every unit increase in the predictor, X.

In supervised learning,

  • target variables in regression must be continuous
    • categorical target variables are modelled in classification
  • regression has less or even no emphasis on using probability to describe the random variation between the predictor and the target
    • Random forests are powerful tools for both classification and regression, but they do not use probability to describe the relationship between the predictors and the target.
  • regression has less or even no emphasis on providing interpretable relationships between the predictors and targets.  
    • Neural networks are powerful tools for both classification and regression, but they do not provide interpretable relationships between the predictors and the target.

***The last 2 points are applicable to classification, too.

In general, supervised learning puts much more emphasis on accurate prediction than statistics.

Since regression in supervised learning includes only continuous targets, this results in some confusing terminology between the 2 fields.  For example, logistic regression is a commonly used technique in both statistics and supervised learning.  However, despite its name, it is a classification technique in supervised learning, because the response variable in logistic regression is categorical.

Your thoughtful comments are much appreciated!

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: