## Machine Learning Lesson of the Day – Supervised Learning: Classification and Regression

Supervised learning has 2 categories:

• In classification, the target variable is categorical.
• In regression, the target variable is continuous.

Thus, regression in statistics is different from regression in supervised learning.

In statistics,

• regression is used to model relationships between predictors and targets, and the targets could be continuous or categorical.
• a regression model usually includes 2 components to describe such relationships:
• a systematic component
• a random component.  The random component of this relationship is mathematically described by some probability distribution.
• most regression models in statistics also have assumptions about the between the predictors and/or between the observations.
• many statistical models also aim to provide interpretable relationships between the predictors and targets.
• For example, in simple linear regression, the slope parameter, $\beta_1$, predicts the change in the target, $Y$, for every unit increase in the predictor, $X$.

In supervised learning,

• target variables in regression must be continuous
• categorical target variables are modelled in classification
• regression has less or even no emphasis on using probability to describe the random variation between the predictor and the target
• Random forests are powerful tools for both classification and regression, but they do not use probability to describe the relationship between the predictors and the target.
• regression has less or even no emphasis on providing interpretable relationships between the predictors and targets.
• Neural networks are powerful tools for both classification and regression, but they do not provide interpretable relationships between the predictors and the target.

***The last 2 points are applicable to classification, too.

In general, supervised learning puts much more emphasis on accurate prediction than statistics.

Since regression in supervised learning includes only continuous targets, this results in some confusing terminology between the 2 fields.  For example, logistic regression is a commonly used technique in both statistics and supervised learning.  However, despite its name, it is a classification technique in supervised learning, because the response variable in logistic regression is categorical.

## Presentation Slides: Machine Learning, Predictive Modelling, and Pattern Recognition in Business Analytics

I recently delivered a presentation entitled “Using Advanced Predictive Modelling and Pattern Recognition in Business Analytics” at the Statistical Society of Canada’s (SSC’s) Southern Ontario Regional Association (SORA) Business Analytics Seminar Series.  In this presentation, I

– discussed how traditional statistical techniques often fail in analyzing large data sets

– defined and described machine learning, supervised learning, unsupervised learning, and the many classes of techniques within these fields, as well as common examples in business analytics to illustrate these concepts

– introduced partial least squares regression and bootstrap forest (or random forest) as two examples of supervised learning (0r predictive modelling) techniques that can effectively overcome the common failures of traditional statistical techniques and can be easily implemented in JMP

– illustrated how partial least squares regression and bootstrap forest were successfully used to solve some major problems for 2 different clients at Predictum, where I currently work as a statistician