Presentation in Toronto on Friday, June 7, 2013: Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS

Update: My presentation has been moved from 9:30 am to 10:50 am.  I have switched time slots with Justin Jia.  I will present from 10:50 – 11:20 am.

I will deliver a presentation entitled “Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS” at the Toronto Area SAS Society (TASS) on Friday, June 7, 2013.  Discriminant analysis is a powerful technique for predicting categorical target variables, and it can be easily implemented in JMP and SAS.  I will give a gentle, intuitive, but not overly mathematical introduction to this technique that will be accessible to a wide audience of statisticians and analytics professionals from diverse backgrounds.


Come to my next presentation at the Toronto Area SAS Society on Friday, June 7, 2013!

I have previously written about the educational and networking benefits of attending SAS user group events, which are completely free to attend.  Besides TASS, I have also attended the Toronto Data Mining Forum and the SAS Health User Group meetings.  I even encourage you to consider presenting at these meetings; check out my previous presentation on partial least squares regression.

You can find more information about the next meeting in this agenda, which also contains links to the registration web sites.  Note that there are 2 events – one in the morning, and one in the afternoon – so be sure to register for both if you wish to attend the entire day’s events.

Toronto Area SAS Society Meeting

Classic TASS: 9:00 am – 12:00 pm

Interfaces TASS: 1:30 pm – 3:45 pm

Friday, June 7th, 2013

SAS Institute (Canada) Inc.

280 King St. E, 5th Floor

Toronto, Ontario

A free breakfast is served in the morning, usually starting at 8:30 am.


Webinar – Advanced Predictive Modelling for Manufacturing

The company that I work for, Predictum, is about to begin a free webinar series on statistics and analytics, and I will present the first one on Tuesday, May 14, at 2 pm EDT.  This first webinar will focus on how partial least squares regression can be used as a predictive modelling technique; the data sets are set in the context of manufacturing, but the technique is applicable to all industries that need techniques beyond basic statistical tools like linear regression for predictive modelling.  JMP, software that Predictum uses extensively, will be used to illustrate how partial least squares regression can be implemented.  This presentation will not be heavy in mathematical detail, so it will be accessible to a wide audience, including statisticians, analysts, managers, and executives.


Attend my company’s free webinar to hear me talk about advanced predictive modelling and partial least squares regression!

To register for this free webinar, visit the webinar’s registration page on Webex.

Presentation Slides: Machine Learning, Predictive Modelling, and Pattern Recognition in Business Analytics

I recently delivered a presentation entitled “Using Advanced Predictive Modelling and Pattern Recognition in Business Analytics” at the Statistical Society of Canada’s (SSC’s) Southern Ontario Regional Association (SORA) Business Analytics Seminar Series.  In this presentation, I

– discussed how traditional statistical techniques often fail in analyzing large data sets

– defined and described machine learning, supervised learning, unsupervised learning, and the many classes of techniques within these fields, as well as common examples in business analytics to illustrate these concepts

– introduced partial least squares regression and bootstrap forest (or random forest) as two examples of supervised learning (or predictive modelling) techniques that can effectively overcome the common failures of traditional statistical techniques and can be easily implemented in JMP

– illustrated how partial least squares regression and bootstrap forest were successfully used to solve some major problems for 2 different clients at Predictum, where I currently work as a statistician


Presentation Slides – Overcoming Multicollinearity and Overfitting with Partial Least Squares Regression in JMP and SAS

My slides on partial least squares regression at the Toronto Area SAS Society (TASS) meeting on September 14, 2012, can be found here.

My Presentation on Partial Least Squares Regression

My first presentation to the Toronto Area SAS Society (TASS) was delivered on September 14, 2012.  I introduced a supervised learning/predictive modelling technique called partial least squares (PLS) regression; I showed how ordinary least squares (OLS) linear regression is often problematic when used with big data because of multicollinearity and overfitting, explained how partial least squares regression overcomes these limitations, and illustrated how to implement it in SAS and JMP.  I also highlighted the variable importance for projection (VIP) score that can be used to conduct variable selection with PLS regression; in particular, I documented its effectiveness as a technique for variable selection by comparing some key journal articles on this issue in the academic literature.
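To make the multicollinearity problem concrete, here is a minimal sketch in plain Python. It is not taken from the slides, which use SAS and JMP. The data are made up for illustration, the helper names (`ols_two_predictors`, `pls1_one_component`) are hypothetical, and the PLS fit is restricted to a single component so that no matrix inversion is needed. Two predictors are almost identical, and the response is their sum plus a little noise, so slopes near 1 and 1 would be the sensible answer.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def ols_two_predictors(x1, x2, y):
    """Least squares slopes for two centered predictors (2x2 normal equations)."""
    a, b, c = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    d, e = dot(x1, y), dot(x2, y)
    det = a * c - b * b  # nearly zero when x1 and x2 are almost collinear
    return [(c * d - b * e) / det, (a * e - b * d) / det]

def pls1_one_component(columns, y):
    """Slopes from a one-component PLS1 fit on centered data."""
    # Weight each predictor by its covariance with the response, then normalize.
    w = [dot(col, y) for col in columns]
    norm = math.sqrt(dot(w, w))
    w = [wi / norm for wi in w]
    # Scores: the single latent variable built from all predictors.
    t = [sum(wi * col[i] for wi, col in zip(w, columns)) for i in range(len(y))]
    # Regress the response on the scores, then map back to the predictors.
    q = dot(y, t) / dot(t, t)
    return [wi * q for wi in w]

# Made-up data: x2 is x1 plus a tiny wiggle, and y = x1 + x2 + small noise.
base = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
wiggle = [0.01, -0.01, 0.01, -0.01, 0.01, -0.01]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1]

x1 = center(base)
x2 = center([b + g for b, g in zip(base, wiggle)])
y = center([a + b + n for a, b, n in zip(x1, x2, noise)])

ols_coefs = ols_two_predictors(x1, x2, y)
pls_coefs = pls1_one_component([x1, x2], y)

print("OLS slopes:", ols_coefs)
print("PLS slopes (1 component):", pls_coefs)
```

On this toy data set, the least squares slopes are pushed far from 1 and take opposite signs, because the small noise is amplified along the direction in which the two predictors barely differ. The one-component PLS slopes stay close to each other and close to 1. Real analyses would use more components, chosen by cross-validation, and a proper implementation such as the PLS procedure in SAS or the PLS platform in JMP.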


The green line is an overfitted classifier.  Not only does it model the underlying trend, but it also models the noise (the random variation) at the boundary.  It separates the blue and the red dots perfectly for this data set, but it will classify very poorly on a new data set from the same population.

Source: Chabacano via Wikimedia
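The behaviour described in the caption can be reproduced with a small sketch in plain Python. This is an illustration under stated assumptions, not the method behind the figure: the data are simulated on a line rather than in a plane, 20% of the labels are flipped to play the role of noise, and a 1-nearest-neighbour rule stands in for the overfitted green boundary. The simple threshold rule uses the true underlying trend, which is assumed known here.

```python
import random

random.seed(0)

def make_data(n):
    """Simulated points: the class is 1 when x > 0, with 20% of labels flipped."""
    data = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        label = 1 if x > 0 else 0
        if random.random() < 0.2:
            label = 1 - label  # noise at random positions
        data.append((x, label))
    return data

def nearest_neighbour_predict(train, x):
    """Overfitted rule: copy the label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def threshold_predict(x):
    """Simple rule: follow only the underlying trend."""
    return 1 if x > 0 else 0

def accuracy(predict, data):
    return sum(predict(x) == label for x, label in data) / len(data)

train = make_data(200)
test = make_data(2000)  # a new data set from the same population

nn_train_acc = accuracy(lambda x: nearest_neighbour_predict(train, x), train)
nn_test_acc = accuracy(lambda x: nearest_neighbour_predict(train, x), test)
simple_train_acc = accuracy(threshold_predict, train)
simple_test_acc = accuracy(threshold_predict, test)

print("1-NN:      train", nn_train_acc, "test", nn_test_acc)
print("Threshold: train", simple_train_acc, "test", simple_test_acc)
```

The nearest-neighbour rule classifies its own training data perfectly because every point is its own nearest neighbour, so it memorizes the flipped labels along with the trend. Its accuracy drops on the new sample, whereas the threshold rule typically performs about the same on both.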