Vancouver SAS Users Group Meeting – Wednesday, November 27th @ BC Cancer Agency

I am very excited to attend my first VanSUG meeting next week!  Before my recent move to Vancouver, I was very active in the SAS community in Toronto for two years, and it was a great opportunity to learn and network with other statisticians and analytics professionals.  For the Toronto Area SAS Society, I presented on partial least squares regression, K-means clustering, and discriminant analysis, and I also helped its statistics break-out group by answering questions and offering statistical advice.  I look forward to meeting and learning from new colleagues as I join Vancouver’s own network of SAS users!

If you are coming to this next VanSUG meeting, please come up and say “Hello”!  Here are the details; this web page also has archives of past presentations and newsletters.

Wednesday, November 27th, 2013, 9:00 a.m. – 3:00 p.m.
Gordon & Leslie Diamond Family Theatre
BC Cancer Agency Research Centre
675 West 10th Ave.
Vancouver, BC


Presentation Slides – Overcoming Multicollinearity and Overfitting with Partial Least Squares Regression in JMP and SAS

My slides on partial least squares regression at the Toronto Area SAS Society (TASS) meeting on September 14, 2012, can be found here.

My Presentation on Partial Least Squares Regression

My first presentation to the Toronto Area SAS Society (TASS) was delivered on September 14, 2012.  I introduced a supervised learning/predictive modelling technique called partial least squares (PLS) regression.  I showed how ordinary linear least squares regression is often problematic when used with big data — especially data sets with many correlated predictors — because of multicollinearity and overfitting, explained how PLS regression overcomes these limitations, and illustrated how to implement it in SAS and JMP.  I also highlighted the variable importance for projection (VIP) score, which can be used to conduct variable selection with PLS regression; in particular, I documented its effectiveness as a variable-selection technique by comparing some key journal articles on this issue in the academic literature.
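For readers who want to try this themselves, here is a minimal sketch of PLS regression in SAS; the data set train and the variables y and x1–x20 are hypothetical placeholders, and the available options should be checked against your version of SAS/STAT.

```sas
/* Minimal PLS regression sketch in SAS/STAT.
   The data set "train" and variables y, x1-x20 are hypothetical.
   CV=SPLIT requests cross-validation to choose the number of
   extracted factors, and PLOTS=VIP requests a variable importance
   for projection plot to support variable selection. */
proc pls data=train method=pls cv=split nfac=10 plots=vip;
   model y = x1-x20;
run;
```

In JMP, the analogous analysis is available under Analyze > Multivariate Methods > Partial Least Squares, which reports VIP scores for each predictor.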

[Figure: overfitting — a green line perfectly separating blue and red dots]

The green line is an overfitted classifier.  Not only does it model the underlying trend, but it also models the noise (the random variation) at the boundary.  It separates the blue and the red dots perfectly for this data set, but it will classify very poorly on a new data set from the same population.

Source: Chabacano via Wikimedia