Analyst Finder – A Free Job-Matching Service for Statisticians, Data Scientists, Database Managers and Data Analysts

If you are a statistician, data scientist, database manager, or data analyst, consider using Analyst Finder for your next job search.  It is a website that connects job seekers in data analytics with employers.  The service is free for job seekers; it earns revenue by charging companies and recruiters a small fee to find qualified candidates through its job-matching service.

To register for this service as a job seeker, you simply need to complete a checklist of skills and preferences.  It’s quick and easy to do, and you can change this list whenever you wish to update your qualifications.



Employers and recruiters can register for this service to search for qualified candidates, and the fees are displayed on the web site.

This company was founded by Art Tabachneck, a former president of the Toronto Area SAS Society and a veteran analytics professional.  In case you’re interested in learning more about him, SAS has a profile about him in recognition of his expertise in SAS programming and his contributions to online support communities and user groups.

How to Find a Job in Statistics – Advice for Students and Recent Graduates


A graduate student in statistics recently asked me for advice on how to find a job in our industry.  I’m happy to share my advice, and I hope that it helps you find a satisfying job and develop an enjoyable career.  My perspective should be most useful to students and recent graduates because my background is recent and similar to theirs: I completed my Master’s degree in statistics at the University of Toronto just 1.5 years ago, and I volunteered as a career advisor at Simon Fraser University during my Bachelor’s degree.  My advice reflects my experience of finding a job in Toronto, but you can probably find parallels in your own city.

Most of this post focuses on soft skills that are needed to find any job; I dive specifically into advice for statisticians in the last section.  Although the soft skills are general and not specific to statisticians, many employers, veteran statisticians, and professors have told me that students and recent graduates would benefit from the focus on soft skills.  Thus, I discuss them first and leave the statistics-specific advice till the end.

Read more of this post

Vancouver SAS Users Group Meeting – Wednesday, November 27th @ BC Cancer Agency

I am very excited to attend my first VanSUG meeting next week!  I was very active in the SAS community in Toronto for 2 years before my recent move to Vancouver, and it gave me great opportunities to learn from and network with other statisticians and analytics professionals.  For the Toronto Area SAS Society, I presented on partial least squares regression, K-means clustering, and discriminant analysis, and I also helped its statistics break-out group by answering questions and offering advice on statistics.  I look forward to meeting and learning from new colleagues as I join Vancouver’s own network of SAS users!

If you come to the next VanSUG meeting, please come up and say “Hello”!  Here are the details.  This web page also has archives of past presentations and newsletters.

Wednesday, November 27th, 2013, 9:00 a.m. – 3:00 p.m.
Gordon & Leslie Diamond Family Theatre
BC Cancer Agency Research Centre
675 West 10th Ave.
Vancouver, BC

Presentation in Toronto on Friday, June 7, 2013: Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS

Update: My presentation has been moved from 9:30 am to 10:50 am.  I have switched time slots with Justin Jia.  I will present from 10:50 – 11:20 am.

I will deliver a presentation entitled “Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS” at the Toronto Area SAS Society (TASS) on Friday, June 7, 2013.  Discriminant analysis is a powerful technique for predicting categorical target variables, and it can be easily implemented in JMP and SAS.  I will give a gentle, intuitive, but not overly mathematical introduction to this technique that will be accessible to a wide audience of statisticians and analytics professionals from diverse backgrounds.
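My talk implements discriminant analysis in JMP and SAS; for readers who prefer to see the idea in code, here is a hypothetical sketch of the two-class linear version in Python with NumPy.  The function names and test data are my own illustration, not material from the talk.

```python
import numpy as np

def lda_two_class(X, labels):
    """Fit a two-class linear discriminant; return (weights, threshold).

    Linear (as opposed to quadratic) discriminant analysis assumes that
    both classes share a common covariance matrix, estimated here by
    pooling the two within-class covariance matrices.
    """
    X0, X1 = X[labels == 0], X[labels == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix
    S = (np.cov(X0, rowvar=False) * (len(X0) - 1) +
         np.cov(X1, rowvar=False) * (len(X1) - 1)) / (len(X) - 2)
    w = np.linalg.solve(S, mu1 - mu0)   # discriminant direction
    c = w @ (mu0 + mu1) / 2             # midpoint threshold
    return w, c

def lda_predict(X, w, c):
    """Classify: discriminant score above the threshold -> class 1."""
    return (X @ w > c).astype(int)
```

The midpoint threshold assumes equal prior probabilities for the two classes; unequal priors would shift it.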


Come to my next presentation at the Toronto Area SAS Society on Friday, June 7, 2013!

I have previously written about the educational and networking benefits of attending SAS user group events, which are completely free to attend.  Besides TASS, I have also attended the Toronto Data Mining Forum and the SAS Health User Group meetings.  I even encourage you to consider presenting at these meetings; check out my previous presentation on partial least squares regression.

You can find more information about the next meeting in this agenda, which also contains links to the registration web sites.  Note that there are 2 events – one in the morning, and one in the afternoon – so be sure to register for both if you wish to attend the entire day’s events.

Toronto Area SAS Society Meeting

Classic TASS: 9:00 am – 12:00 pm

Interfaces TASS: 1:30 pm – 3:45 pm

Friday, June 7th, 2013

SAS Institute (Canada) Inc.

280 King St. E, 5th Floor

Toronto, Ontario

A free breakfast is served in the morning, usually starting at 8:30 am.

Presentation Slides – Overcoming Multicollinearity and Overfitting with Partial Least Squares Regression in JMP and SAS

My slides on partial least squares regression at the Toronto Area SAS Society (TASS) meeting on September 14, 2012, can be found here.

My Presentation on Partial Least Squares Regression

My first presentation to Toronto Area SAS Society (TASS) was delivered on September 14, 2012.  I introduced a supervised learning/predictive modelling technique called partial least squares (PLS) regression; I showed how normal linear least squares regression is often problematic when used with big data because of multicollinearity and overfitting, explained how partial least squares regression overcomes these limitations, and illustrated how to implement it in SAS and JMP.  I also highlighted the variable importance for projection (VIP) score that can be used to conduct variable selection with PLS regression; in particular, I documented its effectiveness as a technique for variable selection by comparing some key journal articles on this issue in academic literature.
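My slides implement PLS regression in SAS and JMP; as a rough illustration of the underlying algorithm, here is a hypothetical single-response (PLS1) sketch in Python with NumPy, based on the NIPALS iteration.  The code and test data are my own, not from the slides.

```python
import numpy as np

def pls1_fit_predict(X, y, n_components=2):
    """PLS1 via NIPALS: extract latent components that maximize
    covariance with y, then predict y from those components.
    Returns the in-sample predictions."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    yhat = np.full(len(y), y.mean())
    for _ in range(n_components):
        w = Xk.T @ yk                # weight: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # score: the latent component
        p = Xk.T @ t / (t @ t)       # X loading
        q = (yk @ t) / (t @ t)       # y loading
        yhat += q * t                # accumulate this component's prediction
        Xk -= np.outer(t, p)         # deflate X: remove explained variation
        yk -= q * t                  # deflate y
    return yhat
```

Because each component is extracted to maximize covariance with the response and the scores are mutually orthogonal, the fit remains stable even when the predictors are highly collinear, which is exactly where ordinary least squares breaks down.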


The green line is an overfitted classifier.  Not only does it model the underlying trend, but it also models the noise (the random variation) at the boundary.  It separates the blue and the red dots perfectly for this data set, but it will classify very poorly on a new data set from the same population.

Source: Chabacano via Wikimedia
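The same phenomenon is easy to reproduce in a regression setting.  Here is a hypothetical Python sketch (my own example, unrelated to the figure’s data): a degree-15 polynomial fits the noisy training points almost perfectly, yet its error on held-out points from the same population is far larger than its training error – the signature of overfitting.

```python
import numpy as np

rng = np.random.default_rng(42)
# True relationship is linear; the noise is the part we should NOT model.
x_train = np.linspace(0, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.2, size=20)
x_test = np.linspace(0.025, 0.975, 20)
y_test = 2 * x_test + rng.normal(0, 0.2, size=20)

def fit_and_errors(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse
```

Comparing `fit_and_errors(1)` with `fit_and_errors(15)` shows the trade-off: the flexible model drives the training error down by chasing the noise, but that apparent gain does not carry over to new data.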
Read more of this post

Presentation Slides – Finding Patterns in Data with K-Means Clustering in JMP and SAS

My slides on K-means clustering at the Toronto Area SAS Society (TASS) meeting on December 14, 2012, can be found here.


This image is a slightly enhanced version of an image created by Weston.pace from Wikimedia Commons.

My Presentation on K-Means Clustering

I was very pleased to be invited for a second time by the Toronto Area SAS Society (TASS) to deliver a presentation on machine learning.  (I previously presented on partial least squares regression.)  At its recent meeting on December 14, 2012, I introduced an unsupervised learning technique called K-means clustering.

I first defined clustering as a set of techniques for identifying groups of objects by maximizing a similarity criterion or, equivalently, minimizing a dissimilarity criterion.  I then defined K-means clustering specifically as a clustering technique that uses Euclidean proximity to a group mean as its similarity criterion.  I illustrated how this technique works with a simple 2-dimensional example; you can follow along with this example in the slides by watching the sequence of images of the clusters as they converge.  As with many other machine learning techniques, some arbitrary decisions need to be made to initiate the algorithm for K-means clustering:

  1. How many clusters should there be?
  2. Where should each cluster’s mean start?

I provided some guidelines on how to make these decisions in these slides.
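For readers who want to see the algorithm outside of JMP and SAS, here is a hypothetical sketch of Lloyd’s algorithm, the standard iteration behind K-means, in Python with NumPy.  This is my own illustrative code, not material from the slides.

```python
import numpy as np

def kmeans(X, init_centers, max_iter=100):
    """Lloyd's algorithm for K-means: alternate an assignment step and
    an update step until the cluster assignments stop changing."""
    centers = np.asarray(init_centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its assigned points
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return centers, labels
```

The `init_centers` argument makes the second arbitrary decision explicit: the final clusters can depend on where the means start, which is why initialization deserves the guidelines discussed in the slides.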

Read more of this post