Analytical Chemistry Lesson of the Day – Accuracy in Method Validation and Quality Assurance

In pharmaceutical chemistry, one of the requirements for method validation is accuracy, the ability of an analytical method to obtain a value of a measurement that is close to the true value. There are several ways of assessing an analytical method for accuracy.

  1. Compare the value from your analytical method with an established or reference method.
  2. Use your analytical method to obtain a measurement from a sample with a known quantity (i.e. a reference material), and compare the measured value with the true value.
  3. If you don’t have a reference material for the second way, you can make your own by spiking a blank matrix with a measured quantity of the analyte.
  4. If your matrix may interfere with the analytical signal, then you cannot spike a blank matrix as described in the third way.  Instead, spike your sample with a known quantity of the standard.  I elaborate on this in a separate tutorial on standard addition, a common technique in analytical chemistry for determining the quantity of a substance when matrix interference exists.  Standard addition is an example of the second way of assessing accuracy as I mentioned above.  You can view the original post of this tutorial on the official JMP blog.
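
For the spiking approaches in the third and fourth ways, accuracy is commonly summarized as a percent recovery: the share of the added analyte that the method actually detects.  The following Python sketch is only an illustration; the function name and all of the numbers are hypothetical and do not come from any real data set.

```python
def percent_recovery(measured_spiked, measured_unspiked, amount_added):
    """Return the spike recovery as a percentage of the amount added.

    measured_spiked:   result for the spiked blank or spiked sample
    measured_unspiked: result for the same material before spiking
                       (0 for a true blank matrix)
    amount_added:      known quantity of analyte that was spiked in
    """
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return 100.0 * (measured_spiked - measured_unspiked) / amount_added


# Hypothetical values, in the same concentration units throughout.
# Third way: spike a blank matrix.
blank_recovery = percent_recovery(
    measured_spiked=9.8, measured_unspiked=0.0, amount_added=10.0
)

# Fourth way: spike the sample itself.
sample_recovery = percent_recovery(
    measured_spiked=24.5, measured_unspiked=15.0, amount_added=10.0
)

print(f"Blank spike recovery:  {blank_recovery:.1f}%")
print(f"Sample spike recovery: {sample_recovery:.1f}%")
```

A recovery near 100% suggests that the method is accurate for that matrix; what counts as "near" depends on the acceptance criteria of your validation protocol.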

Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry

In this second article of a 2-part series on the official JMP blog, I use analysis of variance (ANOVA) to assess a sample-preparation scheme for quantifying sodium in potato chips.  I illustrate the use of the “Fit Y by X” platform in JMP to implement ANOVA, and I propose an alternative sample-preparation scheme to obtain a sample with a smaller variance.  This article is entitled “Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry”.
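
For readers who want to see the arithmetic behind a one-way ANOVA outside of JMP, here is a minimal Python sketch.  It is not the analysis from the JMP article: the grouping by bag and the sodium values are hypothetical, and the code only computes the F statistic from the between-group and within-group sums of squares.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict of group -> list of values."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the observations around their own group mean
    ss_within = sum(
        (x - mean(values)) ** 2
        for values in groups.values()
        for x in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical sodium measurements (arbitrary units) from three bags of chips
sodium_by_bag = {
    "bag_1": [5.0, 6.0, 7.0],
    "bag_2": [6.0, 7.0, 8.0],
    "bag_3": [7.0, 8.0, 9.0],
}

f_stat, df_between, df_within = one_way_anova(sodium_by_bag)
print(f"F = {f_stat:.2f} on {df_between} and {df_within} degrees of freedom")
```

A large F statistic indicates that the variation between groups is large relative to the variation within groups; a p-value would come from comparing F with the F distribution on those degrees of freedom.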

If you haven’t read my first blog post in this series on preparing the data in JMP and using the “Stack Columns” function to transpose data from wide format to long format, check it out!  I presented this topic at the last Vancouver SAS User Group (VanSUG) meeting on Wednesday, November 4, 2015.

My thanks to Arati Mejdal, Louis Valente, and Mark Bailey at JMP for their guidance in writing this 2-part series!  It is a pleasure to be a guest blogger for JMP!



Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP

I am very excited to write again for the official JMP blog as a guest blogger!  Today, the first article of a 2-part series has been published, and it is called “Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP”.  This series of blog posts will talk about analysis of variance (ANOVA), sampling, and analytical chemistry, and it uses the quantification of sodium in potato chips as an example to illustrate these concepts.

The first part of this series discusses how to import the data into JMP and prepare them for ANOVA.  Specifically, it illustrates how the “Stack Columns” function is used to transpose the data from wide format to long format.
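
To make the wide-to-long idea concrete outside of JMP, here is a small Python sketch of the same kind of transposition.  It is not JMP's implementation; the function, the column names, and the values are all hypothetical.

```python
def stack_columns(rows, id_column, value_columns,
                  label_name="source", data_name="value"):
    """Turn wide rows (one column per measurement) into long rows.

    Each wide row becomes one long row per entry in value_columns,
    with the original column name stored under label_name.
    """
    long_rows = []
    for row in rows:
        for column in value_columns:
            long_rows.append({
                id_column: row[id_column],
                label_name: column,
                data_name: row[column],
            })
    return long_rows


# Hypothetical wide table: one row per bag, one column per replicate
wide = [
    {"bag": 1, "rep_1": 5.0, "rep_2": 6.0},
    {"bag": 2, "rep_1": 7.0, "rep_2": 8.0},
]

long_table = stack_columns(wide, id_column="bag",
                           value_columns=["rep_1", "rep_2"])
for row in long_table:
    print(row)
```

The long format has one measurement per row, which is the layout that ANOVA routines generally expect: one column for the response and one column for the grouping factor.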

I will present this at the Vancouver SAS User Group (VanSUG) meeting later today.

Stay tuned for “Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry”!



Vancouver SAS User Group Meeting – Wednesday, November 4, 2015

I am excited to present at the next Vancouver SAS User Group (VanSUG) meeting on Wednesday, November 4, 2015.  I will illustrate data transposition and ANOVA in SAS and JMP using potato chips and analytical chemistry.  Come and check it out!  The following agenda contains all of the presentations, and you can register for this meeting on the SAS Canada web site.  This meeting is free, and a free breakfast will be served in the morning.


Update: My slides from this presentation have been posted on the VanSUG web site.


Date: Wednesday, November 4, 2015


Ballroom West and Centre

Holiday Inn – Vancouver Centre

711 West Broadway, Vancouver, BC

V5Z 3Y2

(604) 879-0511


8:30am – 9:00am: Registration

9:00am – 9:20am: Introductions and SAS Update – Matt Malczewski, SAS Canada

9:20am – 9:40am: Lessons On Transposing Data, Sampling & ANOVA in SAS & JMP – Eric Cai, Cancer Surveillance & Outcomes, BC Cancer Agency

9:40am – 10:20am: Make SAS Enterprise Guide Your Own – John Ladds, Statistics Canada

10:20am – 10:30am: A Beginner’s Experience Using SAS – Kim Burrus, Cancer Surveillance & Outcomes, BC Cancer Agency

10:30am – 11:00am: Networking Break

11:00am – 11:20am: Using SAS for Simple Calculations – Jay Shurgold, Rick Hansen Institute

11:20am – 11:50am: Yes, We Can… Save SAS Formats – John Ladds, Statistics Canada

11:50am – 12:20pm: Reducing Customer Attrition with Predictive Analytics – Nate Derby, Stakana Analytics

12:20pm – 12:30pm: Evaluations, Prize Draw & Closing Remarks

If you would like to be notified of upcoming SAS User Group Meetings in Vancouver, please subscribe to the Vancouver SAS User Group Distribution List.

Eric’s Enlightenment for Friday, June 5, 2015

  1. Christian Robert provides a gentle introduction to the Metropolis-Hastings algorithm with accompanying R code.  (Hat Tip: David Campbell)
  2. John Sall demonstrates how to perform discriminant analysis in JMP, especially for data sets with many variables.
  3. Using machine learning instead of human judgment may improve the selection of job candidates.  This article also includes an excerpt from a New York Times article about how the Milwaukee Bucks used facial recognition as one justification to choose Jabari Parker over Dante Exum.  (Hat Tip: Tyler Cowen)
  4. “A hospital at the University of California San Francisco Medical Center has a robot filling prescriptions.”

Vancouver SAS User Group Meeting – Wednesday, November 26, 2014, at Holiday Inn Vancouver-Centre (West Broadway)

I am pleased to have recently joined the executive organizing team of the Vancouver SAS User Group.  We hold meetings twice per year to allow Metro Vancouver users of all kinds of SAS products to share their knowledge, tips and advice with others.  These events are free to attend, but registration is required.

Our next meeting will be held on Wednesday, November 26, 2014.  Starting from 8:30 am, a free breakfast will be served while registration takes place.  The session will begin at 9:00 am and end at 12:30 pm with a prize draw.

Please note that there is a new location for this meeting: the East and Centre Ballrooms at Holiday Inn Vancouver-Centre at 711 West Broadway in Vancouver.  We will also experiment with holding a half-day session by ending at 12:30 pm at this meeting.  Visit our web site for more information and to register for this free event!

If you will attend this event, please feel free to come and say “Hello”!

Read the rest of this post for the full agenda!

Determining chemical concentration with standard addition: An application of linear regression in JMP – A Guest Blog Post for the JMP Blog

I am very excited to announce that I have been invited by JMP to be a guest blogger for its official blog!  My thanks to Arati Mejdal, Global Social Media Manager for the JMP Division of SAS, for welcoming me into the JMP blogging community with so much support and encouragement, and I am pleased to publish my first post on the JMP Blog!  Mark Bailey and Byron Wingerd from JMP provided some valuable feedback to this blog post, and I am fortunate to get the chance to work with and learn from them!

Following the tradition of The Chemical Statistician, this post combines my passions for statistics and chemistry by illustrating how simple linear regression can be used for the method of standard addition in analytical chemistry.  In particular, I highlight the useful capability of the “Inverse Prediction” function under the “Fit Model” platform in JMP to estimate the predictor given an observed response value (i.e. estimate the value of x_i given y_i).  Check it out!
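
As a rough illustration of the idea (not the JMP workflow from the guest post), the following Python sketch fits a least-squares line to signal versus added standard and then inverts the fitted line.  The data are hypothetical, and the sketch assumes that the total volume stays constant, so no dilution correction is applied.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def inverse_predict(intercept, slope, y_observed):
    """Estimate the x that gives y_observed on the fitted line."""
    return (y_observed - intercept) / slope


# Hypothetical standard-addition data
added_concentration = [0.0, 1.0, 2.0, 3.0]   # standard added to the sample
signal = [2.0, 3.0, 4.0, 5.0]                # instrument response

intercept, slope = fit_line(added_concentration, signal)

# The fitted line crosses zero signal at a negative "added" concentration;
# its magnitude estimates the analyte concentration in the original sample.
x_at_zero_signal = inverse_predict(intercept, slope, y_observed=0.0)
estimated_concentration = -x_at_zero_signal

print(f"Estimated concentration: {estimated_concentration:.2f}")
```

This gives only a point estimate.  A full inverse prediction also reports a confidence interval for the estimated x, which this sketch does not attempt.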

Presentation in Toronto on Friday, June 7, 2013: Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS

Update: My presentation has been moved from 9:30 am to 10:50 am.  I have switched time slots with Justin Jia.  I will present from 10:50 – 11:20 am.

I will deliver a presentation entitled “Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS” at the Toronto Area SAS Society (TASS) on Friday, June 7, 2013.  Discriminant analysis is a powerful technique for predicting categorical target variables, and it can be easily implemented in JMP and SAS.  I will give a gentle, intuitive, but not overly mathematical introduction to this technique that will be accessible to a wide audience of statisticians and analytics professionals from diverse backgrounds.

Come to my next presentation at the Toronto Area SAS Society on Friday, June 7, 2013!

I have previously written about the educational and networking benefits of attending SAS user group events, which are completely free to attend.  Besides TASS, I have also attended the Toronto Data Mining Forum and the SAS Health User Group meetings.  I encourage you to even consider presenting at these meetings; check out my previous presentation on partial least squares regression.

You can find more information about the next meeting in this agenda, which also contains links to the registration web sites.  Note that there are 2 events – one in the morning, and one in the afternoon – so be sure to register for both if you wish to attend the entire day’s events.

Toronto Area SAS Society Meeting

Classic TASS: 9:00 am – 12:00 pm

Interfaces TASS: 1:30 pm – 3:45 pm

Friday, June 7th, 2013

SAS Institute (Canada) Inc.

280 King St. E, 5th Floor

Toronto, Ontario

A free breakfast is served in the morning, usually starting at 8:30 am.

Presentation Slides: Machine Learning, Predictive Modelling, and Pattern Recognition in Business Analytics

I recently delivered a presentation entitled “Using Advanced Predictive Modelling and Pattern Recognition in Business Analytics” at the Statistical Society of Canada’s (SSC’s) Southern Ontario Regional Association (SORA) Business Analytics Seminar Series.  In this presentation, I

– discussed how traditional statistical techniques often fail in analyzing large data sets

– defined and described machine learning, supervised learning, unsupervised learning, and the many classes of techniques within these fields, as well as common examples in business analytics to illustrate these concepts

– introduced partial least squares regression and bootstrap forest (or random forest) as two examples of supervised learning (or predictive modelling) techniques that can effectively overcome the common failures of traditional statistical techniques and can be easily implemented in JMP

– illustrated how partial least squares regression and bootstrap forest were successfully used to solve some major problems for 2 different clients at Predictum, where I currently work as a statistician

Presentation Slides – Overcoming Multicollinearity and Overfitting with Partial Least Squares Regression in JMP and SAS

My slides on partial least squares regression at the Toronto Area SAS Society (TASS) meeting on September 14, 2012, can be found here.

My Presentation on Partial Least Squares Regression

My first presentation to the Toronto Area SAS Society (TASS) was delivered on September 14, 2012.  I introduced a supervised learning/predictive modelling technique called partial least squares (PLS) regression; I showed how ordinary linear least squares regression is often problematic when used with big data because of multicollinearity and overfitting, explained how partial least squares regression overcomes these limitations, and illustrated how to implement it in SAS and JMP.  I also highlighted the variable importance for projection (VIP) score that can be used to conduct variable selection with PLS regression; in particular, I documented its effectiveness as a technique for variable selection by comparing some key journal articles on this issue in the academic literature.


The green line is an overfitted classifier.  Not only does it model the underlying trend, but it also models the noise (the random variation) at the boundary.  It separates the blue and the red dots perfectly for this data set, but it will classify very poorly on a new data set from the same population.

Source: Chabacano via Wikimedia

Presentation Slides – Finding Patterns in Data with K-Means Clustering in JMP and SAS

My slides on K-means clustering at the Toronto Area SAS Society (TASS) meeting on December 14, 2012, can be found here.

This image is slightly enhanced from an image created by Weston.pace from Wikimedia Commons.

My Presentation on K-Means Clustering

I was very pleased to be invited for the second time by the Toronto Area SAS Society (TASS) to deliver a presentation on machine learning.  (I previously presented on partial least squares regression.)  At its recent meeting on December 14, 2012, I introduced an unsupervised learning technique called K-means clustering.

I first defined clustering as a set of techniques for identifying groups of objects by maximizing a similarity criterion or, equivalently, minimizing a dissimilarity criterion.  I then defined K-means clustering specifically as a clustering technique that uses Euclidean proximity to a group mean as its similarity criterion.  I illustrated how this technique works with a simple 2-dimensional example; you can follow along with this example in the slides by watching the sequence of images of the clusters toward convergence.  As with many other machine learning techniques, some arbitrary decisions need to be made to initiate the algorithm for K-means clustering:

  1. How many clusters should there be?
  2. What is the initial mean of each cluster?

I provided some guidelines on how to make these decisions in these slides.
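
To show where those two decisions enter the algorithm, here is a minimal Python sketch of K-means clustering in 2 dimensions.  It is not the JMP or SAS implementation from the slides; the points and the initial means are hypothetical, and the number of clusters is simply the number of initial means supplied.

```python
import math


def k_means(points, initial_means, max_iterations=100):
    """Cluster points by repeatedly assigning each one to its nearest mean.

    Returns (means, clusters), where clusters[k] lists the points
    assigned to means[k].
    """
    means = [tuple(m) for m in initial_means]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster with the closest mean
        clusters = [[] for _ in means]
        for point in points:
            nearest = min(range(len(means)),
                          key=lambda k: math.dist(point, means[k]))
            clusters[nearest].append(point)

        # Update step: recompute each mean; keep the old mean if a cluster is empty
        new_means = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else means[k]
            for k, cluster in enumerate(clusters)
        ]

        if new_means == means:   # converged: the means no longer move
            break
        means = new_means
    return means, clusters


# Hypothetical 2-dimensional data with two visible groups
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]

# Decision 1: two clusters.  Decision 2: these starting means.
means, clusters = k_means(points, initial_means=[(0, 0), (10, 10)])

for mean_k, cluster in zip(means, clusters):
    print(mean_k, cluster)
```

Different starting means can lead to different final clusters, which is why the choice of initial values matters.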
