Career-advice seminar at the University of Toronto – Wednesday, February 7, 2018

I am excited to visit the University of Toronto on Wednesday, February 7, to share my career advice in a seminar and in a question-and-answer session.  Both events will be in the Debates Room at Hart House.  Hart House is located at 7 Hart House Circle in Toronto, Ontario.  The Debates Room is on the second floor of Hart House.


  • My presentation will occur from 11 am to 12 pm.
  • I will answer questions in an open forum from 1 pm to 2 pm.

I will talk about

  • my diverse jobs in industrial statistics, medicine, banking, and marketing analytics since earning my Master’s degree in statistics
  • the skills that my jobs demand but that I did not learn in my formal education in statistics
  • effective strategies for finding a job in statistics
  • building an online brand as a statistician
  • developing a meaningful career in a systematic way
  • important steps that students can take during their studies to prepare for a career outside of academia

I strongly encourage all attendees to read my career-advice columns in advance, especially “How to Find a Job in Statistics – Advice for Students and Recent Graduates”.

If you will attend this event, then please feel free to come and say “Hello”!

I thank Jeffrey Negrea and Dr. Radu Craiu from the University of Toronto for their help in coordinating this event.  Jeffrey is the president of the Statistics Graduate Student Union, and Dr. Craiu is the Associate Chair for Graduate Affairs in the Department of Statistical Sciences.

 


SORA Business Analytics Seminar – Tuesday, January 16, 2018 – RBC WaterPark Place

I will attend the SORA Business Analytics Seminar at RBC WaterPark Place on Tuesday, January 16, 2018.  The event will occur from 4:00 pm to 5:30 pm.  The event is called “The Future of Data Science”, and it will be a panel discussion on the landscape of data science, with a variety of data science practitioners sharing their experience and perspectives.


If you will attend this event, then please feel free to come and say “Hello”!

RBC WaterPark Place is located at 88 Queens Quay West in Toronto, Ontario.

SORA is the Southern Ontario Regional Association of the Statistical Society of Canada (SSC).

Arnab Chakraborty on Bayes’ Theorem – The Central Equilibrium – Episode 3

Arnab Chakraborty kindly came to my new talk show, “The Central Equilibrium”, to talk about Bayes’ theorem.  He introduced the concept of conditional probability, stated Bayes’ theorem in its simple and general forms, and showed an example of how to use it in a calculation.

Check it out!

Christopher Salahub on Markov Chains – The Central Equilibrium – Episode 2

It was a great pleasure to talk to Christopher Salahub about Markov chains in the second episode of my new talk show, The Central Equilibrium!  Chris graduated from the University of Waterloo with a Bachelor of Mathematics degree in statistics.  He just finished an internship in data development at Environics Analytics, and he is starting a Master’s program in statistics at ETH Zurich in Switzerland.

Chris recommends “Introduction to Probability Models” by Sheldon Ross to learn more about probability theory and Markov chains.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

Store multiple strings of text as a macro variable in SAS with PROC SQL and the INTO statement

I often need to work with many variables at a time in SAS, but I don’t like to type all of their names manually: not only is the result messy to read, but typing the names (or even copying and pasting them) also invites transcription errors.  I recently learned an elegant and efficient way to store multiple variable names in a macro variable that overcomes those problems.  This technique uses the INTO clause of the SELECT statement in PROC SQL.

To illustrate how this storage method can be applied in a practical context, suppose that we want to determine the factors that contribute to a baseball player’s salary in the built-in SASHELP.BASEBALL data set.  I will consider all continuous variables other than “Salary” and “logSalary”, but I don’t want to write their names explicitly in any programming statements.  To do this, I first obtain the variable names and types of the data set using PROC CONTENTS.

* create a data set of the variable names;
proc contents
     data = sashelp.baseball
          noprint
     out = bvars (keep = name type);
run;
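The rest of this post is not reproduced here, but the next step can be sketched as follows.  This sketch assumes that every numeric variable (TYPE = 1 in the PROC CONTENTS output) counts as continuous, and the macro variable name predictors is hypothetical.  The SEPARATED BY option of the INTO clause joins all selected names with blanks, so the macro variable can be used wherever SAS expects a list of variables.

```sas
* store the names of all numeric variables except Salary and logSalary
  in one macro variable, separated by blanks;
proc sql
     noprint;
     select name
          into :predictors separated by ' '
          from bvars
          where type = 1
               and upcase(name) not in ('SALARY', 'LOGSALARY');
quit;

* write the stored list of names to the log to check it;
%put predictors = &predictors;

* example use: the macro variable as the list of covariates in a regression;
proc reg
     data = sashelp.baseball;
     model logSalary = &predictors;
run;
quit;
```

Because the names are pulled from the data set itself, adding or dropping a variable does not require retyping the list.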


Use the LENGTH statement to pre-set the lengths of character variables in SAS – with a comparison to R

I often create character variables (i.e. variables with strings of text as their values) in SAS, and they sometimes don’t render as expected.  Here is an example involving the built-in data set SASHELP.CLASS.

Here is the code:

data c1;
     set sashelp.class;
 
     * define a new character variable to classify someone as tall or short;
     if height > 60
     then height_class = 'Tall';
          else height_class = 'Short';
run;


* print the results for the first 5 rows;
proc print
     data = c1 (obs = 5);
run;

Here is the result:

Obs    Name       Sex    Age    Height    Weight    height_class
  1    Alfred      M      14     69.0      112.5    Tall
  2    Alice       F      13     56.5       84.0    Shor
  3    Barbara     F      13     65.3       98.0    Tall
  4    Carol       F      14     62.8      102.5    Tall
  5    Henry       M      14     63.5      102.5    Tall

What happened?  Why does the word “Short” render as “Shor”?
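In a DATA step, SAS fixes the length of a new character variable when it first encounters that variable at compile time.  Here, that first encounter is the assignment of 'Tall', which has 4 characters, so every later value of height_class is truncated to 4 characters, and 'Short' becomes 'Shor'.  A minimal sketch of the fix, assuming that a length of 5 is enough for both labels, declares the length with a LENGTH statement before the first assignment (the data set name c2 is hypothetical):

```sas
data c2;
     set sashelp.class;

     * declare height_class as a character variable with a length of 5
       before its first assignment, so that Short fits;
     length height_class $ 5;

     * classify someone as tall or short, as before;
     if height > 60
          then height_class = 'Tall';
     else height_class = 'Short';
run;

* print the results for the first 5 rows;
proc print
     data = c2 (obs = 5);
run;
```

With this change, the second row (Alice) shows Short in full.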


Neil Seoni on the Fourier Transform and the Sampling Theorem – The Central Equilibrium – Episode 1

I am very excited to publish the very first episode of my new talk show, The Central Equilibrium!  My guest is Neil Seoni, an undergraduate student in electrical and computer engineering at Rice University in Houston, Texas. He has studied data science in his spare time, most notably taking a course on machine learning by Andrew Ng on Coursera. He is finishing his summer job as a Data Science Intern at Environics Analytics in Toronto, Ontario.

Neil recommends reading Don Johnson’s course notes from Rice University and his free textbook to learn more about the topics covered in his episode.

The Central Equilibrium is my new talk show about math, science, and economics. It focuses on technical topics that involve explanations with formulas, equations, graphs, and diagrams.  Stay tuned for more episodes in the coming weeks!

You can watch all of my videos on my YouTube channel!

Please watch the video on this blog.  You can also watch it directly on YouTube.

A Comprehensive Guide for Public Speaking at Scientific Conferences

Introduction

I served as a judge for some of the student presentations at the 2016 Canadian Statistics Student Conference (CSSC).  The conference was both a learning opportunity and a networking opportunity for statistics students in Canada.  The presentations allowed the students to share their research and course projects with their peers, and they were a chance for them to get feedback about their work and learn new ideas from other students.

Unfortunately, I found most of the presentations to be very bad – not necessarily in terms of the content, but because of the delivery.  Although the students showed much earnestness and eagerness in sharing their work with others, most of them demonstrated poor competence in public speaking.

Public speaking is an important skill in knowledge-based industries, so these opportunities are valuable for anybody who wants to strengthen this skill.  You can only learn it by doing it many times, making mistakes, and learning from those mistakes.  Having delivered many presentations, learned from my share of mistakes, and received much praise for my seminars, I hope that the following tips will help anyone who presents at scientific conferences to improve their public-speaking skills.  In fact, most of these tips apply to public speaking in general.

I spoke at the 2016 Canadian Statistics Student Conference on career advice for students and new graduates in statistics.

Image courtesy of Peter Macdonald on Flickr.


Maximizing Your Learning Potential at Professional Conferences – A Detailed Guide

Introduction

Last summer, I attended the 2016 Annual Meeting of the Statistical Society of Canada (SSC).  I spoke on the career-advice panel at the 2016 Canadian Statistics Student Conference (CSSC), and I met some colleagues and professors to share ideas about our mutual interests in statistics, statistical education, and the use of social media to promote statistics to the general public.

From observing and talking to many students at this conference, I realized that most of them did not use it effectively to maximize their learning potential.  A conference like this is a great opportunity for networking, career development, and – eventually – finding a job, but I suspect that most statistics students do not comprehend the depth of its value, let alone how to extract it.  Thus, I’m writing this advice column to help anyone who attends a professional conference.

Image courtesy of Rufino from Wikimedia Commons.

Objectives

Most statistics students want to succeed academically and find a job after completing their education – that job could be within or outside of academia.  Thus, at any professional conference, they should have the following objectives:

  1. To learn new ideas in your fields of interest
  2. To meet others who share your professional interests
  3. To learn soft skills from veterans in your industry for developing your career
  4. To build valuable relationships in your professional network

Unfortunately, based on my anecdotal observations, many students in statistics, math and science don’t seem to grasp Objectives #3-4.  These students tend to be passive in their attendance and shy in their participation.  When they do try to pursue Objectives #3-4, they are often unprepared and do not take advantage of all of the learning opportunities that are available to them.

The first step in maximizing your learning potential at a professional conference is recognizing that it takes preparation and hard work.  To do it well, you need to take all 4 objectives seriously and practice them frequently.  Attending a professional conference is a skill, and developing this skill requires thought and effort.  It involves much more than just showing up, talking at your turn, and listening at all other times.

Hopefully, the rest of this article will help you to develop this skill in an intelligent way, but you must realize that there is no substitute for hard work.


Getting the names, types, formats, lengths, and labels of variables in a SAS data set

After reading my blog post on getting the variable names of a SAS data set, a reader named Robin asked how to get the formats as well.  I asked SAS Technical Support for help, and a consultant named Jerry Leonard provided a beautiful solution using PROC SQL.  Besides the names and formats of the variables, it also gives the types, lengths, and labels.  Here is an example of how to do so with the CLASS data set in the built-in SASHELP library.

* add formats to 4 of the variables and labels to 3 of the variables in the CLASS data set;
data class;
       set sashelp.class;
       format
            age 8.
            weight height 8.2
            name $15.;
       label
            age = 'Age'
            weight = 'Weight'
            height = 'Height';
run;
                  

* extract the variable information using PROC SQL; 
proc sql 
       noprint;                                                
       create table class_info as 
       select libname as library, 
              memname as data_set, 
              name as variable_name, 
              type, 
              length, 
              format, 
              label       
       from dictionary.columns                                       
       where libname = 'WORK' and memname = 'CLASS';                     
       /* libname and memname values must be upper case  */         
quit;                                                          
                   
 
* print the resulting table;
proc print 
       data = class_info;                                            
run;

Here is the result of that PROC PRINT step in the Results Viewer.  Notice that it also has the type, length, format, and label of each variable.

Obs    library    data_set    variable_name    type    length    format    label
  1    WORK       CLASS       Name             char       8      $15.
  2    WORK       CLASS       Sex              char       1
  3    WORK       CLASS       Age              num        8      8.        Age
  4    WORK       CLASS       Height           num        8      8.2       Height
  5    WORK       CLASS       Weight           num        8      8.2       Weight

Thank you, Jerry, for sharing your tip!

Sorting correlation coefficients by their magnitudes in a SAS macro

Theoretical Background

Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables.  For 2 random variables, X and Y, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations.  Algebraically, this can be expressed as

\rho_{X, Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.

In practice, you cannot know the true correlation coefficient, but you can estimate it from data.  The most common estimator for \rho is the Pearson correlation coefficient, which is defined as the sample covariance between X and Y divided by the product of their sample standard deviations.  Since there is a common factor of

\frac{1}{n - 1}

in the numerator and the denominator, the factors cancel each other out, so the formula simplifies to

r_P = \frac{\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i = 1}^{n}(x_i - \bar{x})^2 \sum_{i = 1}^{n}(y_i - \bar{y})^2}} .
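To make the cancellation explicit, here is the same estimator written in terms of the sample covariance s_{XY} and the sample standard deviations s_X and s_Y (notation introduced here only for this derivation).  The product of the two square roots in the denominator contributes exactly one factor of 1/(n - 1), which matches the factor in the numerator.

```latex
r_P = \frac{s_{XY}}{s_X s_Y}
    = \frac{\frac{1}{n - 1}\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
           {\sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}(x_i - \bar{x})^2}\,\sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}(y_i - \bar{y})^2}}
    = \frac{\frac{1}{n - 1}\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
           {\frac{1}{n - 1}\sqrt{\sum_{i = 1}^{n}(x_i - \bar{x})^2 \sum_{i = 1}^{n}(y_i - \bar{y})^2}}
```

Dividing the numerator and the denominator by 1/(n - 1) gives r_P as stated above.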

 

In predictive modelling, you may want to find the covariates that are most correlated with the response variable before building a regression model.  You can do this by

  1. computing the correlation coefficients
  2. obtaining their absolute values
  3. sorting them by their absolute values.
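The full macro is not reproduced here, but the 3 steps above can be sketched without a macro.  This sketch assumes the built-in SASHELP.CLASS data set, with Weight as the response and Age and Height as the covariates; the data set names corr_out and corr_long are hypothetical.  It uses the OUTP= option of PROC CORR to save the Pearson correlations and PROC TRANSPOSE to put one covariate on each row.

```sas
* step 1: compute the Pearson correlations of the covariates with the response;
proc corr
     data = sashelp.class
     noprint
     outp = corr_out;
     var age height;
     with weight;
run;

* reshape the row of correlations into one row per covariate;
proc transpose
     data = corr_out (where = (_type_ = 'CORR'))
     out = corr_long (rename = (col1 = correlation));
     var age height;
run;

* step 2: obtain the absolute values of the correlations;
data corr_long;
     set corr_long;
     abs_correlation = abs(correlation);
run;

* step 3: sort the covariates by the magnitudes of their correlations;
proc sort
     data = corr_long;
     by descending abs_correlation;
run;

proc print
     data = corr_long;
run;
```

In the sorted data set, the variable _NAME_ identifies each covariate, so the covariates that are most strongly correlated with the response (in either direction) appear first.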


University of Toronto Statistical Sciences Union Career Panel

I am delighted to be invited to speak at the University of Toronto Statistical Sciences Union’s first ever Career Panel.  If you plan to attend this event, I encourage you to read my advice columns on career development in advance.  In particular, I strongly encourage you to read the blog post “How to Find a Job in Statistics – Advice for Students and Recent Graduates”.  I will not cover all of the topics in these columns, but you are welcome to ask questions about them during the question-and-answer period.

Here are the event’s details.

Time: 1 pm to 6 pm

  • My session will be held from 5 pm to 6 pm.

Date: Saturday, March 25, 2017

Location: Sidney Smith Hall, 100 St. George Street, Toronto, Ontario.

  • Sidney Smith Hall is located on the St. George (Downtown) campus of the University of Toronto.
  • Update: The seminars will be held in Rooms 2117 and 2118.  I will speak in Room 2117 at 5 pm.

 

If you will attend this event, please feel free to come and say “Hello”!

Analytical Chemistry Lesson of the Day – Accuracy in Method Validation and Quality Assurance

In pharmaceutical chemistry, one of the requirements for method validation is accuracy, the ability of an analytical method to obtain a value of a measurement that is close to the true value. There are several ways of assessing an analytical method for accuracy.

  1. Compare the value from your analytical method with an established or reference method.
  2. Use your analytical method to obtain a measurement from a sample with a known quantity (i.e. a reference material), and compare the measured value with the true value.
  3. If you don’t have a reference material for the second way, you can make your own by spiking a blank matrix with a measured quantity of the analyte.
  4. If your matrix may interfere with the analytical signal, then you cannot spike a blank matrix as described in the third way.  Instead, spike your sample with a known quantity of the standard.  I elaborate on this in a separate tutorial on standard addition, a common technique in analytical chemistry for determining the quantity of a substance when matrix interference exists.  Standard addition is an example of the second way of assessing accuracy as I mentioned above.  You can view the original post of this tutorial on the official JMP blog.

New Job as Data Science Consultant at Environics Analytics!

I am very excited to start a new job as a Data Science Consultant at Environics Analytics (EA)!  My new position is a dual role of data scientist and consultant – I will meet clients regularly to advise them on statistical modelling, data analysis, and marketing analytics, and I will conduct statistical research on a variety of problems.  I look forward to learning about and working in EA’s specialty of geodemography, as well as researching new areas of data analytics, such as text mining and sentiment analysis.

I began my new job in mid-August, and I have enjoyed meeting my new co-workers and serving my first client so far.  I have opened a second Twitter feed, @EricCaiEA, to share my work at my new company.  You can continue to find me on my own Twitter feed for The Chemical Statistician, @chemstateric.


Analyst Finder – A Free Job-Matching Service for Statisticians, Data Scientists, Database Managers and Data Analysts

If you are a statistician, data scientist, database manager, or data analyst, then consider using Analyst Finder for your next job search.  It is a web site that connects job seekers in data analytics with employers.  The service is free for job seekers, and it earns money by charging companies and recruiters a small fee to find qualified candidates through its job-matching service.

To register for this service as a job seeker, you simply need to complete a checklist of skills and preferences.  It’s quick and easy to do, and you can change this list whenever you wish to update your qualifications.


 

Employers and recruiters can register for this service to search for qualified candidates, and the fees are displayed on the web site.

This company was founded by Art Tabachneck, a former president of the Toronto Area SAS Society and a veteran analytics professional.  In case you’re interested in learning more about him, SAS has a profile about him in recognition of his expertise in SAS programming and his contributions to online support communities and user groups.

My Alumni Profile by Simon Fraser University – Where Are They Now?

I am happy and grateful to be featured by my alma mater, Simon Fraser University (SFU), in a recent profile.  I answered questions about my transition from my academic education to my career in statistics and about how blogging and social media have helped me to advance my career.  Check it out!

During my undergraduate degree at SFU, I volunteered at its Career Services Centre for 5 years as a career advisor in its Peer Education program.  I began writing for its official blog, the Career Services Informer (CSI), during that time.  I have continued to write career advice for the CSI as an alumnus, and it is always a pleasure to give back to this wonderful centre!

You can find all of my advice columns here on my blog.


Career Advice Panel – Statistical Society of Canada’s Annual Student Conference

I am excited to go to Brock University in St. Catharines, Ontario, and speak at the Statistical Society of Canada’s (SSC’s) Annual Student Conference on Saturday, May 28, 2016!  This one-day conference will be a chance for statistics students from all over Canada to share their research with each other, network with industry professionals, and get career advice from the career advice panel.  I will be one of 3 speakers on this panel, and I look forward to sharing my advice and answering the students’ questions.  Read the Final Program Booklet to get the schedule and learn about the backgrounds of all speakers at this conference.

If you will attend this conference, please feel free to come and say “Hello”!


This event will occur before the 2016 Annual Conference of the Statistical Society of Canada.

 

New Job at the Bank of Montreal in Toronto

I have accepted an offer from the Bank of Montreal to become a Manager of Operational Risk Analytics and Modelling at its corporate headquarters in Toronto.  Thus, I have resigned from my job at the British Columbia Cancer Agency.  I will leave Vancouver at the end of December, 2015, and start my new job at the beginning of January, 2016.

I have learned some valuable skills and met some great people here in Vancouver over the past 2 years.  My R programming skills have improved a lot, especially in text processing.  My SAS programming skills have improved a lot, and I began a new section on my blog dedicated to SAS programming as a result of what I learned.  I volunteered and delivered presentations for the Vancouver SAS User Group (VanSUG) – once on statistical genetics, and another on sampling strategies in analytical chemistry, ANOVA, and PROC TRANSPOSE.  I have thoroughly enjoyed meeting some smart and helpful people at the Data Science, Machine Learning, and R Programming Meetups.

I lived in Toronto from 2011 to 2013 while pursuing my Master’s degree in statistics at the University of Toronto and working as a statistician at Predicum.  I look forward to re-connecting with my colleagues there.

Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry

In this second article of a 2-part series on the official JMP blog, I use analysis of variance (ANOVA) to assess a sample-preparation scheme for quantifying sodium in potato chips.  I illustrate the use of the “Fit Y by X” platform in JMP to implement ANOVA, and I propose an alternative sample-preparation scheme to obtain a sample with a smaller variance.  This article is entitled “Potato Chips and ANOVA, Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry”.

If you haven’t read my first blog post in this series on preparing the data in JMP and using the “Stack Columns” function to transpose data from wide format to long format, check it out!  I presented this topic at the last Vancouver SAS User Group (VanSUG) meeting on Wednesday, November 4, 2015.

My thanks to Arati Mejdal, Louis Valente, and Mark Bailey at JMP for their guidance in writing this 2-part series!  It is a pleasure to be a guest blogger for JMP!

 


Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP

I am very excited to write again for the official JMP blog as a guest blogger!  Today, the first article of a 2-part series has been published, and it is called “Potato Chips and ANOVA in Analytical Chemistry – Part 1: Formatting Data in JMP”.  This series of blog posts will talk about analysis of variance (ANOVA), sampling, and analytical chemistry, and it uses the quantification of sodium in potato chips as an example to illustrate these concepts.

The first part of this series discusses how to import the data into JMP and prepare them for ANOVA.  Specifically, it illustrates how the “Stack Columns” function is used to transpose the data from wide format to long format.

I will present this at the Vancouver SAS User Group (VanSUG) meeting later today.

Stay tuned for “Part 2: Using Analysis of Variance to Improve Sample Preparation in Analytical Chemistry”!

 
