My new job as the Digital Marketing Analyst at Environics Analytics

As I approach my second anniversary of working at Environics Analytics, I am excited to accept a job offer to become our Digital Marketing Analyst.  In this new role, I am developing strategies to establish my company’s brand and promote our products and services on social media.  I am also using statistics to assess the effectiveness of our marketing efforts, both online and offline.

As The Chemical Statistician, I have written extensively on this blog, produced video tutorials on my YouTube channel, hosted a talk show (The Central Equilibrium), and shared my interests on Twitter (@chemstateric).  Mirroring these efforts in my new job, I will write articles on our company’s blog, produce YouTube videos, interview our staff, and engage with clients on Twitter (@EricCaiEA) and LinkedIn.

Eric sitting under EA logo

I am grateful to work with some wonderful colleagues who are friendly, helpful, and dedicated in their work.  It has been a pleasure to contribute to such a collaborative and joyful atmosphere, and I look forward to making a big impact with my new responsibilities!

New Job as Data Science Consultant at Environics Analytics!

I am very excited to start a new job as a Data Science Consultant at Environics Analytics (EA)!  My new position is a dual role of data scientist and consultant – I will meet clients regularly to advise them on statistical modelling, data analysis, and marketing analytics, and I will conduct statistical research on a variety of problems.  I look forward to learning about and working in EA’s specialty of geodemography, as well as researching new areas of data analytics, such as text mining and sentiment analysis.

I began my new job in mid-August, and I have enjoyed meeting my new co-workers and serving my first client so far.  I have opened a second Twitter feed, @EricCaiEA, to share my work at my new company.  You can continue to find me on my own Twitter feed for The Chemical Statistician, @chemstateric.

New Job at the Bank of Montreal in Toronto

I have accepted an offer from the Bank of Montreal to become a Manager of Operational Risk Analytics and Modelling at its corporate headquarters in Toronto.  Thus, I have resigned from my job at the British Columbia Cancer Agency.  I will leave Vancouver at the end of December, 2015, and start my new job at the beginning of January, 2016.

I have learned some valuable skills and met some great people here in Vancouver over the past 2 years.  My R programming skills have improved a lot, especially in text processing.  My SAS programming skills have also improved a lot, and I began a new section on my blog devoted to SAS programming as a result of what I learned.  I volunteered and delivered presentations for the Vancouver SAS User Group (VanSUG) – one on statistical genetics, and another on sampling strategies in analytical chemistry, ANOVA, and PROC TRANSPOSE.  I have thoroughly enjoyed meeting some smart and helpful people at the Data Science, Machine Learning, and R Programming Meetups.

I lived in Toronto from 2011 to 2013 while pursuing my Master’s degree in statistics at the University of Toronto and working as a statistician at Predictum.  I look forward to re-connecting with my colleagues there.

New Job as Biostatistical Analyst at the British Columbia Cancer Agency

Dear Readers and Followers of The Chemical Statistician:

My apologies for the slower than usual posting frequency in the last few months, but I have been very busy preparing for a big transition – after a long and intense selection process that started in March, I was offered a new job as a biostatistical analyst at the British Columbia Cancer Agency (BCCA)!

Eric Cai - Official Head Shot

I was sad to leave many of the kind and friendly co-workers whom I met at the British Columbia Centre for Excellence in HIV/AIDS during my 10 months of working there, but I was very excited to accept this offer and begin working for the BCCA – specifically, in the Cancer Surveillance and Outcomes (CSO) Unit.  I had already met several of my new co-workers from past meetings in the Vancouver SAS User Group, and I also know 2 people who worked for long periods in this same group in the past.  From all of these interactions, I got a very positive impression about the professionalism, expertise, and collegiality of this new group, so I was delighted to join this team.

I started my new job 3 weeks ago, and was plunged into 3 projects immediately.  I have been swamped with work right from the start, so I’m still adjusting to my new schedule and surroundings.  Nonetheless, I hope to resume blogging at my usual pace as I settle into my new job.  (I just posted a new video on calculating expected counts in contingency tables using joint and marginal probabilities.)  I also hope to use my work as inspiration for blogging topics here at The Chemical Statistician.

Thank you all for your patience and continued readership.  It has been a pleasure to learn from you, and I hope to continue a successful expansion of The Chemical Statistician for the rest of 2014 and beyond!

Eric

A New Job at the British Columbia Centre for Excellence in HIV/AIDS

Dear Readers of The Chemical Statistician,

You may have noticed that I have blogged less frequently in the past few months; this has been due to a major change in my career: I recently accepted a new job in the Laboratory Program at the British Columbia Centre for Excellence (BC-CFE) in HIV/AIDS in Vancouver!

A bioinformatician who works in this group recommended me for this position to his supervisors during this past summer.  Having lived in Vancouver before, I have heard a lot about the work that the BC-CFE in HIV/AIDS has done for many years to improve the lives of HIV and AIDS patients and prevent HIV transmission.

Forgot a new co-worker’s name? This could be an opportunity to establish a positive relationship.

Meeting new people is a constant part of my life, whether it is through new jobs, social events, or networking events.  The first task in establishing rapport with a new acquaintance is to learn their name, yet I sometimes forget it after our first conversation.

shake hands.jpeg

Image courtesy of rawpixel.com on Pexels.

Forgetting new names is very common and forgivable, especially if you are meeting many new people at once.  However, I notice that most people are afraid to admit this.  Perhaps they are embarrassed or worried that their new acquaintances will feel offended.  Thus, they often greet them many times without referencing their name, and this could continue for days, weeks, or even months!

Sorting correlation coefficients by their magnitudes in a SAS macro

Theoretical Background

Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables.  For 2 random variables, X and Y, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations.  Algebraically, this can be expressed as

\rho_{X, Y} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}.

In real life, you can never know what the true correlation coefficient is, but you can estimate it from data.  The most common estimator for \rho is the Pearson correlation coefficient, which is defined as the sample covariance between X and Y divided by the product of their sample standard deviations.  Since there is a common factor of

\frac{1}{n - 1}

in both the numerator and the denominator, these factors cancel each other out, so the formula simplifies to

r_P = \frac{\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i = 1}^{n}(x_i - \bar{x})^2 \sum_{i = 1}^{n}(y_i - \bar{y})^2}} .
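To make the cancellation explicit, here is the intermediate step written out.  The shorthand symbols s_{xy}, s_x, and s_y (for the sample covariance and the 2 sample standard deviations) are my own notation for this illustration and do not appear elsewhere in the post.

```latex
r_P = \frac{s_{xy}}{s_x s_y} = \frac{\frac{1}{n - 1}\sum_{i = 1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}(x_i - \bar{x})^2} \sqrt{\frac{1}{n - 1}\sum_{i = 1}^{n}(y_i - \bar{y})^2}}
```

The 2 square roots in the denominator each contribute a factor of \sqrt{\frac{1}{n - 1}}, and their product is \frac{1}{n - 1}, which matches the factor in the numerator.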

 

In predictive modelling, you may want to find the covariates that are most correlated with the response variable before building a regression model.  You can do this by

  1. computing the correlation coefficients
  2. obtaining their absolute values
  3. sorting them by their absolute values.
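As a rough illustration of these 3 steps, here is a minimal sketch in Python rather than the SAS macro that the full post develops.  The data, the covariate names (x1, x2, x3), and the helper function pearson_r are all made up for this example; it uses only the standard library and assumes that no variable is constant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares; the 1/(n - 1) factors cancel.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical response variable and covariates, for illustration only.
response = [1.0, 2.0, 3.0, 4.0, 5.0]
covariates = {
    "x1": [3.0, 1.0, 4.0, 1.0, 5.0],
    "x2": [5.0, 4.0, 3.0, 2.0, 1.0],
    "x3": [1.0, 3.0, 2.0, 5.0, 4.0],
}

# Step 1: compute the correlation of each covariate with the response.
correlations = {name: pearson_r(values, response)
                for name, values in covariates.items()}

# Steps 2 and 3: sort by absolute value, largest magnitude first.
ranked = sorted(correlations.items(),
                key=lambda item: abs(item[1]),
                reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:.4f}, |r| = {abs(r):.4f}")
```

With these made-up numbers, x2 ranks first because its correlation of -1 has the largest magnitude, even though it is negative.  Sorting on the absolute value while keeping the signed coefficient preserves the direction of each relationship.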

Extracting the Postal Codes from Addresses of Hospitals in British Columbia – An Exercise in SAS Text Processing

Introduction

In my job as a Biostatistical Analyst at the British Columbia (BC) Cancer Agency in Vancouver, I recently needed to get the postal codes for the hospitals in BC.  I found a data table of the hospitals with their addresses, but I needed to extract the postal codes from the addresses.  In this tutorial, I will show you some text processing techniques in SAS that I used to extract the postal codes from that raw data file.
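To give a flavour of this kind of text processing, here is a minimal sketch in Python; it is not the SAS code from the full post.  It assumes that each address contains at most one postal code, and it relies on the fact that Canadian postal codes alternate letter-digit-letter, digit-letter-digit, with those in BC beginning with "V".  The addresses and the function name extract_postal_code are made up for illustration.

```python
import re

# Letter-digit-letter, optional whitespace, digit-letter-digit, starting with
# "V" for British Columbia.  This checks the shape only, not whether the code
# is actually in use.
BC_POSTAL_CODE = re.compile(r"\bV\d[A-Z]\s?\d[A-Z]\d\b")


def extract_postal_code(address):
    """Return the first BC postal code as 'A1A 1A1', or None if none is found."""
    match = BC_POSTAL_CODE.search(address.upper())
    if match is None:
        return None
    # Normalize to the conventional format with a single space in the middle.
    code = re.sub(r"\s", "", match.group())
    return code[:3] + " " + code[3:]


# Hypothetical addresses, invented for this example.
addresses = [
    "123 Example Street, Vancouver, BC V5Z 1A1",
    "456 Sample Road, Victoria BC v8r1j8",
    "789 Placeholder Avenue, Kelowna, BC",
]

postal_codes = [extract_postal_code(address) for address in addresses]
print(postal_codes)
```

Converting to upper case before matching and then re-inserting the space handles the inconsistent capitalization and spacing that raw address data often has; an address without a postal code yields None rather than an error.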

* This blog post contains information licensed under the Open Government License – British Columbia.

Read the rest of this post to get the SAS code for extracting the postal codes and the final spreadsheet that contains the postal codes of the hospitals in British Columbia!

Useful Functions in R for Manipulating Text Data

Introduction

In my current job, I study HIV at the genetic and biochemical levels.  Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text.  (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from the HIV’s RNA.)  In this post, I describe some common functions in R that I often use for text processing.

Obtaining Basic Information about Character Variables

In R, I often work with text data in the form of character variables.  To check if a variable is a character variable, use the is.character() function.

> year = 2014
> is.character(year)
[1] FALSE

If a variable is not a character variable, you can convert it to a character variable using the as.character() function.

> year.char = as.character(year)
> is.character(year.char)
[1] TRUE

A basic piece of information about a character variable is the number of characters in the string.  Use the nchar() function to obtain this information.

> nchar(year.char)
[1] 4

How to Find a Job in Statistics – Advice for Students and Recent Graduates

Introduction

A graduate student in statistics recently asked me for advice on how to find a job in our industry.  I’m happy to share my advice about this, and I hope that it can help you to find a satisfying job and develop an enjoyable career.  My perspectives would be most useful to students and recent graduates because of my similar but unique background; I completed my Master’s degree in statistics at the University of Toronto only 1.5 years ago, and I volunteered as a career advisor at Simon Fraser University during my Bachelor’s degree.  My advice will reflect my experience in finding a job in Toronto, but you can probably find parallels in your own city.

Most of this post focuses on soft skills that are needed to find any job; I dive specifically into advice for statisticians in the last section.  Although the soft skills are general and not specific to statisticians, many employers, veteran statisticians, and professors have told me that students and recent graduates would benefit from the focus on soft skills.  Thus, I discuss them first and leave the statistics-specific advice till the end.

About Eric

Welcome to The Chemical Statistician!  My name is Eric Cai, and I share my knowledge and passion about statistics, chemistry, math, and career development on this blog.  My main interests in statistics are machine learning, statistical computing, applied statistics, and mathematical statistics.  My main interests in chemistry are physical chemistry, analytical chemistry, nuclear chemistry, environmental chemistry, and inorganic chemistry.  I love to learn new concepts, solve interesting problems, and write easily understandable and usable code for statistical analyses and modelling.  I have extensive experience in programming in Python, R, MATLAB, SAS and SQL.  My passion for career development stems from my volunteering experience as a career advisor during my undergraduate degree, and I continue to share career advice in seminars at universities and professional conferences.

Occupationally, I am a statistician who specializes in data mining and machine learning.  I have worked in consumer packaged goods (CPG), marketing analytics, banking, medical research, industrial statistics, applied mathematics, market analysis, technology commercialization, and environmental chemistry. I am excited to help any organization that values my skills in statistical consulting, computer programming, public speaking, scientific research, writing and teaching.  I welcome any opportunity to use my technical proficiency and excellent interpersonal skills, and I hope to help organizations that value knowledge in statistics or chemistry and want to promote their services with social media, especially blogging.

Please share your insights and wisdom about statistics and chemistry with me on this blog and via Twitter @chemstateric!  This page contains more information on how best to contact me.

Work Experience

I work as a Senior Data Scientist on the Advanced Analytics team at Acosta. I use advanced methods in statistics and mathematics to measure the impact of merchandising and retail services for manufacturers of consumer packaged goods (CPG). The 4 main components of my job are:

– methodological research and execution in measurement analytics
– business consultation for internal stakeholders and external clients
– cross-functional project management with product managers, database administrators, and technology providers
– product development for proprietary software, including consultation for the design and user-experience teams

For statistical analysis, I use Python (Pandas, NumPy, Matplotlib, and Seaborn) in JupyterLab. For data extraction from Acosta’s internal data platform, I use Python, SQL, and PySpark in Databricks.

My past work experiences in statistics include data science consulting and digital marketing at Environics Analytics, operational risk analytics and modelling at the Bank of Montreal, cancer surveillance at the British Columbia Cancer Agency, HIV research at the British Columbia Centre for Excellence in HIV/AIDS, and industrial consultation and research at Predictum, a company that provides services in statistical consulting, statistical software development, and database management for a variety of industries.

My jobs have allowed me to specialize in machine learning, applied statistics, and medical statistics, and they inspire some of the topics in this blog.  These roles helped me to develop skills in methodological research, software development, client consultation/education, social-media marketing, brand management, data manipulation, data visualization, and data analysis.

My career in statistics began in Toronto in 2011, and I worked in Vancouver from 2013 to 2015.  I returned to Toronto in 2016 to begin a new job at the Bank of Montreal, and I subsequently joined Environics Analytics in August, 2016.  Before my career in statistics, I worked in applied mathematics, market analysis, media relations, university-industry technological commercialization, environmental chemistry, cardiac physiology, evolutionary biology, and bee biology.  I also volunteered as a learning and writing skills counsellor for 7 years and as a career advisor for 6 years.  I have tutored students in math, statistics, and chemistry since high school, and I also taught intermediate mathematical statistics courses to university students as a teaching assistant.

I have spoken often at industrial and professional seminars and conferences about statistics, and I aim to broaden my scope of public speaking to chemistry.  If you are part of an organization that works or is involved in statistics, analytics, data analysis, quantitative research, or chemistry, I would be happy to speak at your upcoming seminar or conference to tell you about my work or share my knowledge and passion about statistics and chemistry.  Please feel free to contact me, and I will respond to you as soon as I can!

I am the sole author of this blog, and it is not affiliated with any of my current or previous employers in any way.

Education

I earned my Bachelor’s degree with a major in chemistry and a minor in math from Simon Fraser University, and I earned my Master’s degree in statistics from the University of Toronto.  I began this blog to reconnect with my intellectual roots, strengthen my grasp of the fundamentals of these subjects, and explore new frontiers in these exciting fields.

Volunteer Service

I have been dedicated to giving back to the community since adolescence.  Before coming back to Toronto, I was an executive member of the Vancouver SAS User Group (VanSUG), which organizes free and open meetings twice per year to allow users of SAS products to share their knowledge with each other.

At the start of my statistics career, I was an active volunteer in the statistical community in Toronto.  I spoke regularly at industrial seminars on statistics and analytics, where I shared my passion and knowledge about machine learning, data mining and statistical computing.  For the Southern Ontario Regional Association (SORA) of the Statistical Society of Canada (SSC), I organized a seminar series on business analytics in Toronto; you can also find out about its upcoming seminars on its LinkedIn group page.  As well, I coordinated the outreach and educational efforts of SORA to high schools and universities in southern Ontario.  I also organized regular seminars on biostatistics with the Toronto Applied Biostatistics Association (TABA).

During my undergraduate studies, I volunteered extensively at Simon Fraser University.  I was a Learning Skills Counsellor at the Student Learning Commons, and I was a Career Advisor at the Career Services Centre.  I have continued to contribute to the Career Services Centre by writing as a guest blogger on its Career Services Informer, a blog about career development.  You are welcome to contact me and ask for advice about both technical and professional aspects of working in statistics, business and science; I may post my answer here or at the Career Services Informer.

Please view my professional highlights and LinkedIn profile to learn more about my professional accomplishments, experiences, and skills.