New Job at the Bank of Montreal in Toronto

I have accepted an offer from the Bank of Montreal to become a Manager of Operational Risk Analytics and Modelling at its corporate headquarter office in Toronto.  Thus, I have resigned from my job at the British Columbia Cancer Agency.  I will leave Vancouver at the end of December, 2015, and start my new job at the beginning of January, 2016.

I have learned some valuable skills and met some great people here in Vancouver over the past 2 years.  My R programming skills have improved a lot, especially in text processing.  My SAS programming skills have improved a lot, and I began a new section on my blog to SAS programming as a result of what I learned.  I volunteered and delivered presentations for the Vancouver SAS User Group (VanSUG) – once on statistical genetics, and another on sampling strategies in analytical chemistry, ANOVA, and PROC TRANSPOSE.  I have thoroughly enjoyed meeting some smart and helpful people at the Data Science, Machine Learning, and R Programming Meetups.

I lived in Toronto from 2011 to 2013 while pursuing my Master’s degree in statistics at the  University of Toronto and working as a statistician at Predicum.  I look forward to re-connecting with my colleagues there.

How to Extract a String Between 2 Characters in R and SAS


I recently needed to work with date values that look like this:

Jan 23/2
Aug 5/20
Dec 17/2

I wanted to extract the day, and the obvious strategy is to extract the text between the space and the slash.  I needed to think about how to program this carefully in both R and SAS, because

  1. the length of the day could be 1 or 2 characters long
  2. I needed a code that adapted to this varying length from observation to observation
  3. there is no function in either language that is suited exactly for this purpose.

In this tutorial, I will show you how to do this in both R and SAS.  I will write a function in R and a macro program in SAS to do so, and you can use the function and the macro program as you please!

Read more of this post

Extracting the Postal Codes from Addresses of Hospitals in British Columbia – An Exercise in SAS Text Processing


In my job as a Biostatistical Analyst at the British Columbia (BC) Cancer Agency in Vancouver, I recently needed to get the postal codes for the hospitals in BC.  I found a data table of the hospitals with their addresses, but I needed to extract the postal codes from the addresses.  In this tutorial, I will show you some text processing techniques in SAS that I used to extract the postal codes from that raw data file.

* This blog post contains information licensed under the Open Government License – British Columbia.

Read the rest of this post to get the SAS code for extracting the postal codes and the final spreadsheet that contains the postal codes of the hospitals in British Columbia!

Read more of this post

Useful Functions in R for Manipulating Text Data


In my current job, I study HIV at the genetic and biochemical levels.  Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text.  (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from the HIV’s RNA.)  In this post, I describe some common functions in R that I often use for text processing.

Obtaining Basic Information about Character Variables

In R, I often work with text data in the form of character variables.  To check if a variable is a character variable, use the is.character() function.

> year = 2014
> is.character(year)

If a variable is not a character variable, you can convert it to a character variable using the as.character() function.

> year.char = as.character(year)
> is.character(year.char)
[1] TRUE

A basic piece of information about a character variable is the number of characters that exist in this string.  Use the nchar() function to obtain this information.

> nchar(year.char)
[1] 4

Read more of this post