Professional Highlights
Scholarships

Gordon M. Shrum Entrance Scholarship
Simon Fraser University
I was one of 40 students entering Simon Fraser University to receive the Gordon M. Shrum Entrance Scholarship, recognizing top academic achievement, community service, and leadership potential. This scholarship was worth $24,000.

Master of Science Graduate Funding Guarantee
Department of Statistical Sciences, University of Toronto
I completed my Master of Science degree in Statistics at one of the world’s best departments in statistics with a guaranteed funding package worth over $18,000.
Awards

Silver Medal
Canadian Society for Chemistry
I received the Canadian Society for Chemistry’s Silver Medal for being the top 4th-year chemistry student at Simon Fraser University among 100-120 possible recipients.

Undergraduate Student Research Award (USRA)
Simon Fraser University – Office of the Vice-President of Research
I was one of 11 undergraduate students who received research awards in the Department of Mathematics at Simon Fraser University in 2010-2011. This funded my research position in the Interdisciplinary Research in the Mathematical and Computational Sciences (IRMACS) Centre, where I modelled HIV epidemics using ordinary differential equations and Monte Carlo simulations.

Undergraduate Student Research Award (USRA)
Natural Sciences and Engineering Research Council of Canada (NSERC)
I was one of 5 undergraduate students who received research awards in the Department of Biomedical Physiology and Kinesiology at Simon Fraser University in 2005-2006. This funded my research position in the Cardiac Membrane Research Laboratory, where I used immunocytochemistry and confocal microscopy to study the physiology of cardiac muscle cells and stem cells.

Travel Award for Conference Presentation
Statistical Society of Canada
I received a Travel Award to attend the 2012 Statistical Society of Canada’s Conference to present my statistical computing research on Monte Carlo simulations of power functions of correlation tests. I discovered a pathology in the sampling distributions of these correlation tests when testing on data from the bivariate t-distribution, and I explored the nature of this pathology and how to overcome it.

Webber Prize
Science and Environment Cooperative Education Program – Simon Fraser University
I received the Webber Prize for writing the top cooperative education report, based on my position as the Technology Transfer Assistant at TRIUMF, Canada’s National Laboratory for Particle and Nuclear Physics. This prize specifically recognized my efforts in writing the 2006-2007 Annual Report on TRIUMF’s Business Development Plan.

Dean’s List
Faculty of Science, Simon Fraser University
I was on the Faculty of Science’s Dean’s List for academic excellence throughout my undergraduate degree at Simon Fraser University, where I studied kinesiology, chemistry, economics and mathematics. I finished my Bachelor’s degree with Distinction with a major in chemistry and a minor in mathematics in 2011.
Publications
 Scott, S.A., van der Zanden, C., Cai, E., McGahan, C.E. and Kwon, J.S., 2017. Prognostic significance of peritoneal cytology in low-intermediate risk endometrial cancer. Gynecologic Oncology.
 McColl, R.J., McGahan, C.E., Cai, E., Olson, R., Cheung, W.Y., Raval, M.J., Phang, P.T., Karimuddin, A.A. and Brown, C.J., 2017. Impact of hospital volume on quality indicators for rectal cancer surgery in British Columbia, Canada. The American Journal of Surgery, 213(2), pp.388-394.
 Olson, R.A., Tiwana, M., Barnes, M., Cai, E., McGahan, C., Roden, K., Yurkowski, E., Gentles, Q., French, J., Halperin, R. and Olivotto, I.A., 2016. Impact of using audit data to improve the evidence-based use of single-fraction radiation therapy for bone metastases in British Columbia. International Journal of Radiation Oncology* Biology* Physics, 94(1), pp.40-47.
 Lee, G.Q., Lachowski, C., Cai, E., Lima, V.D., Boum, Y., Muzoora, C., Mocello, A.R., Hunt, P.W., Martin, J.N., Bangsberg, D.R. and Harrigan, P.R., 2016. Non-R5-tropic HIV-1 in subtype A1 and D infections were associated with lower pretherapy CD4+ cell count but not with PI/(N) NRTI therapy outcomes in Mbarara, Uganda. AIDS, 30(11), pp.1781-1788.
 Swenson, L.C., Min, J.E., Woods, C.K., Cai, E., Li, J.Z., Montaner, J.S., Harrigan, P.R. and Gonzalez-Serna, A., 2014. HIV drug resistance detected during low-level viremia is associated with subsequent virologic failure. AIDS (London, England), 28(8), p.1125.
Projects

Pathological Monte Carlo Simulations of Power Functions of Correlation Tests
I began this very fruitful and interesting research project in my graduate statistical computing class, where I originally intended to compare the power functions of the Pearson and the Spearman correlation tests using Monte Carlo simulations. This was specifically done for data from two distributions: the bivariate normal distribution and the bivariate t-distribution. However, I encountered grossly pathological power functions of the Pearson correlation test for data from the bivariate t-distribution. I later found a subtle but surprising pathology for the Spearman correlation test’s power functions. I presented my study of these pathologies and the antidotes to obtain the correct power functions at the Statistical Society of Canada’s annual conference in Guelph in June, 2012. I completed further work on this project after the conference, and I was invited to present the improved results at greater length at the graduate biostatistics weekly seminar at the University of Toronto in October, 2012.
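
For readers unfamiliar with the idea, a Monte Carlo estimate of a test's power can be sketched as below. This is only an illustrative Python sketch, not the code used in the project: the function names, sample size, and replication count are hypothetical, it covers only the bivariate normal case, and it swaps in the large-sample Fisher z approximation for the Pearson test rather than the exact t-based test.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def estimate_power(rho, n=30, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples in which H0: rho = 0 is rejected.

    Assumes bivariate normal data with true correlation rho (|rho| < 1)
    and a two-sided test based on the Fisher z approximation.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(z1)
            # Mixing two independent normals gives correlation rho with xs.
            ys.append(rho * z1 + math.sqrt(1 - rho ** 2) * z2)
        z = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > crit:
            rejections += 1
    return rejections / reps
```

Evaluating `estimate_power` over a grid of `rho` values traces out an estimated power function; at `rho = 0` the estimate should sit near the nominal level `alpha`.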

Analyzing Data from the Motivated Strategies for Learning Questionnaire (MSLQ) to Determine the Factors Affecting Performance of First-Year University Students
Team Members: Eric Cai, Derrick Gray, Craig Burkett
A lecturer at the University of Toronto approached Derrick Gray, Craig Burkett and me to determine the factors for success in a first-year course using results that he collected from his students with the Motivated Strategies for Learning Questionnaire (MSLQ). His colleague at Kasetsart University in Thailand taught the same course and collected the same data, so we also compared the results between the two cultures.
Derrick, Craig and I analyzed the data in SAS, wrote about our analyses and findings in detailed reports, and presented the highlights of our reports to our client. After the completion of my course, I voluntarily pursued further research with this client to analyze his data sets with improved methods in machine learning, and I wrote a new report based on my findings. In June, 2013, I presented my findings to the client and concluded our successful collaboration.

Seminar Presentation: Using Advanced Predictive Modelling and Pattern Recognition in Business Analytics
I delivered this 1-hour presentation to a diverse group of business analysts, statisticians, academics, students, and executives in a seminar series organized by the Southern Ontario Regional Association (SORA) of the Statistical Society of Canada (SSC). My main goal was to introduce how machine learning and data mining are used in business analytics, especially in overcoming the limitations of traditional statistical techniques. I then described 2 predictive modelling (or supervised learning) techniques – partial least squares regression and bootstrap (random) forest – and illustrated how Predictum succeeded in using these techniques to solve analytical problems for 2 actual clients.

Seminar Presentation: Discriminant Analysis – A Machine Learning Technique for Classification in JMP and SAS
I was invited by the Toronto Area SAS Society (TASS) to deliver this presentation on discriminant analysis and how to implement it in JMP and SAS. Discriminant analysis is a classification technique that uses continuous predictor variables to predict categorical target variables. I specifically focused on Gaussian discriminant analysis and gave a non-technical and intuitive explanation that appealed to a diverse audience of statisticians, analysts, and database managers.
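
To make the idea concrete, here is a minimal Python sketch of Gaussian discriminant analysis. It is not taken from the presentation (which used JMP and SAS), and it makes simplifying assumptions: a single continuous predictor and a variance shared by all classes (the linear discriminant case). The names `fit_gda` and `predict_gda` are hypothetical.

```python
import math
from collections import defaultdict


def fit_gda(xs, labels):
    """Estimate class means, class priors, and one pooled variance.

    Requires more observations than classes and some within-class spread.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, labels):
        groups[y].append(x)
    n = len(xs)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    priors = {c: len(v) / n for c, v in groups.items()}
    # Pooled within-class variance, shared by every class.
    ss = sum((x - means[y]) ** 2 for x, y in zip(xs, labels))
    var = ss / (n - len(groups))
    return means, priors, var


def predict_gda(model, x):
    """Return the class with the largest log posterior (up to a constant)."""
    means, priors, var = model

    def score(c):
        return math.log(priors[c]) - (x - means[c]) ** 2 / (2 * var)

    return max(means, key=score)


model = fit_gda([1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
                ["a", "a", "a", "b", "b", "b"])
prediction = predict_gda(model, 0.9)
```

With equal priors and a shared variance, the rule reduces to assigning a new observation to the class whose mean is closest.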

Seminar Presentation: Finding Patterns in Data with K-Means Clustering in JMP and SAS
I was invited by the Toronto Area SAS Society (TASS) to deliver this presentation on K-means clustering and how to implement it in JMP and SAS. K-means clustering is a common machine learning technique for finding patterns in continuous data and grouping them based on proximity. This presentation was gently mathematical and used many diagrams, plots, and pictures to illustrate the essential ideas of K-means clustering for a wide audience of analysts, statisticians, and database managers. It was very well received by the audience, and many non-mathematical analysts praised the accessibility and clarity of this presentation.
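
The core loop of K-means can be sketched in a few lines of Python. This is an illustration only, not material from the presentation: the function names are hypothetical, and the sketch assumes a naive initialization (the first `k` points serve as starting centroids) to keep the result deterministic, whereas practical implementations usually use random or smarter starts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break  # converged: centroids stopped moving
        centroids = updated
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, 2)
```

On this toy data the two groups of four points are recovered even though both starting centroids lie in the same group.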

Seminar Presentation: Overcoming Multicollinearity and Overfitting – Partial Least Squares Regression in JMP and SAS
I was invited by the Toronto Area SAS Society (TASS) to deliver this presentation on partial least squares regression and how to implement it in JMP and SAS. Partial least squares regression is a powerful predictive modelling and variable selection technique in machine learning, and it is particularly useful for overcoming multicollinearity and overfitting, two common problems encountered by linear or logistic regression. This presentation was very well received by the audience, especially non-mathematicians who found it to be clear and accessible. Matt Malczewski and Chris Battiston, two active members of the SAS Canada Community, recapped my presentations in their respective blogs.

Mathematical Model of Treatment as a Strategy to Reduce HIV Transmission
The World Health Organization published a controversial study in 2009 in support of using highly active antiretroviral therapy (HAART) to prevent HIV transmission with results from two mathematical models – one deterministic and one stochastic – built in Visual Basic. I attempted to replicate these models in MATLAB. The stochastic model without treatment was replicated successfully, but the assumptions of the model were found to be tenuous, and an alternative model was produced with slightly different results. The deterministic model could not be replicated, with or without treatment. However, a simplification of the model opened the door to 3 different ways of examining treatment as prevention. With the given parameters of the population from the WHO, the simplified model showed that a sufficiently high treatment rate could eliminate the epidemic.

Mathematical Model of the Cathode Catalyst Layer in Proton Exchange Membrane Fuel Cells
I studied a model of the proton exchange membrane fuel cell’s cathode catalyst layer, where the reduction half-reaction occurs. This model was a system of 3 partial differential equations that were reduced to ordinary differential equations, and I showed, both analytically by rederiving the entire model and numerically using MATLAB, that this model had flawed boundary values and could not be solved. Given the opportunity in the near future, I am eager to continue this project to rectify these flaws and write a MATLAB script that clearly and correctly solves these equations.

Log-Normal Regression of Left-Censored Environmental Data: An Example with Trichloroethylene Concentrations in Ground Water
In my graduate applied statistics course, I wrote a paper on left-censored data and used a sample data set from an environmental chemistry textbook to illustrate how to analyze it. I defined censored data, introduced the difficulties of analyzing them, and discussed methods to overcome those difficulties. Using a parametric approach, I assumed a log-normal distribution for the data and used maximum likelihood estimation to estimate the parameters in my model.
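
For context, the standard log-likelihood for left-censored log-normal data can be written as follows. The notation is my own shorthand here and is not taken from the paper: x_i is a measured concentration, L_i is the detection limit for a censored observation, delta_i = 1 marks a measured value and delta_i = 0 a censored one, and Phi is the standard normal distribution function. The sketch covers the intercept-only case with a single location parameter mu and scale sigma on the log scale.

```latex
\ell(\mu, \sigma)
  = \sum_{i:\, \delta_i = 1}
      \left[ -\log x_i - \log \sigma - \tfrac{1}{2}\log(2\pi)
             - \frac{(\log x_i - \mu)^2}{2\sigma^2} \right]
  + \sum_{i:\, \delta_i = 0}
      \log \Phi\!\left( \frac{\log L_i - \mu}{\sigma} \right)
```

Measured values contribute the log-normal density, while each censored value contributes only the probability of falling below its detection limit; maximizing this function over mu and sigma gives the maximum likelihood estimates.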

Comparing Nonparametric, Semiparametric, and Parametric Survival Analyses of the Determinants of Surviving AIDS in Quebec: 1979-1994
Team Members: Eric Cai, Jiafen Gong
In my graduate course on survival analysis, I partnered with Jiafen Gong to analyze a data set of HIV patients in Quebec from 1979 to 1994. Using nonparametric (Kaplan-Meier), semiparametric (Cox proportional hazards) and parametric (exponential, Weibull, log-normal, log-logistic) approaches, we sought to find the determinants of survival from AIDS. Significant data cleaning was done to ease the analysis, and inconsistencies in the recording of the data were discovered in the process. Despite the difficulties encountered with this data set, we found several interesting results. Younger patients were found to survive longer than older patients. Men who had sex with men and intravenous drug users were found to survive longer than any other risk category. Patients who were diagnosed after 1989 had higher survival than patients who were diagnosed before 1989. Gender was found to be an insignificant factor for survival, though its effect on survival may have been confounded by the risk categories.
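
As a brief illustration of the nonparametric approach, the Kaplan-Meier estimator can be sketched in Python as below. This is not the code used in the project, the function name is hypothetical, and the five observations are made-up toy values rather than the Quebec data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times[i]  : follow-up time of subject i
    events[i] : 1 if death was observed at times[i], 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return curve


# Toy data: deaths at times 1, 2 and 3; censoring at times 2 and 4.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects never trigger a drop in the curve, but they do shrink the risk set for later death times, which is how the estimator uses their partial information.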