## How to Ask for Reference Letters From Your Professors

The following article was published on the Career Services Informer (CSI), the official career blog of Simon Fraser University (SFU).  I have been fortunate to be a guest blogger for the CSI since I was an undergraduate student at SFU, and you can read all of my recent articles as an alumnus here.

Image courtesy of Frank C. Müller on Wikimedia

I recently blogged about fast-approaching deadlines for professional programs and graduate studies. Applying to those programs and scholarships requires reference letters from professors, and – having done so as a student at SFU – I have learned that this task is far more intense than simply sending a quick email. Here are some tips for how to make it easier for your professors to write the best reference letters for you.

## Vancouver SAS User Group Meeting – Wednesday, November 4, 2015

I am excited to present at the next Vancouver SAS User Group (VanSUG) meeting on Wednesday, November 4, 2015.  I will illustrate data transposition and ANOVA in SAS and JMP using potato chips and analytical chemistry.  Come and check it out!  The following agenda contains all of the presentations, and you can register for this meeting on the SAS Canada web site.  This meeting is free, and a free breakfast will be served in the morning.

Date: Wednesday, November 4, 2015

Place:

Ballroom West and Centre

Holiday Inn – Vancouver Centre

V5Z 3Y2

(604) 879-0511

Agenda:

8:30am – 9:00am: Registration

9:00am – 9:20am: Introductions and SAS Update – Matt Malczewski, SAS Canada

9:20am – 9:40am: Lessons On Transposing Data, Sampling & ANOVA in SAS & JMP – Eric Cai, Cancer Surveillance & Outcomes, BC Cancer Agency

10:20am – 10:30am: A Beginner’s Experience Using SAS – Kim Burrus, Cancer Surveillance & Outcomes, BC Cancer Agency

10:30am – 11:00am: Networking Break

11:00am – 11:20am: Using SAS for Simple Calculations – Jay Shurgold, Rick Hansen Institute

11:20am – 11:50am: Yes, We Can… Save SAS Formats – John Ladds, Statistics Canada

11:50am – 12:20pm: Reducing Customer Attrition with Predictive Analytics – Nate Derby, Stakana Analytics

12:20pm – 12:30pm: Evaluations, Prize Draw & Closing Remarks

If you would like to be notified of upcoming SAS User Group Meetings in Vancouver, please subscribe to the Vancouver SAS User Group Distribution List.

## SFU Statistics and Actuarial Science Gala – Wednesday, September 16, 2015

I look forward to attending the #SFU50 Gala at the Department of Statistics and Actuarial Science at Simon Fraser University on Wednesday, September 16, 2015.  There will be a poster presentation of undergraduate case studies, a short awards ceremony, and many opportunities to network with current and former students, professors and staff from that department.  If you will attend this event, please come and say “Hello”!

Time: 5:00 – 7:30 pm

Date: Wednesday, September 16, 2015

Place: Applied Sciences Building Atrium, Simon Fraser University, Burnaby, British Columbia, Canada

## Vancouver Python Day @ Mobify Vancouver – Saturday, September 12, 2015

I am excited to attend Vancouver Python Day on Saturday, September 12, 2015, at Mobify.  Learn about algorithmic trading, the Python Data Toolkit, using Python on mobile devices, and more! The conference is free to attend.  If you will go to this conference, then please feel free to come and say “Hello”!

Vancouver Python Day
Saturday September 12, 2015
9:30AM – 4:00PM

Mobify Vancouver
#300 – 948 Homer St, Vancouver, BC

Scheduled Presentations

Keynote: The State of Mobile Python
Russell Keith-Magee, Django Software Foundation

Socializing and Networking for Awkward Humans
Carly Slater

Spreading Python Skills to the Scientific Community
Bill Mills, Mozilla Science Lab

Using the Python Data Toolkit: A Live Demo!
Tiffany Timbers

Interesting New Features in Python 3.5
Brett Cannon, Microsoft

Simon Thornington

See the agenda for the full schedule.

Lightning Talks

Time will be provided for Python and Django lightning talks. Sign up will be on site.

The following article was published on the Career Services Informer (CSI), the official career blog of Simon Fraser University (SFU).  I have been fortunate to be a guest blogger for the CSI since I was an undergraduate student at SFU, and you can read all of my recent articles as an alumnus here.

As most students return to school in the upcoming semester, their academic studies and back-to-school logistics may be their top priorities.   However, if you want to pursue graduate studies or professional programs like medicine or law, then there are some important deadlines that are fast approaching, and they all involve time-consuming efforts to meet them. Now is a good time to tackle these deadlines and put forth your best effort while you are free of the burdens of exams and papers that await you later in the fall semester.

Image Courtesy of Melburnian at Wikimedia

Speaking from experience, these applications are very long and tiring, and they will take a lot of thought, planning, writing and re-writing. They also require a lot of coordination to get the necessary documents, like your transcripts and letters of recommendation from professors who can attest to your academic accomplishments and research potential.  Plan ahead for them accordingly, and consider using the Career Services Centre to help you with drafting your curriculum vitae, your statements of interest, and any interview preparation.

## Odds and Probability: Commonly Misused Terms in Statistics – An Illustrative Example in Baseball

Yesterday, all 15 home teams in Major League Baseball won on the same day – the first such occurrence in history.  CTV News published an article written by Mike Fitzpatrick from The Associated Press that reported on this event.  The article states, “Viewing every game as a 50-50 proposition independent of all others, STATS figured the odds of a home sweep on a night with a full major league schedule was 1 in 32,768.”  (Emphases added)

Screenshot captured at 5:35 pm Vancouver time on Wednesday, August 12, 2015.

Out of curiosity, I wanted to reproduce this result.  This event is the intersection of 15 independent events, one per game; each game is treated as a Bernoulli trial in which the home team wins with probability 0.5.

$P[(\text{Winner}_1 = \text{Home Team}_1) \cap (\text{Winner}_2 = \text{Home Team}_2) \cap \ldots \cap (\text{Winner}_{15}= \text{Home Team}_{15})]$

Since all 15 games are assumed to be mutually independent, the probability of all 15 home teams winning is just

$P(\text{All 15 Home Teams Win}) = \prod_{i = 1}^{15} P(\text{Winner}_i = \text{Home Team}_i)$

$P(\text{All 15 Home Teams Win}) = 0.5^{15} = 0.00003051757$

Now, let’s connect this probability to odds.

It is important to note that

• odds is only applicable to Bernoulli random variables (i.e. binary events)
• odds is the ratio of the probability of success to the probability of failure

For our example,

$\text{Odds}(\text{All 15 Home Teams Win}) = P(\text{All 15 Home Teams Win}) \ \div \ P(\text{At least 1 Home Team Loses})$

$\text{Odds}(\text{All 15 Home Teams Win}) = 0.00003051757 \div (1 - 0.00003051757)$

$\text{Odds}(\text{All 15 Home Teams Win}) = 0.0000305185$

The above article states that the odds is 1 in 32,768.  The fraction 1/32768 is equal to 0.00003051757, which is NOT the odds as I just calculated.  Instead, 0.00003051757 is the probability of all 15 home teams winning.  Thus, the article incorrectly states 0.00003051757 as the odds rather than the probability.
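The two calculations above can be reproduced exactly with a short Python sketch.  It uses the same assumptions as the article (15 independent games, each with a home-win probability of 0.5); the variable names are my own and are only for illustration.

```python
from fractions import Fraction

# Assumptions taken from the article: 15 independent games,
# each a 50-50 proposition for the home team.
n_games = 15
p_home_win = Fraction(1, 2)

# Probability that all 15 home teams win.
p_sweep = p_home_win ** n_games

# Odds = probability of success divided by probability of failure.
odds_sweep = p_sweep / (1 - p_sweep)

print("Probability:", p_sweep, "=", float(p_sweep))
print("Odds:       ", odds_sweep, "=", float(odds_sweep))
```

Using exact fractions makes the difference easy to see: the probability is 1/32768, while the odds are 1/32767.  The two numbers are close here only because the event is so rare.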

This is an example of a common confusion between probability and odds that the media and the general public often make.  Probability and odds are two different concepts and are calculated differently, and my calculations above illustrate their differences.  Thus, exercise caution when reading statements about probability and odds, and make sure that the communicator of such statements knows exactly how they are calculated and which one is more applicable.

## Analytical Chemistry Lesson of the Day – Linearity in Method Validation

In analytical chemistry, the quantity of interest is often estimated from a calibration line.  A technique or instrument generates the analytical response for the quantity of interest, so a calibration line is constructed by generating multiple responses from multiple standard samples of known quantities.  Linearity refers to how well a plot of the analytical response versus the quantity of interest follows a straight line.  If this relationship holds, then an analytical response can be generated from a sample containing an unknown quantity, and the calibration line can be used to estimate the unknown quantity with a confidence interval.

Note that this concept of “linear” is different from the “linear” in “linear regression” in statistics.
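As a rough illustration of how a calibration line is built and used, here is a minimal Python sketch.  The concentrations and responses are hypothetical numbers that I made up for this example, and the line is fitted by ordinary least squares, which is a common choice but is my assumption here.  The sketch gives only a point estimate of the unknown quantity; it does not compute the confidence interval.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination, one common summary of linearity."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical standards: known quantities and their measured responses.
known_quantity = [0.0, 2.0, 4.0, 6.0, 8.0]
response = [0.02, 0.41, 0.80, 1.19, 1.62]

slope, intercept = fit_line(known_quantity, response)
r2 = r_squared(known_quantity, response, slope, intercept)

# Invert the calibration line to estimate an unknown quantity
# from its measured response (hypothetical value).
unknown_response = 1.00
estimated_quantity = (unknown_response - intercept) / slope

print("slope:", slope, "intercept:", intercept, "R^2:", r2)
print("estimated quantity:", estimated_quantity)
```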

This is the second blog post in a series of Chemistry Lessons of the Day on method validation in analytical chemistry.  Read the previous post on specificity, and stay tuned for future posts!

## Mathematical Statistics Lesson of the Day – Basu’s Theorem

Today’s Statistics Lesson of the Day will discuss Basu’s theorem, which connects the previously discussed concepts of minimally sufficient statistics, complete statistics and ancillary statistics.  As before, I will begin with the following set-up.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$t = T(\mathbf{X})$

be a statistic based on $\textbf{X}$.

Basu’s theorem states that, if $T(\textbf{X})$ is a complete and minimal sufficient statistic, then $T(\textbf{X})$ is independent of every ancillary statistic.

Establishing the independence between 2 random variables can be very difficult if their joint distribution is hard to obtain.  This theorem allows the independence between a complete and minimal sufficient statistic and every ancillary statistic to be established without their joint distribution – and this is the great utility of Basu’s theorem.

However, establishing that a statistic is complete can be a difficult task.  In a later lesson, I will discuss another theorem that will make this task easier for certain cases.

## Analytical Chemistry Lesson of the Day – Specificity in Method Validation and Quality Assurance

In pharmaceutical chemistry, one of the requirements for method validation is specificity, the ability of an analytical method to distinguish the analyte from other chemicals in the sample.  The specificity of the method may be assessed by deliberately adding impurities into a sample containing the analyte and testing how well the method can identify the analyte.

Statistics is an important tool in analytical chemistry, and, ideally, there would be no overlap in the vocabulary that is used between the 2 fields.  Unfortunately, the above definition of specificity is different from that in statistics.  In a previous Machine Learning Lesson and Applied Statistics Lesson of the Day, I introduced the concepts of sensitivity and specificity in binary classification.  In the context of assessing the predictive accuracy of a binary classifier, its specificity is the proportion of truly negative cases that are correctly classified as negative.
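To make the statistical meaning concrete, here is a small Python sketch.  It takes specificity to be the number of true negatives divided by the number of truly negative cases, TN / (TN + FP), which is the standard definition in binary classification.  The labels below are hypothetical, with 0 meaning negative and 1 meaning positive.

```python
def specificity(actual, predicted):
    """True negatives divided by all truly negative cases: TN / (TN + FP)."""
    true_neg = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    false_pos = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return true_neg / (true_neg + false_pos)


# Hypothetical labels for 8 cases (0 = negative, 1 = positive).
actual = [0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 0, 1, 0, 1, 0]

spec = specificity(actual, predicted)
print("specificity:", spec)
```

In this made-up example, 5 cases are truly negative and the classifier labels 4 of them as negative, so the specificity is 4/5.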

## Mathematical Statistics Lesson of the Day – An Example of An Ancillary Statistic

Consider 2 independent random variables, $X_1$ and $X_2$, from the normal distribution $\text{Normal}(\mu, \sigma^2)$, where $\mu$ is unknown.  Then the statistic

$D = X_1 - X_2$

has the distribution

$\text{Normal}(0, 2\sigma^2)$.

The distribution of $D$ does not depend on $\mu$, so $D$ is an ancillary statistic for $\mu$.

Note that, if $\sigma^2$ is unknown, then $D$ is not ancillary for $\sigma^2$.
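A quick simulation can make this plausible.  The following Python sketch is only an illustration under assumptions that I chose: $X_1$ and $X_2$ are drawn independently, $\sigma = 1$, and two arbitrary values of $\mu$ are compared.  If $D$ is ancillary for $\mu$, then its sample mean should be near 0 and its sample variance near $2\sigma^2 = 2$ for both values of $\mu$.

```python
import random
import statistics


def simulate_differences(mu, sigma, n_pairs, seed):
    """Draw n_pairs of independent (X1, X2) and return the values D = X1 - X2."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) - rng.gauss(mu, sigma) for _ in range(n_pairs)]


sigma = 1.0      # assumed known for this illustration
n_pairs = 20000  # arbitrary simulation size

results = {}
for mu in (0.0, 50.0):  # two arbitrary values of the unknown mean
    d = simulate_differences(mu, sigma, n_pairs, seed=2015)
    results[mu] = (statistics.mean(d), statistics.variance(d))
    print("mu =", mu, "mean of D:", results[mu][0], "variance of D:", results[mu][1])
```

A simulation cannot prove ancillarity, but it is a useful sanity check: changing $\mu$ shifts both observations by the same amount, so the shift cancels in the difference.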

## Physical Chemistry Lesson of the Day – What is the Primary Determinant of the Effective Nuclear Charge for Outer Electrons?

Electrons in the inner shells of an atom shield the electrons in the outer shells pretty well from the nuclear charge.  However, electrons in the same shell don’t shield each other very well.  If an electron spends most of its time below another electron, then the first electron can shield the second electron.  However, this is not the case for electrons in the same shell – they repel each other because they are all negatively charged, and they are at roughly the same average distance from the nucleus.

Thus, the difference between

1. the charge of the nucleus
2. and the charge of the core electrons

is the primary contributor to the effective nuclear charge that the outer electrons experience.
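As a crude illustration of this difference, the following Python sketch subtracts the number of core electrons from the atomic number.  This is only a first approximation that assumes perfect shielding by core electrons and no shielding by electrons in the same shell; the function name is my own.

```python
def approx_effective_nuclear_charge(atomic_number, n_core_electrons):
    """Crude estimate: nuclear charge minus the number of core electrons."""
    return atomic_number - n_core_electrons


# Sodium: 11 protons, 10 core electrons (1s2 2s2 2p6), 1 outer electron.
sodium = approx_effective_nuclear_charge(11, 10)

# Chlorine: 17 protons, 10 core electrons (1s2 2s2 2p6), 7 outer electrons.
chlorine = approx_effective_nuclear_charge(17, 10)

print("Na:", sodium, "Cl:", chlorine)
```

Under this approximation, the outer electrons of chlorine feel a much larger effective nuclear charge than the outer electron of sodium, even though both atoms have the same number of core electrons.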

## Data Science Seminar by David Campbell on Approximate Bayesian Computation and the Earthworm Invasion in Canada

My colleague, David Campbell, will be the featured speaker at the next Vancouver Data Science Meetup on Thursday, June 25.  (This is a jointly organized event with the Vancouver Machine Learning Meetup and the Vancouver R Users Meetup.)  He will present his research on approximate Bayesian computation and Markov Chain Monte Carlo, and he will highlight how he has used these tools to study the invasion of European earthworms in Canada, especially their drastic effects on the boreal forests in Alberta.

Dave is a statistics professor at Simon Fraser University, and I have found him to be very smart and articulate in my communication with him.  This seminar promises to be both entertaining and educational.  If you will attend it, then I look forward to seeing you there!  Check out Dave on Twitter and LinkedIn.

Title: The great Canadian worm invasion (from an approximate Bayesian computation perspective)

Speaker: David Campbell

Date: Thursday, June 25

Place:

5 East 8th Avenue

Vancouver, BC

Schedule:

• 6:00 pm: Doors are open – feel free to mingle!
• 6:30 pm: Presentation begins.
• ~7:45 pm: Off to a nearby restaurant for food, drinks, and breakout discussions.

Abstract:

After being brought in by pioneers for agricultural reasons, European earthworms have been taking North America by storm and are starting to change the Alberta Boreal forests. This talk uses an invasive species model to introduce the basic ideas behind estimating the rate of new worm introductions and how quickly they spread with the goal of predicting the future extent of the great Canadian worm invasion. To take on the earthworm invaders, we turn to Approximate Bayesian Computation methods. Bayesian statistics are used to gather and update knowledge as new information becomes available owing to their success in prediction and estimating ongoing and evolving processes. Approximate Bayesian Computation is a step in the right direction when it’s just not possible to actually do the right thing- in this case using the exact invasive species model is infeasible. These tools will be used within a Markov Chain Monte Carlo framework.

Dave Campbell is an Associate Professor in the Department of Statistics and Actuarial Science at Simon Fraser University and Director of the Management and Systems Science Program. Dave’s main research area is at the intersections of statistics with computer science, applied math, and numerical analysis. Dave has published papers on Bayesian algorithms, adaptive time-frequency estimation, and dealing with lack of identifiability. His students have gone on to faculty positions and worked in industry at video game companies and predicting behaviour in malls, chat rooms, and online sales.

## Mathematical Statistics Lesson of the Day – Ancillary Statistics

The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics.

Suppose that you collected data

$\mathbf{X} = X_1, X_2, ..., X_n$

in order to estimate a parameter $\theta$.  Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, ..., X_n$.

Let

$a = A(\mathbf{X})$

be a statistic based on $\textbf{X}$.

If the distribution of $A(\textbf{X})$ does NOT depend on $\theta$, then $A(\textbf{X})$ is called an ancillary statistic.

An ancillary statistic contains no information about $\theta$; its distribution is fixed and known without any relation to $\theta$.  Why, then, would we care about $A(\textbf{X})$?  I will address this question in later Statistics Lessons of the Day, and I will connect ancillary statistics to sufficient statistics, minimally sufficient statistics and complete statistics.

## Analytical Chemistry Lesson of the Day – Method Validation in Quality Assurance

Any method that is developed in analytical chemistry must meet several criteria to ensure that it accomplishes its intended objective at or above an acceptable standard.  The process of checking these criteria is called method validation, and it has the following criteria* in the pharmaceutical industry:

• specificity
• linearity
• accuracy
• precision
• range
• limit of detection
• limit of quantitation
• robustness**

As I will note in future Chemistry Lessons of the Day, these words are used differently between statistics and chemistry.

*These criteria are taken from Page 723 of the 6th edition of “Quantitative Chemical Analysis” by Daniel C. Harris (2003).

**The Food and Drug Administration does not list robustness as a typical characteristic of method validation.  (See Section B on Page 7 of its “Guidance for Industry Analytical Procedures and Methods Validation for Drugs and Biologics”.)  However, it does mention robustness several times as an important characteristic that “should be evaluated” during the “early stages of method development”.

## Mathematics and Applied Statistics Lesson of the Day – Contrasts

A contrast is a linear combination of a set of variables such that the sum of the coefficients is equal to zero.  Notationally, consider a set of variables

$\mu_1, \mu_2, ..., \mu_n$.

Then the linear combination

$c_1 \mu_1 + c_2 \mu_2 + ... + c_n \mu_n$

is a contrast if

$c_1 + c_2 + ... + c_n = 0$.

There is a reason why I chose to use $\mu$ as the symbol for the variables in the above notation – in statistics, contrasts provide a very useful framework for comparing multiple population means in hypothesis testing.  In a later Statistics Lesson of the Day, I will illustrate some examples of contrasts, especially in the context of experimental design.
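The definition above is easy to check numerically.  Here is a minimal Python sketch; the function names and the 3 group means are hypothetical and serve only to illustrate the definition.

```python
import math


def is_contrast(coefficients, tol=1e-12):
    """A linear combination is a contrast if its coefficients sum to zero."""
    return math.isclose(sum(coefficients), 0.0, abs_tol=tol)


def contrast_value(coefficients, means):
    """Evaluate c_1 * mu_1 + c_2 * mu_2 + ... + c_n * mu_n."""
    if len(coefficients) != len(means):
        raise ValueError("coefficients and means must have the same length")
    return sum(c * m for c, m in zip(coefficients, means))


# Hypothetical means for 3 groups.
means = [10.0, 12.0, 14.0]

pairwise = [1, -1, 0]            # compares group 1 with group 2
one_vs_rest = [1, -0.5, -0.5]    # compares group 1 with the average of groups 2 and 3
not_a_contrast = [1, 1, -1]      # coefficients sum to 1, not 0

for coefficients in (pairwise, one_vs_rest, not_a_contrast):
    print(coefficients, "is a contrast:", is_contrast(coefficients))

print("pairwise value:", contrast_value(pairwise, means))
print("one-vs-rest value:", contrast_value(one_vs_rest, means))
```

The tolerance in `is_contrast` is there because fractional coefficients such as 1/3 may not sum to exactly zero in floating-point arithmetic.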

## Leaving My Dream Career – Reflecting on My Decision 10 Years Later

I just couldn’t pretend any longer.

It was near the end of my second year at Simon Fraser University.  My GPA was pretty high, and I had just won a competitive NSERC Undergraduate Student Research Award to work with an accomplished cardiac physiologist.  I attended all of the relevant seminars to get the “inside scoop” on how to successfully apply to medical school, and I volunteered in numerous organizations to demonstrate my non-academic credentials.  I had already developed good relationships with several professors who would have gladly written strong recommendations for my application.  All of the stars were aligning for my path to medical school.

I was also miserable, angry and devoid of any further motivation to stay on that path.

Image courtesy of Carsten Tolkmit from Flickr.  Obtained via the Creative Commons License.

## Organic Chemistry Lesson of the Day – The 4 Conformational Isomers of Butane

In a previous Chemistry Lesson of the Day, I introduced the simplest case of conformational isomerism – the staggered and eclipsed conformations of ethane.  The next most complicated case of conformational isomerism belongs to butane.  Here are the Newman projections of the 4 possibilities.

Modified image courtesy of Avitek from Wikimedia.

The conformational isomers are named with respect to the proximity of the 2 methyl groups.  The dihedral angle between the 2 methyl groups, θ, is below each Newman projection.  From left to right, the conformational isomers are:

• fully eclipsed (θ = 0 degrees)
• gauche (θ = 60 degrees)
• eclipsed (θ = 120 degrees)
• anti (θ = 180 degrees)

Clearly, the fully eclipsed conformation has the most steric strain* between the 2 methyl groups, so its internal energy is highest.

Clearly, the anti conformation has the lowest steric strain between the 2 methyl groups, so its internal energy is lowest.

The gauche conformation has less steric strain than the eclipsed conformation, so its internal energy is the lower of the two.

From lowest to highest internal energy, here is the ranking of the conformational isomers:

1. anti
2. gauche
3. eclipsed
4. fully eclipsed

This can be visualized by the following energy diagram.

Image courtesy of Mr.Holmium from Wikimedia.

*As mentioned in my previous Chemistry Lesson of the Day on the 2 conformational isomers of ethane, there is some controversy about what really causes the internal energy to increase in eclipsed conformations.  Some chemists suggest that hyperconjugation is responsible.

## How to Extract a String Between 2 Characters in R and SAS

#### Introduction

I recently needed to work with date values that look like this:

    mydate
    Jan 23/2
    Aug 5/20
    Dec 17/2

I wanted to extract the day, and the obvious strategy is to extract the text between the space and the slash.  I needed to think about how to program this carefully in both R and SAS, because

1. the length of the day could be 1 or 2 characters long
2. I needed code that adapts to this varying length from observation to observation
3. there is no function in either language that is suited exactly for this purpose.

In this tutorial, I will show you how to do this in both R and SAS.  I will write a function in R and a macro program in SAS to do so, and you can use the function and the macro program as you please!
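To show the underlying idea in a language-neutral way, here is a minimal sketch in Python.  It is not the R function or the SAS macro from this tutorial; the function name `extract_between` is my own, and the sample values are the ones shown above.  The strategy is to find the position of the first delimiter, then find the second delimiter after it, and keep whatever lies in between, however long it is.

```python
def extract_between(text, start_char, end_char):
    """Return the text between the first start_char and the next end_char."""
    start = text.index(start_char) + 1   # first position after the start delimiter
    end = text.index(end_char, start)    # position of the end delimiter
    return text[start:end]


mydates = ["Jan 23/2", "Aug 5/20", "Dec 17/2"]
days = [extract_between(value, " ", "/") for value in mydates]
print(days)  # ['23', '5', '17']
```

Because the positions of both delimiters are found for each value separately, the sketch handles days that are 1 or 2 characters long.  `str.index` raises a `ValueError` if either delimiter is missing, which may or may not be the behaviour you want for messy data.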

## Eric’s Enlightenment for Friday, June 5, 2015

1. Christian Robert provides a gentle introduction to the Metropolis-Hastings algorithm with accompanying R code.  (Hat Tip: David Campbell)
2. John Sall demonstrates how to perform discriminant analysis in JMP, especially for data sets with many variables.
3. Using machine learning instead of human judgment may improve the selection of job candidates.  This article also includes an excerpt from a New York Times article about how the Milwaukee Bucks used facial recognition as one justification to choose Jabari Parker over Dante Exum.  (Hat Tip: Tyler Cowen)
4. “A hospital at the University of California San Francisco Medical Center has a robot filling prescriptions.”

## Eric’s Enlightenment for Thursday, June 4, 2015

1. IBM explains how Watson the computer answered the Final Jeopardy question against Ken Jennings and Brad Rutter.  (In a question about American airports, Watson’s answer was “What is Toronto???”  It’s not as ridiculous as you think, and Watson didn’t wager a lot of money for this answer – so it still won by a wide margin.)
2. Two views on how to reform FIFA by Nate Silver and  – this is an interesting opportunity to apply good principles of institutional design and political economy.
3. How blind people navigate the Internet.
4. The Replication Network – a web site devoted to the study of replications in economics.
5. Cryptochromes, and particularly the molecule flavin adenine dinucleotide (FAD) that forms part of the cryptochrome, are thought to be responsible for magnetoreception, the ability of some animals to navigate in Earth’s magnetic field.  Joshua Beardmore et al. have developed a microscope that can detect the magnetic properties of FAD – some very cool work on radical pair chemistry!