Eric’s Enlightenment for Wednesday, May 20, 2015

  1. A common but flawed criticism of basketball analytics is that statistics cannot capture the effect of teamwork when assessing the value of a player.  Dan Rosenbaum wrote a great article on how adjusted plus/minus captures exactly that effect.
  2. Citing Dan’s work above, Neil Paine used adjusted plus/minus (APM) to show why Jason Collins was one of the top defensive centres in the NBA and the most underrated player of the last 15 years of his career.  When Neil mentions regularized APM (RAPM) in the third-to-last paragraph, he calls it a Bayesian version of APM.  Most statisticians are more familiar with the term ridge regression, which is a type of regression that penalizes large coefficients and thereby shrinks the estimates of redundant (highly correlated) predictors.  Make sure to check out that great plot of actual RAPM vs. expected PER at the bottom of the article.
  3. In a 33-page article that was published on 2015-05-14 in Physical Review Letters, only the first 9 pages describe the research done for the article; the other 24 pages list its 5,514 authors, setting a record for the largest known number of authors for a single research article.  Hyperauthorship is common in physics, but apparently not in biology.  (Hat Tip: Tyler Cowen)
  4. Brandon Findlay explains why methanol/water mixtures make great cooling baths.  He wrote a very thorough follow-up blog post on how to make them, and he includes photos to aid the demonstration.

Eric’s Enlightenment for Tuesday, April 28, 2015

  1. On a yearly basis, the production of almonds in California uses more water than businesses and residences in San Francisco and Los Angeles combined.  Alex Tabarrok explains why.
  2. How patient well-being and patient satisfaction become conflicting objectives in hospitals – a case study of a well-intended policy with deadly consequences.  (HT: Frances Woolley – with a thought about academia.)
  3. Contrary to a long-held presumption about the stability of DNA in mature cells, Huimei Yu et al. show that neurons use DNA methylation to rewrite their DNA throughout each day.  This is done to adjust the brain to different activity levels as its function changes over time.
  4. Alex Yakubovitch provides a tutorial on regular expressions (patterns that define sets of strings) and how to use them in R.
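
To give a flavour of the last item, here is a minimal sketch of regular expressions in base R.  The sample strings and the deliberately simplified e-mail pattern are made up for illustration; they are not taken from Alex's tutorial, and the pattern is not a complete e-mail validator.

```r
# Hypothetical sample strings (assumption: invented for this illustration)
emails <- c("alice@example.com", "not an email", "bob.smith@mail.org")

# A simplified pattern: some name characters, an "@", a domain, a dot,
# and a top-level part of at least 2 letters
pattern <- "^[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$"

# grepl() returns TRUE for each string that matches the pattern
is_email <- grepl(pattern, emails)

# sub() replaces the first match; here it strips everything up to the "@"
domains <- sub("^.*@", "", emails[is_email])

print(is_email)
print(domains)
```

grepl() answers "which strings belong to the set defined by the pattern?", while sub() uses a pattern to edit the strings themselves.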

Eric’s Enlightenment for Tuesday, April 21, 2015

  1. The standard Gibbs free energy of the conversion of water from a liquid to a gas is positive.  Why does it still evaporate at room temperature?  Very good answer on Chemistry Stack Exchange.
  2. The Difference Between Clustered, Longitudinal, and Repeated Measures Data.  Good blog post by Karen Grace-Martin.
  3. 25 easy and inexpensive ways to clean household appliances using simple (and non-toxic) household products.
  4. A nice person named Alex kindly transcribed the notes for all of Andrew Ng’s video lectures in his course on machine learning at Coursera.
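
The first item above can be checked with a short calculation.  This is only a sketch under stated assumptions: it uses approximate textbook values for the standard Gibbs free energies of formation of liquid and gaseous water at 25 °C, treats water vapour as an ideal gas, and picks a partial pressure of about half the saturation value as a hypothetical "room air" condition.

```r
# Approximate textbook values at 298.15 K (assumption: standard tables)
dGf_liquid <- -237.13e3   # J/mol, H2O(l)
dGf_gas    <- -228.57e3   # J/mol, H2O(g)
R_gas      <- 8.314462    # J/(mol K)
temp_K     <- 298.15
p_standard <- 100         # kPa (1 bar standard state)

# Standard Gibbs free energy of evaporation: positive, as the item says
dG_standard <- dGf_gas - dGf_liquid

# Equilibrium (saturation) vapour pressure implied by dG_standard
p_equilibrium <- p_standard * exp(-dG_standard / (R_gas * temp_K))

# Actual Gibbs free energy change when the partial pressure of water
# vapour in the air is below saturation (hypothetical: half of it)
p_actual  <- 0.5 * p_equilibrium
dG_actual <- dG_standard + R_gas * temp_K * log(p_actual / p_standard)

print(dG_standard)    # J/mol
print(p_equilibrium)  # kPa
print(dG_actual)      # J/mol
```

The standard value refers to water vapour at 1 bar.  At the much lower partial pressures found in ordinary air, the logarithmic term makes the actual Gibbs free energy change negative, so evaporation proceeds.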

How to Calculate a Partial Correlation Coefficient in R: An Example with Oxidizing Ammonia to Make Nitric Acid


Today, I will talk about the math behind calculating partial correlation and illustrate the computation in R.  The computation uses an example involving the oxidation of ammonia to make nitric acid, and this example comes from a built-in data set in R called stackloss.

I read pages 234-237 in Section 6.6 of “Discovering Statistics Using R” by Andy Field, Jeremy Miles, and Zoe Field to learn about partial correlation.  They used a data set called “Exam Anxiety.dat” available from their companion web site (look under “6 Correlation”) to illustrate this concept; they calculated the partial correlation coefficient between exam anxiety and revision time while controlling for exam score.  As I discuss further below, a plot of the residuals of these 2 variables, each obtained after regressing on exam score, helps to illustrate the calculation of partial correlation coefficients.  This plot makes intuitive sense; if you take more time to study for an exam, you tend to have less exam anxiety, so there is a negative correlation between revision time and exam anxiety.

[Figure: plot of the residuals of exam anxiety against the residuals of revision time, controlling for exam score]

They used a function called pcor() in a package called “ggm”; however, I suspect that this package no longer works properly, because it depends on a deprecated package called “RBGL” (i.e. “RBGL” is no longer available on CRAN).  See this discussion thread for further information.  Thus, I wrote my own R function to illustrate partial correlation.

Partial correlation is the correlation between 2 random variables while holding other variables constant.  To calculate the partial correlation between X and Y while holding Z constant (or controlling for the effect of Z, or averaging out Z),
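
The residual-based calculation mentioned earlier can be sketched in R as follows.  This is a minimal illustration and not necessarily the function from the full post: the name partial_cor() is hypothetical, and the choice of stackloss variables (air flow and stack loss, controlling for water temperature) is an assumption made for the example.

```r
# Partial correlation between x and y while controlling for z:
# regress each of x and y on z, then correlate the two sets of residuals.
# (partial_cor is a hypothetical name chosen for this sketch.)
partial_cor <- function(x, y, z) {
  residuals_x <- residuals(lm(x ~ z))
  residuals_y <- residuals(lm(y ~ z))
  cor(residuals_x, residuals_y)
}

# stackloss is a built-in R data set; the variable choice is an assumption
data(stackloss)
r_partial <- with(stackloss, partial_cor(Air.Flow, stack.loss, Water.Temp))

# Cross-check with the closed-form formula for one control variable:
# (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
r <- cor(stackloss[, c("Air.Flow", "stack.loss", "Water.Temp")])
r_formula <- (r[1, 2] - r[1, 3] * r[2, 3]) /
  sqrt((1 - r[1, 3]^2) * (1 - r[2, 3]^2))

print(r_partial)
print(r_formula)
```

The two numbers should agree up to rounding error, because correlating the residuals and applying the closed-form formula are two routes to the same quantity when there is a single control variable.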


How do Dew and Fog Form? Nature at Work with Temperature, Vapour Pressure, and Partial Pressure

In the early morning, especially here in Canada, I often see dew – water droplets formed by the condensation of water vapour on outside surfaces, like windows, car roofs, and leaves of trees.  I also sometimes see fog – water droplets or ice crystals that are suspended in air and often reduce visibility at a distance.  Have you ever wondered how they form?  It turns out that partial pressure, vapour pressure and temperature are the key concepts at work.


Dew (by Staffan Enbom) and Fog (by Jon Zander)

Source: Wikimedia
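
To make the interplay of these three quantities concrete, here is a rough sketch in R.  It assumes a Magnus-type approximation for the saturation vapour pressure of water over liquid, with commonly quoted coefficients; the function names are hypothetical, and the approximation is only meant for ordinary outdoor temperatures.  It is an illustration, not the derivation used in the full post.

```r
# Saturation vapour pressure (hPa) at a given temperature (deg C),
# using a Magnus-type approximation (assumption: approximate coefficients)
saturation_vp_hpa <- function(temp_c) {
  6.1094 * exp(17.625 * temp_c / (243.04 + temp_c))
}

# Dew point: the temperature at which the current partial pressure of
# water vapour would equal the saturation vapour pressure
dew_point_c <- function(temp_c, rh_percent) {
  partial_vp <- rh_percent / 100 * saturation_vp_hpa(temp_c)
  g <- log(partial_vp / 6.1094)
  243.04 * g / (17.625 - g)   # inverse of the Magnus-type formula
}

# Hypothetical evening conditions: 20 deg C at 50% relative humidity
print(dew_point_c(20, 50))

# Saturated air: the dew point equals the air temperature
print(dew_point_c(20, 100))
```

When a surface or the air itself cools to the dew point, the partial pressure of water vapour matches the vapour pressure at that temperature, and further cooling leads to condensation as dew or fog.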


Why Does Diabetes Cause Excessive Urination and Thirst? A Lesson on Osmosis

A TABA Seminar on Diabetes

I have the pleasure of being an executive member of the Toronto Applied Biostatistics Association (TABA), a volunteer-run professional organization here in Toronto that organizes seminars on biostatistics.  This past Tuesday, Dr. Loren Grossman from the LMC Diabetes and Endocrinology Centre generously donated his time to deliver an introductory seminar on diabetes for biostatisticians.  The Institute for Clinical Evaluative Sciences (ICES) at Sunnybrook Hospital kindly hosted us and provided the venue for the seminar.  As a chemist and a former pre-medical student who studied physiology, I really enjoyed this intellectual treat, especially since Loren was clear, informative, and very knowledgeable about the subject.


The blue circle is a global symbol for diabetes.

Source: Wikimedia Commons



