Mathematical Statistics Lesson of the Day – An Example of An Ancillary Statistic

Consider 2 independent random variables, X_1 and X_2, from the normal distribution \text{Normal}(\mu, \sigma^2), where \mu is unknown.  Then the statistic

D = X_1 - X_2

has the distribution

\text{Normal}(0, 2\sigma^2).

The distribution of D does not depend on \mu, so D is an ancillary statistic for \mu.

Note that, if \sigma^2 is unknown, then D is not ancillary for \sigma^2.
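To make this concrete, here is a minimal simulation sketch in Python.  It is only an illustration under stated assumptions: the 2 variables are drawn independently, and the function names, sample size and seeds are my own choices.  If D is ancillary for \mu, then the sample mean and sample variance of D should stay near 0 and 2\sigma^2 no matter which \mu is used.

```python
import random
import statistics


def simulate_differences(mu, sigma, n_pairs, seed):
    """Draw n_pairs independent pairs (X_1, X_2) from Normal(mu, sigma^2)
    and return the list of differences D = X_1 - X_2."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) - rng.gauss(mu, sigma) for _ in range(n_pairs)]


def summarize(differences):
    """Return the sample mean and sample variance of the differences."""
    return statistics.fmean(differences), statistics.variance(differences)


if __name__ == "__main__":
    sigma = 1.0  # assumed value, chosen only for illustration
    for mu in (-5.0, 0.0, 100.0):
        mean_d, var_d = summarize(simulate_differences(mu, sigma, 20_000, seed=2015))
        print(f"mu = {mu:7.1f}: mean(D) = {mean_d:.3f}, var(D) = {var_d:.3f}")
```

Changing sigma in this sketch changes the variance of D, which is consistent with D not being ancillary for \sigma^2.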

Physical Chemistry Lesson of the Day – What is the Primary Determinant of the Effective Nuclear Charge for Outer Electrons?

Electrons in the inner shells of an atom shield the electrons in the outer shells pretty well from the nuclear charge.  However, electrons in the same shell don’t shield each other very well.  If an electron spends most of its time below another electron, then the first electron can shield the second electron.  However, this is not the case for electrons in the same shell – they repel each other because they are all negatively charged, and they are at roughly the same average distance from the nucleus.

Thus, the difference between

  1. the charge of the nucleus
  2. and the charge of the core electrons

is the primary contributor to the effective nuclear charge that the outer electrons experience.
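As a rough illustration of this idea, here is a small Python sketch.  It assumes the crudest possible model: core electrons shield perfectly, and electrons in the same shell do not shield each other at all.  The function name is hypothetical, and real effective nuclear charges differ from these estimates because neither assumption holds exactly.

```python
def approximate_effective_nuclear_charge(atomic_number, core_electrons):
    """Crude estimate: nuclear charge minus the number of core electrons.

    Assumes that core electrons shield perfectly and that electrons in the
    same shell do not shield each other at all.
    """
    if not 0 <= core_electrons <= atomic_number:
        raise ValueError("core_electrons must be between 0 and atomic_number")
    return atomic_number - core_electrons


# Sodium: Z = 11, with 10 core electrons (1s2 2s2 2p6) and 1 valence electron.
print(approximate_effective_nuclear_charge(11, 10))  # 1
# Chlorine: Z = 17, with 10 core electrons and 7 valence electrons.
print(approximate_effective_nuclear_charge(17, 10))  # 7
```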

Data Science Seminar by David Campbell on Approximate Bayesian Computation and the Earthworm Invasion in Canada

My colleague, David Campbell, will be the featured speaker at the next Vancouver Data Science Meetup on Thursday, June 25.  (This is a jointly organized event with the Vancouver Machine Learning Meetup and the Vancouver R Users Meetup.)  He will present his research on approximate Bayesian computation and Markov Chain Monte Carlo, and he will highlight how he has used these tools to study the invasion of European earthworms in Canada, especially their drastic effects on the boreal forests in Alberta.

Dave is a statistics professor at Simon Fraser University, and I have found him to be very smart and articulate in my communication with him.  This seminar promises to be both entertaining and educational.  If you attend it, then I look forward to seeing you there!  Check out Dave on Twitter and LinkedIn.

Title: The great Canadian worm invasion (from an approximate Bayesian computation perspective)

Speaker: David Campbell

Date: Thursday, June 25

Place:

HootSuite (Headquarters)

5 East 8th Avenue

Vancouver, BC

Schedule:

• 6:00 pm: Doors are open – feel free to mingle!
• 6:30 pm: Presentation begins.
• ~7:45 pm: Off to a nearby restaurant for food, drinks, and breakout discussions.

Abstract:

After being brought in by pioneers for agricultural reasons, European earthworms have been taking North America by storm and are starting to change the Alberta Boreal forests. This talk uses an invasive species model to introduce the basic ideas behind estimating the rate of new worm introductions and how quickly they spread with the goal of predicting the future extent of the great Canadian worm invasion. To take on the earthworm invaders, we turn to Approximate Bayesian Computation methods. Bayesian statistics are used to gather and update knowledge as new information becomes available owing to their success in prediction and estimating ongoing and evolving processes. Approximate Bayesian Computation is a step in the right direction when it’s just not possible to actually do the right thing- in this case using the exact invasive species model is infeasible. These tools will be used within a Markov Chain Monte Carlo framework.

About Dave Campbell:

Dave Campbell is an Associate Professor in the Department of Statistics and Actuarial Science at Simon Fraser University and Director of the Management and Systems Science Program. Dave’s main research area is at the intersections of statistics with computer science, applied math, and numerical analysis. Dave has published papers on Bayesian algorithms, adaptive time-frequency estimation, and dealing with lack of identifiability. His students have gone on to faculty positions and worked in industry at video game companies and predicting behaviour in malls, chat rooms, and online sales.

Mathematical Statistics Lesson of the Day – Ancillary Statistics

The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics.

Suppose that you collected data

\mathbf{X} = X_1, X_2, ..., X_n

in order to estimate a parameter \theta.  Let f_\theta(x) be the probability density function (PDF) or probability mass function (PMF) for X_1, X_2, ..., X_n.

Let

a = A(\mathbf{X})

be a statistic based on \mathbf{X}.

If the distribution of A(\mathbf{X}) does NOT depend on \theta, then A(\mathbf{X}) is called an ancillary statistic.

By itself, an ancillary statistic contains no information about \theta; its distribution is fixed and known without any relation to \theta.  Why, then, would we care about A(\mathbf{X})?  I will address this question in later Statistics Lessons of the Day, and I will connect ancillary statistics to sufficient statistics, minimally sufficient statistics and complete statistics.

Analytical Chemistry Lesson of the Day – Method Validation in Quality Assurance

Any method developed in analytical chemistry must meet several criteria to ensure that it accomplishes its intended objective at or above an acceptable standard.  The process of checking these criteria is called method validation, and it has the following criteria* in the pharmaceutical industry:

  • specificity
  • linearity
  • accuracy
  • precision
  • range
  • limit of detection
  • limit of quantitation
  • robustness**

As I will note in future Chemistry Lessons of the Day, these words are used differently in statistics and in chemistry.

*These criteria are taken from Page 723 of the 6th edition of “Quantitative Chemical Analysis” by Daniel C. Harris (2003).

**The Food and Drug Administration does not list robustness as a typical characteristic of method validation.  (See Section B on Page 7 of its “Guidance for Industry Analytical Procedures and Methods Validation for Drugs and Biologics”.)  However, it does mention robustness several times as an important characteristic that “should be evaluated” during the “early stages of method development”.

Mathematics and Applied Statistics Lesson of the Day – Contrasts

A contrast is a linear combination of a set of variables such that the sum of the coefficients is equal to zero.  Notationally, consider a set of variables

\mu_1, \mu_2, ..., \mu_n.

Then the linear combination

c_1 \mu_1 + c_2 \mu_2 + ... + c_n \mu_n

is a contrast if

c_1 + c_2 + ... + c_n = 0.

There is a reason why I chose to use \mu as the symbol for the variables in the above notation – in statistics, contrasts provide a very useful framework for comparing multiple population means in hypothesis testing.  In a later Statistics Lesson of the Day, I will illustrate some examples of contrasts, especially in the context of experimental design.
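Here is a minimal Python sketch of the definition above.  The function names and the numerical means are hypothetical and chosen only for illustration; the sketch checks that a set of coefficients sums to zero and then evaluates the linear combination.

```python
import math


def is_contrast(coefficients, tolerance=1e-9):
    """Return True if the coefficients sum to zero (within a tolerance)."""
    return math.isclose(sum(coefficients), 0.0, abs_tol=tolerance)


def contrast_value(coefficients, means):
    """Evaluate c_1*mu_1 + ... + c_n*mu_n for a valid contrast."""
    if len(coefficients) != len(means):
        raise ValueError("coefficients and means must have the same length")
    if not is_contrast(coefficients):
        raise ValueError("coefficients do not sum to zero")
    return sum(c * m for c, m in zip(coefficients, means))


# Compare the first mean against the average of the other two.
# The means below are made-up numbers, not data from any study.
coefficients = [1.0, -0.5, -0.5]
means = [10.0, 8.0, 6.0]
print(is_contrast(coefficients))            # True
print(contrast_value(coefficients, means))  # 3.0
```

The coefficients (1, -1, 0) would instead compare the first 2 means directly, while (1, 1, -1) is not a contrast because its coefficients sum to 1.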

Leaving My Dream Career – Reflecting on My Decision 10 Years Later

I just couldn’t pretend any longer.

It was near the end of my second year at Simon Fraser University.  My GPA was pretty high, and I had just won a competitive NSERC Undergraduate Student Research Award to work with an accomplished cardiac physiologist.  I attended all of the relevant seminars to get the “inside scoop” on how to successfully apply to medical school, and I volunteered in numerous organizations to demonstrate my non-academic credentials.  I had already developed good relationships with several professors who would have gladly written strong recommendations for my application.  All of the stars were aligning for my path to medical school.

I was also miserable, angry and devoid of any further motivation to stay on that path.

[Image: crossroads]

Image courtesy of Carsten Tolkmit from Flickr.  Obtained via the Creative Commons License.

Read more of this post

Organic Chemistry Lesson of the Day – The 4 Conformational Isomers of Butane

In a previous Chemistry Lesson of the Day, I introduced the simplest case of conformational isomerism – the staggered and eclipsed conformations of ethane.  The next most complicated case of conformational isomerism belongs to butane.  Here are the Newman projections of the 4 possibilities.

[Image: Newman projections of the 4 butane conformers]

Modified image courtesy of Avitek from Wikimedia.

The conformational isomers are named with respect to the proximity of the 2 methyl groups.  The dihedral angle between the 2 methyl groups, θ, is below each Newman projection.  From left to right, the conformational isomers are:

  • fully eclipsed (θ = 0 degrees)
  • gauche (θ = 60 degrees)
  • eclipsed (θ = 120 degrees)
  • anti (θ = 180 degrees)

The fully eclipsed conformation has the most steric strain* between the 2 methyl groups, so its internal energy is highest.

Conversely, the anti conformation has the least steric strain between the 2 methyl groups, so its internal energy is lowest.

The gauche conformation has less steric strain than the eclipsed conformation, so it has the lower internal energy of the two.

From lowest to highest internal energy, here is the ranking of the conformational isomers:

  1. anti
  2. gauche
  3. eclipsed
  4. fully eclipsed

This can be visualized by the following energy diagram.

[Image: butane energy diagram]

Image courtesy of Mr.Holmium from Wikimedia.

*As mentioned in my previous Chemistry Lesson of the Day on the 2 conformational isomers of ethane, there is some controversy about what really causes the internal energy to increase in eclipsed conformations.  Some chemists suggest that hyperconjugation is responsible.

How to Extract a String Between 2 Characters in R and SAS

Introduction

I recently needed to work with date values that look like this:

mydate
Jan 23/2
Aug 5/20
Dec 17/2

I wanted to extract the day, and the obvious strategy is to extract the text between the space and the slash.  I needed to think about how to program this carefully in both R and SAS, because

  1. the day could be 1 or 2 characters long
  2. I needed code that adapts to this varying length from observation to observation
  3. there is no function in either language that is suited exactly for this purpose.

In this tutorial, I will show you how to do this in both R and SAS.  I will write a function in R and a macro program in SAS to do so, and you can use the function and the macro program as you please!
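The R function and the SAS macro program are in the full post.  As a language-neutral illustration of the strategy, here is a minimal Python sketch; it is not the R or SAS code from this tutorial, and the function name is hypothetical.  It finds the first delimiter, then the next occurrence of the second delimiter, and returns whatever lies between them, so it adapts to days of either length.

```python
def extract_between(text, start_delim, end_delim):
    """Return the substring between the first start_delim and the next
    end_delim after it, or None if either delimiter is missing."""
    start = text.find(start_delim)
    if start == -1:
        return None
    begin = start + len(start_delim)
    end = text.find(end_delim, begin)
    if end == -1:
        return None
    return text[begin:end]


# The example values from the introduction, exactly as shown there.
for value in ["Jan 23/2", "Aug 5/20", "Dec 17/2"]:
    print(value, "->", extract_between(value, " ", "/"))
```

For the 3 values above, this sketch returns "23", "5" and "17".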

Read more of this post

Eric’s Enlightenment for Friday, June 5, 2015

  1. Christian Robert provides a gentle introduction to the Metropolis-Hastings algorithm with accompanying R code.  (Hat Tip: David Campbell)
  2. John Sall demonstrates how to perform discriminant analysis in JMP, especially for data sets with many variables.
  3. Using machine learning instead of human judgment may improve the selection of job candidates.  This article also includes an excerpt from a New York Times article about how the Milwaukee Bucks used facial recognition as one justification to choose Jabari Parker over Dante Exum.  (Hat Tip: Tyler Cowen)
  4. “A hospital at the University of California San Francisco Medical Center has a robot filling prescriptions.”

Eric’s Enlightenment for Thursday, June 4, 2015

  1. IBM explains how Watson the computer answered the Final Jeopardy question against Ken Jennings and Brad Rutter.  (In a question about American airports, Watson’s answer was “What is Toronto???”  It’s not as ridiculous as you think, and Watson didn’t wager a lot of money for this answer – so it still won by a wide margin.)
  2. Two views on how to reform FIFA by Nate Silver and  – this is an interesting opportunity to apply good principles of institutional design and political economy.
  3. How blind people navigate the Internet.
  4. The Replication Network – a web site devoted to the study of replications in economics.
  5. Cryptochromes, and particularly the molecule flavin adenine dinucleotide (FAD) that forms part of the cryptochrome, are thought to be responsible for magnetoreception, the ability of some animals to navigate in Earth’s magnetic field.  Joshua Beardmore et al. have developed a microscope that can detect the magnetic properties of FAD – some very cool work on radical pair chemistry!

Eric’s Enlightenment for Wednesday, June 3, 2015

  1. Jodi Beggs uses the Rule of 70 to explain why small differences in GDP growth rates have large ramifications.
  2. Rick Wicklin illustrates the importance of choosing bin widths carefully when plotting histograms.
  3. Shana Kelley et al. have developed an electrochemical sensor for detecting selected mutated nucleic acids (i.e. cancer markers in DNA!).  “The sensor comprises gold electrical leads deposited on a silicon wafer, with palladium nano-electrodes.”
  4. Rhett Allain provides a very detailed and analytical critique of Mjölnir (Thor’s hammer) – specifically, its unrealistic centre of mass.  This is an impressive exercise in physics!
  5. Congratulations to the Career Services Centre at Simon Fraser University for winning TalentEgg’s Special Award for Innovation by a Career Centre!  I was fortunate to volunteer there as a career advisor for 5 years, and it was a wonderful place to learn, grow and give back to the community. My career has benefited greatly from that experience, and it is a pleasure to continue my involvement as a guest blogger for its official blog, The Career Services Informer. Way to go, everyone!

Eric’s Enlightenment for Tuesday, June 2, 2015

  1. How Lucas Duplan raised $30 million for his start-up, Clinkle, and lost almost its entire executive team (including Chi-Chao Chang) and most of its staff – a very detailed account.
  2. Peter Brown writes a nice chronicle of Fermat’s Last Theorem and how Andrew Wiles’ proof for it almost collapsed (but ultimately prevailed).
  3. Following her recent blog post on the changing dynamics between economists and the media in Canada, Frances Woolley provides 4 suggestions for journalists to improve their coverage of economics in the media.  As always when you read Worthwhile Canadian Initiative, read the comments – this is the most respectful and productive comments community in the econoblogosphere that I have encountered.
  4. Some very important and practical applications of hydrogels: contact lenses, insulin delivery for diabetics, and reconstructive tissue.
  5. The Big Bang Theory (the TV show) has started a scholarship endowment fund for STEM students at UCLA!

Eric’s Enlightenment for Monday, June 1, 2015

  1. A comprehensive graphic of public perceptions about chemistry in the United Kingdom – compiled by the Royal Society of Chemistry.  (Hat Tip: Neil Smithers)
  2. Qing Ke et al. compiled a list of “sleeping beauties” in science – articles that were not appreciated at the time of publication and required a long passage of time before becoming popular in the scientific community.  (Unfortunately, that original article is gated by subscription.)  As reported in Nature.com, “the longest sleeper in the top 15 is a statistics paper from Karl Pearson, entitled, ‘On lines and planes of closest fit to systems of points in space’.  Published in Philosophical Magazine in 1901, this paper awoke only in 2002.”  Out of those top 15 sleeping beauties, 7 were in chemistry.  A full pre-published version of Ke et al.’s paper can be found on arXiv.
  3. What would the Earth’s stratospheric ozone layer look like if the Montreal Protocol had never been enacted to ban halocarbon refrigerants, solvents, and aerosol-can propellants?  Using simulations, Martyn Chipperfield et al. “found that the Antarctic ozone hole would have grown by an additional 40% by 2013.”
  4. Jan Hoffman on new challenges in mental health for university students: “Anxiety has now surpassed depression as the most common mental health diagnosis among college students, though depression, too, is on the rise. More than half of students visiting campus clinics cite anxiety as a health concern, according to a recent study of more than 100,000 students nationwide by the Center for Collegiate Mental Health at Penn State.”

Eric’s Enlightenment for Friday, May 29, 2015

  1. P2N3: An aromatic ion made of just phosphorus and nitrogen.  (Yes, aromaticity can be entirely inorganic!)
  2. Using 3-D printing and plastics to make prosthetics.
  3. David Beckworth and Scott Sumner talk at length about reforming monetary policy with NGDP targeting in this video interview/seminar.
  4. Anky Lai gives a nice introduction to PROC TABULATE (PDF document) – an alternative to PROC FREQ and PROC MEANS in SAS.  Check out her awesome code samples for generating nicely formatted tables and exporting them conveniently into spreadsheets in Excel!

Eric’s Enlightenment for Thursday, May 28, 2015

  1. How long-distance romantic relationships differ from proximate romantic relationships – Mona Chalabi answers a reader’s question.  Be sure to read toward the end about what happens to long-distance romantic relationships after geographical unification.
  2. Joel Shurkin reports on new research that elucidated the traffic engineering ingenuity of ants.  In particular, speed increases with more ants travelling on the same path.  Here is the original paper by Hönicke et al.
  3. It turns out that John Nash had a mostly unknown intellectual breakthrough that has only become public since 2012.  He “proposed a form of possible encryption used decades later by the NSA based on computational complexity theory”.
  4. John Bohannon published a flawed (but real) study in a fake journal to claim that eating chocolate can help you to lose weight.  With some help in spreading the word about this study, many journalists were fooled into running brash headlines about this exciting but badly obtained finding.

“Soon we were in the Daily Star, the Irish Examiner, Cosmopolitan’s German website, the Times of India, both the German and Indian site of the Huffington Post, and even television news in Texas and an Australian morning talk show.”

Eric’s Enlightenment for Wednesday, May 27, 2015

  1. Why do humans get schizophrenia, but other animals don’t?
  2. In two blog posts at Marginal Revolution (Part 1 and Part 2), Ramez Naam recently argued that CRISPR (with all of the limitations in some recent research) should not be feared.
  3. Ecological fallacies and exception fallacies – two common mistakes in reasoning, statistics and scientific research.
  4. Intrauterine devices (IUDs) are the most effective contraceptives, so why is their usage so low?  Shefali Luthra reports that – at least for teenage girls – pediatricians were not trained to insert them during their education.  Maddie Oatman finds more complicated reasons for women in general.

Eric’s Enlightenment for Tuesday, May 26, 2015

  1. Frances Woolley on the changing dynamics in the relationship between economists and the media in Canada over the past 8 years.
  2. The unintended consequences of labour policies that are meant to be friendly for parents and families – a nice account of many examples by Claire Cain Miller.
  3. FanGraphs explains batting average on balls in play (BABIP) in great detail.
  4. How Neil Bartlett discovered compounds that contain noble gases.  (Yes – they can react!)  He began his research at the University of British Columbia in Vancouver (my hometown).  He also discovered a compound in which oxygen is a positively charged ion.  Very cool stuff!