## Applied Statistics Lesson of the Day – The Matched-Pair (or Paired) t-Test

My last lesson introduced the matched pairs experimental design, which is a special type of the randomized blocked design.  Let’s now talk about how to analyze the data from such a design.

Since the experimental units are organized in pairs, the units between pairs (blocks) are not independently assigned.  (The units within each pair are independently assigned – returning to the glove example, one hand is randomly chosen to wear the nitrile glove, while the other is randomly chosen to wear the latex glove.)  Because of this lack of independence between pairs, the independent 2-sample t-test is not applicable.  Instead, use the matched pair t-test (also called the paired or the paired difference t-test).  This is really a 1-sample t-test that tests the difference between the responses of the experimental and the control groups.

## Statistics and Chemistry Lesson of the Day – Illustrating Basic Concepts in Experimental Design with the Synthesis of Ammonia

To summarize what we have learned about experimental design in the past few Applied Statistics Lessons of the Day, let’s use an example from physical chemistry to illustrate these basic principles.

Ammonia (NH3) is widely used as a fertilizer in industry.  It is commonly synthesized by the Haber process, which involves a reaction between hydrogen gas and nitrogen gas.

N2 + 3 H2 → 2 NH3   (ΔH = −92.4 kJ·mol−1)

Recall that ΔH is the change in enthalpy.  Under constant pressure (which is the case for most chemical reactions), ΔH is the heat absorbed or released by the system.

## Applied Statistics Lesson of the Day – Positive Control in Experimental Design

In my recent lesson on controlling for confounders in experimental design, the control group was described as one that received a neutral or standard treatment, and the standard treatment may simply be nothing.  This is a negative control group.  Not all experiments require a negative control group; some experiments instead have positive control group.

A positive control group is a group of experimental units that receive a treatment that is known to cause an effect on the response.  Such a causal relationship would have been previously established, and its inclusion in the experiment allows a new treatment to be compared to this existing treatment.  Again, both the positive control group and the experimental group experience the same experimental procedures and conditions except for the treatment.  The existing treatment with the known effect on the response is applied to the positive control group, and the new treatment with the unknown effect on the response is applied to the experimental group.  If the new treatment has a causal relationship with the response, both the positive control group and the experimental group should have the same responses.  (This assumes, of course, that the response can only be changed in 1 direction.  If the response can increase or decrease in value (or, more generally, change in more than 1 way), then it is possible for the positive control group and the experimental group to have the different responses.

In short, in an experiment with a positive control group, an existing treatment is known to “work”, and the new treatment is being tested to see if it can “work” just as well or even better.  Experiments to test for the effectiveness of a new medical therapies or a disease detector often have positive controls; there are existing therapies or detectors that work well, and the new therapy or detector is being evaluated for its effectiveness.

Experiments with positive controls are useful for ensuring that the experimental procedures and conditions proceed as planned.  If the positive control does not show the expected response, then something is wrong with the experimental procedures or conditions, and any “good” result from the new treatment should be considered with skepticism.

## Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine

1. whether or not there is a cause-and-effect relationship between the factor and the response
2. the strength of the causal relationship, should such a relationship exist.

To answer these questions, the response variable is measured in both the control group and the experimental group.  If there is a difference between the 2 responses, then there is evidence to suggest that the causal relationship exists, and the difference can be measured and quantified.

However, in most* experiments, there is random variation in the response.  Random variation exists in the natural sciences, and there is even more of it in the social sciences.  Thus, an observed difference between the control and experimental groups could be mistakenly attributed to a cause-and-effect relationship when the source of the difference is really just random variation.  In short, the difference may simply be due to the noise rather than the signal.

To detect an actual difference beyond random variation (i.e. to obtain a higher signal-to-noise ratio), it is important to use replication to obtain a sufficiently large sample size in the experiment.  Replication is the repeated application of the treatments to multiple independently assigned experimental units.  (Recall that randomization is an important part of controlling for confounding variables in an experiment.  Randomization ensures that the experimental units are independently assigned to the different treatments.)  The number of independently assigned experimental units that receive the same treatment is the sample size.

*Deterministic computer experiments are unlike most experiments; they do not have random variation in the responses.

## Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #2: Controlling for Confounders

A well designed experiment must have good control, which is the reduction of effects from confounding variables.  There are several ways to do so:

• Include a control group.  This group will receive a neutral treatment or a standard treatment.  (This treatment may simply be nothing.)  The experimental group will receive the new treatment or treatment of interest.  The response in the experimental group will be compared to the response in the control group to assess the effect of the new treatment or treatment of interest.  Any effect from confounding variables will affect both the control group and the experimental group equally, so the only difference between the 2 groups should be due to the new treatment or treatment of interest.
• In medical studies with patients as the experimental units, it is common to include a placebo group.  Patients in the placebo group get a treatment that is known to have no effect.  This accounts for the placebo effect.
• For example, in a drug study, a patient in the placebo group may get a sugar pill.
• In experiments with human or animal subjects, participants and/or the experimenters are often blinded.  This means that they do not know which treatment the participant received.  This ensures that knowledge of receiving a particular treatment – for either the participant or the experimenters – is not a confounding variable.  An experiment that blinds both the participants and the experimenters is called a double-blinded experiment.
• For confounding variables that are difficult or impossible to control for, the experimental units should be assigned to the control group and the experimental group by randomization.  This can be done with random number tables, flipping a coin, or random number generators from computers.  This ensures that confounding effects affect both the control group and the experimental group roughly equally.
• For example, an experimenter wants to determine if the HPV vaccine will make new students immune to HPV.  There will be 2 groups: the control group will not receive the vaccine, and the experimental group will receive the vaccine.  If the experimenter can choose students from 2 schools for her study, then the students should be randomly assigned into the 2 groups, so that each group will have roughly the same number of students from each school.  This would minimize the confounding effect of the schools.