Applied Statistics Lesson of the Day – Blocking and the Randomized Complete Blocked Design (RCBD)

A completely randomized design works well for a homogeneous population – one that does not have major differences between any sub-populations.  However, what if a population is heterogeneous?

Consider an example that commonly occurs in medical studies.  An experiment seeks to determine the effectiveness of a drug on curing a disease, and 100 patients are recruited for this double-blinded study – 50 are men, and 50 are women.  An abundance of biological knowledge tells us that men and women have significantly physiologies, and this is a heterogeneous population with respect to gender.  If a completely randomized design is used for this study, gender could be a confounding variable; this is especially true if the experimental group has a much higher proportion of one gender, and the control group has a much higher proportion of the other gender.  (For instance, purely due to the randomness, 45 males may be assigned to the experimental group, and 45 females may be assigned to the control group.)  If a statistically significant difference in the patients’ survival from the disease is observed between such a pair of experimental and control groups, this effect could be attributed to the drug or to gender, and that would ruin the goal of determining the cause-and-effect relationship between the drug and survival from the disease.

To overcome this heterogeneity and control for the effect of gender, a randomized blocked design could be used.  Blocking is the division of the experimental units into homogeneous sub-populations before assigning treatments to them.  A randomized blocked design for our above example would divide the males and females into 2 separate sub-populations, and then each of these 2 groups is split into the experimental and control group.  Thus, the experiment actually has 4 groups:

  1. 25 men take the drug (experimental)
  2. 25 men take a placebo (control)
  3. 25 women take the drug (experimental)
  4. 25 women take a placebo (control)

Essentially, the population is divided into blocks of homogeneous sub-populations, and a completely randomized design is applied to each block.  This minimizes the effect of gender on the response and increases the precision of the estimate of the effect of the drug.

Advertisements

Statistics and Chemistry Lesson of the Day – Illustrating Basic Concepts in Experimental Design with the Synthesis of Ammonia

To summarize what we have learned about experimental design in the past few Applied Statistics Lessons of the Day, let’s use an example from physical chemistry to illustrate these basic principles.

Ammonia (NH3) is widely used as a fertilizer in industry.  It is commonly synthesized by the Haber process, which involves a reaction between hydrogen gas and nitrogen gas.

N2 + 3 H2 → 2 NH3   (ΔH = −92.4 kJ·mol−1)

Recall that ΔH is the change in enthalpy.  Under constant pressure (which is the case for most chemical reactions), ΔH is the heat absorbed or released by the system.

Read more of this post

Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine

  1. whether or not there is a cause-and-effect relationship between the factor and the response
  2. the strength of the causal relationship, should such a relationship exist.

To answer these questions, the response variable is measured in both the control group and the experimental group.  If there is a difference between the 2 responses, then there is evidence to suggest that the causal relationship exists, and the difference can be measured and quantified.

However, in most* experiments, there is random variation in the response.  Random variation exists in the natural sciences, and there is even more of it in the social sciences.  Thus, an observed difference between the control and experimental groups could be mistakenly attributed to a cause-and-effect relationship when the source of the difference is really just random variation.  In short, the difference may simply be due to the noise rather than the signal.  

To detect an actual difference beyond random variation (i.e. to obtain a higher signal-to-noise ratio), it is important to use replication to obtain a sufficiently large sample size in the experiment.  Replication is the repeated application of the treatments to multiple independently assigned experimental units.  (Recall that randomization is an important part of controlling for confounding variables in an experiment.  Randomization ensures that the experimental units are independently assigned to the different treatments.)  The number of independently assigned experimental units that receive the same treatment is the sample size.

*Deterministic computer experiments are unlike most experiments; they do not have random variation in the responses.

Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #2: Controlling for Confounders

A well designed experiment must have good control, which is the reduction of effects from confounding variables.  There are several ways to do so:

  • Include a control group.  This group will receive a neutral treatment or a standard treatment.  (This treatment may simply be nothing.)  The experimental group will receive the new treatment or treatment of interest.  The response in the experimental group will be compared to the response in the control group to assess the effect of the new treatment or treatment of interest.  Any effect from confounding variables will affect both the control group and the experimental group equally, so the only difference between the 2 groups should be due to the new treatment or treatment of interest.
  • In medical studies with patients as the experimental units, it is common to include a placebo group.  Patients in the placebo group get a treatment that is known to have no effect.  This accounts for the placebo effect.
    • For example, in a drug study, a patient in the placebo group may get a sugar pill.
  • In experiments with human or animal subjects, participants and/or the experimenters are often blinded.  This means that they do not know which treatment the participant received.  This ensures that knowledge of receiving a particular treatment – for either the participant or the experimenters – is not a confounding variable.  An experiment that blinds both the participants and the experimenters is called a double-blinded experiment.
  • For confounding variables that are difficult or impossible to control for, the experimental units should be assigned to the control group and the experimental group by randomization.  This can be done with random number tables, flipping a coin, or random number generators from computers.  This ensures that confounding effects affect both the control group and the experimental group roughly equally.
    • For example, an experimenter wants to determine if the HPV vaccine will make new students immune to HPV.  There will be 2 groups: the control group will not receive the vaccine, and the experimental group will receive the vaccine.  If the experimenter can choose students from 2 schools for her study, then the students should be randomly assigned into the 2 groups, so that each group will have roughly the same number of students from each school.  This would minimize the confounding effect of the schools.