Applied Statistics Lesson of the Day: Sample Size and Replication in Experimental Design

The goal of an experiment is to determine

  1. whether or not there is a cause-and-effect relationship between the factor and the response
  2. the strength of the causal relationship, should such a relationship exist.

To answer these questions, the response variable is measured in both the control group and the experimental group.  If the responses of the two groups differ, then there is evidence to suggest that the causal relationship exists, and the size of the difference quantifies the strength of that relationship.

However, in most* experiments, there is random variation in the response.  Random variation exists in the natural sciences, and there is even more of it in the social sciences.  Thus, an observed difference between the control and experimental groups could be mistakenly attributed to a cause-and-effect relationship when the source of the difference is really just random variation.  In short, the difference may simply be due to the noise rather than the signal.  
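As a rough illustration of noise masquerading as signal, the sketch below simulates many small experiments in which the treatment has no effect at all.  The function name `observed_difference`, the normal distribution, the group size of 5 and the standard deviation of 1 are all assumptions chosen for illustration; they do not come from any particular study.

```python
import random
import statistics

def observed_difference(n, true_effect=0.0, sd=1.0, seed=None):
    """Simulate one experiment with n units per group and return
    mean(treated) - mean(control). All names and defaults here are
    hypothetical, chosen only for illustration."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

# 1000 simulated experiments, each with 5 units per group and NO true effect.
diffs = [observed_difference(n=5, seed=s) for s in range(1000)]

# Count how often pure noise produces a difference larger than half a
# standard deviation of the response.
n_large = sum(abs(d) > 0.5 for d in diffs)
print(f"average difference across experiments: {statistics.mean(diffs):.3f}")
print(f"experiments with |difference| > 0.5:   {n_large} of {len(diffs)}")
```

Across many simulated experiments the differences average out near zero, yet a single small experiment can easily show a sizeable difference that has nothing to do with the treatment.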

To detect an actual difference beyond random variation (i.e., to obtain a higher signal-to-noise ratio), it is important to use replication to obtain a sufficiently large sample size in the experiment.  Replication is the repeated application of the treatments to multiple independently assigned experimental units.  Averaging the response over more units reduces the random variation in each group's mean, so a real difference between the groups stands out more clearly.  (Recall that randomization is an important part of controlling for confounding variables in an experiment.  Randomization ensures that the experimental units are independently assigned to the different treatments.)  The number of independently assigned experimental units that receive the same treatment is the sample size.
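The effect of replication on the noise can be sketched with a small simulation.  It assumes independent, normally distributed responses with a common standard deviation of 1 and equal group sizes; under those assumptions the standard error of the difference in group means is sqrt(2/n), so quadrupling the sample size roughly halves the noise.  The function name `spread_of_difference` and the chosen sample sizes are hypothetical.

```python
import math
import random
import statistics

def spread_of_difference(n, reps=2000, sd=1.0, seed=0):
    """Standard deviation, across `reps` simulated experiments, of the
    difference in group means when there is no true effect.
    Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        control = [rng.gauss(0.0, sd) for _ in range(n)]
        treated = [rng.gauss(0.0, sd) for _ in range(n)]
        diffs.append(statistics.mean(treated) - statistics.mean(control))
    return statistics.stdev(diffs)

# Compare the simulated noise with the theoretical value sqrt(2 * sd^2 / n)
# for sd = 1 at three sample sizes, each four times the previous one.
spreads = {n: spread_of_difference(n) for n in (5, 20, 80)}
for n, simulated in spreads.items():
    theoretical = math.sqrt(2.0 / n)
    print(f"n = {n:2d} per group: simulated {simulated:.3f}, theoretical {theoretical:.3f}")
```

With more replicates per treatment, the same true difference is measured against a smaller amount of noise, which is what makes it detectable.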

*Deterministic computer experiments are unlike most experiments; they do not have random variation in the responses.