## Applied Statistics Lesson of the Day – Notation for Fractional Factorial Designs

Fractional factorial designs use the $L^{F-p}$ notation; unfortunately, this notation is not clearly explained in most textbooks or web sites about experimental design.  I hope that my explanation below is useful.

• $L$ is the number of levels in each factor; note that the $L^{F-p}$ notation assumes that all factors have the same number of levels.
• If a factor has 2 levels, then the levels are usually coded as $+1$ and $-1$.
• If a factor has 3 levels, then the levels are usually coded as $+1$, $0$, and $-1$.
• $F$ is the number of factors in the experiment
• $p$ is the number of times that the full factorial design is fractionated by $L$.  This number is badly explained by most textbooks and web sites that I have seen, because they simply say that $p$ is the fraction – this is confusing, because a fraction has a numerator and a denominator, and $p$ is just 1 number.  To clarify,
• the fraction is $L^{-p}$
• the number of treatments in the fractional factorial design is $L^{-p}$ multiplied by the total possible number of treatments in the full factorial design, which is $L^F$.

If all $L^F$ possible treatments are used in the experiment, then a full factorial design is used.  If a fractional factorial design is used instead, then $L^{-p}$ denotes the fraction of the $L^F$ treatments that is used.

Most factorial experiments use binary factors (i.e. factors with 2 levels, $L = 2$).  Thus,

• if $p = 1$, then the fraction of treatments that is used is $2^{-1} = 1/2$.
• if $p = 2$, then the fraction of treatments that is used is $2^{-2} = 1/4$.

This is why

• a $2^{F-1}$ design is often called a half-fraction design.
• a $2^{F-2}$ design is often called a quarter-fraction design.

However, most sources that I have read do not bother to mention that $L$ can be greater than 2; experiments with 3-level factors are less frequent but still common.  Thus, the terms half-fraction design and half-quarter design only apply to binary factors.  If $L = 3$, then

• a $3^{F-1}$ design uses one-third of all possible treatments.
• a $3^{F-2}$ design uses one-ninth of all possible treatments.

## Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between $G$ factors and the response, where $G > 1$.  On first instinct, you may be tempted to conduct $G$ separate experiments, each using the completely randomized design with 1 factor.  Often, however, it is possible to conduct 1 experiment with $G$ factors at the same time.  This is better than the first approach because

• it is faster
• it uses less resources to answer the same questions
• the interactions between the $G$ factors can be examined

Such an experiment requires the full factorial design; in this design, the treatments are all possible combinations of all levels of all factors.  After controlling for confounding variables and choosing the appropriate range and number of levels of the factor, the different treatments are applied to the different groups, and data on the resulting responses are collected.

The simplest full factorial experiment consists of 2 factors, each with 2 levels.  Such an experiment would result in $2 \times 2 = 4$ treatments, each being a combination of 1 level from the first factor and 1 level from the second factor.  Since this is a full factorial design, experimental units are independently assigned to all treatments.  The 2-factor ANOVA model is commonly used to analyze data from such designs.

In later lessons, I will discuss interactions and 2-factor ANOVA in more detail.

## Applied Statistics Lesson of the Day – The Matched Pairs Experimental Design

The matched pairs design is a special type of the randomized blocked design in experimental design.  It has only 2 treatment levels (i.e. there is 1 factor, and this factor is binary), and a blocking variable divides the $n$ experimental units into $n/2$ pairs.  Within each pair (i.e. each block), the experimental units are randomly assigned to the 2 treatment groups (e.g. by a coin flip).  The experimental units are divided into pairs such that homogeneity is maximized within each pair.

For example, a lab safety officer wants to compare the durability of nitrile and latex gloves for chemical experiments.  She wants to conduct an experiment with 30 nitrile gloves and 30 latex gloves to test her hypothesis.  She does her best to draw a random sample of 30 students in her university for her experiment, and they all perform the same organic synthesis using the same procedures to see which type of gloves lasts longer.

She could use a completely randomized design so that a random sample of 30 hands get the 30 nitrile gloves, and the other 30 hands get the 30 latex gloves.  However, since lab habits are unique to each person, this poses a confounding variable – durability can be affected by both the material and a student’s lab habits, and the lab safety officer only wants to study the effect of the material.  Thus, a randomized block design should be used instead so that each student acts as a blocking variable – 1 hand gets a nitrile glove, and 1 hand gets a latex glove.  Once the gloves have been given to the student, the type of glove is randomly assigned to each hand; some may get the nitrile glove on their left hand, and some may get it on their right hand.  Since this design involves one binary factor and blocks that divide the experimental units into pairs, this is a matched pairs design.

## Statistics and Chemistry Lesson of the Day – Illustrating Basic Concepts in Experimental Design with the Synthesis of Ammonia

To summarize what we have learned about experimental design in the past few Applied Statistics Lessons of the Day, let’s use an example from physical chemistry to illustrate these basic principles.

Ammonia (NH3) is widely used as a fertilizer in industry.  It is commonly synthesized by the Haber process, which involves a reaction between hydrogen gas and nitrogen gas.

N2 + 3 H2 → 2 NH3   (ΔH = −92.4 kJ·mol−1)

Recall that ΔH is the change in enthalpy.  Under constant pressure (which is the case for most chemical reactions), ΔH is the heat absorbed or released by the system.

## Applied Statistics Lesson of the Day – The Completely Randomized Design with 1 Factor

The simplest experimental design is the completely randomized design with 1 factor.  In this design, each experimental unit is randomly assigned to a factor level.  This design is most useful for a homogeneous population (one that does not have major differences between any sub-populations).  It is appealing because of its simplicity and flexibility – it can be used for a factor with any number of levels, and different treatments can have different sample sizes.  After controlling for confounding variables and choosing the appropriate range and number of levels of the factor, the different treatments are applied to the different groups, and data on the resulting responses are collected.  The means of the response variable in the different groups are compared; if there are significant differences, then there is evidence to suggest that the factor and the response have a causal relationship.  The single-factor analysis of variance (ANOVA) model is most commonly used to analyze the data in such an experiment, but it does assume that the data in each group have a normal distribution, and that all groups have equal variance.  The Kruskal-Wallis test is a non-parametric alternative to ANOVA in analyzing data from single-factor completely randomized experiments.

If the factor has 2 levels, you may think that an independent 2-sample t-test with equal variance can also be used to analyze the data.  This is true, but the square of the t-test statistic in this case is just the F-test statistic in a single-factor ANOVA with 2 groups.  Thus, the results of these 2 tests are the same.  ANOVA generalizes the independent 2-sample t-test with equal variance to more than 2 groups.

Some textbooks state that “random assignment” means random assignment of experimental units to treatments, whereas other textbooks state that it means random assignment of treatments to experimental units.  I don’t think that there is any difference between these 2 definitions, but I welcome your thoughts in the comments.

## Applied Statistics Lesson of the Day – Basic Terminology in Experimental Design #1

The word “experiment” can mean many different things in various contexts.  In science and statistics, it has a very particular and subtle definition, one that is not immediately familiar to many people who work outside of the field of experimental design. This is the first of a series of blog posts to clarify what an experiment is, how it is conducted, and why it is so central to science and statistics.

Experiment: A procedure to determine the causal relationship between 2 variables – an explanatory variable and a response variable.  The value of the explanatory variable is changed, and the value of the response variable is observed for each value of the explantory variable.

• An experiment can have 2 or more explanatory variables and 2 or more response variables.
• In my experience, I find that most experiments have 1 response variable, but many experiments have 2 or more explanatory variables.  The interactions between the multiple explanatory variables are often of interest.
• All other variables are held constant in this process to avoid confounding.

Explanatory Variable or Factor: The variable whose values are set by the experimenter.  This variable is the cause in the hypothesis.  (*Many people call this the independent variable.  I discourage this usage, because “independent” means something very different in statistics.)

Response Variable: The variable whose values are observed by the experimenter as the explanatory variable’s value is changed.  This variable is the effect in the hypothesis.  (*Many people call this the dependent variable.  Further to my previous point about “independent variables”, dependence means something very different in statistics, and I discourage using this usage.)

Factor Level: Each possible value of the factor (explanatory variable).  A factor must have at least 2 levels.

Treatment: Each possible combination of factor levels.

• If the experiment has only 1 explanatory variable, then each treatment is simply each factor level.
• If the experiment has 2 explanatory variables, X and Y, then each treatment is a combination of 1 factor level from X and 1 factor level from Y.  Such combining of factor levels generalizes to experiments with more than 2 explanatory variables.

Experimental Unit: The object on which a treatment is applied.  This can be anything – person, group of people, animal, plant, chemical, guitar, baseball, etc.