Applied Statistics Lesson of the Day – Notation for Fractional Factorial Designs

Fractional factorial designs use the $L^{F-p}$ notation; unfortunately, this notation is not clearly explained in most textbooks or web sites about experimental design.  I hope that my explanation below is useful.

• $L$ is the number of levels in each factor; note that the $L^{F-p}$ notation assumes that all factors have the same number of levels.
• If a factor has 2 levels, then the levels are usually coded as $+1$ and $-1$.
• If a factor has 3 levels, then the levels are usually coded as $+1$, $0$, and $-1$.
• $F$ is the number of factors in the experiment.
• $p$ is the number of times that the full factorial design is fractionated by $L$.  This number is badly explained by most textbooks and web sites that I have seen, because they simply say that $p$ is the fraction – this is confusing, because a fraction has a numerator and a denominator, and $p$ is just 1 number.  To clarify,
• the fraction is $L^{-p}$
• the number of treatments in the fractional factorial design is $L^{-p}$ multiplied by the total possible number of treatments in the full factorial design, which is $L^F$.

If all $L^F$ possible treatments are used in the experiment, then a full factorial design is used.  If a fractional factorial design is used instead, then $L^{-p}$ denotes the fraction of the $L^F$ treatments that is used.
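The arithmetic above can be checked with a short Python sketch. The function name `fractional_factorial_size` and its argument names are my own, chosen only for illustration; the calculation itself is just $L^{-p} \times L^F = L^{F-p}$.

```python
from fractions import Fraction


def fractional_factorial_size(levels, factors, p):
    """Return (fraction, number of treatments) for an L^(F-p) design.

    levels  -- L, the number of levels shared by every factor
    factors -- F, the number of factors
    p       -- the number of times the full design is fractionated by L
    """
    if p < 0 or p > factors:
        raise ValueError("p must lie between 0 and the number of factors")
    fraction = Fraction(1, levels ** p)      # L^(-p), kept as an exact fraction
    n_treatments = levels ** (factors - p)   # L^(-p) * L^F
    return fraction, n_treatments


# A 2^(5-1) design: half of the 32 possible treatments.
print(fractional_factorial_size(2, 5, 1))
# A 3^(4-2) design: one-ninth of the 81 possible treatments.
print(fractional_factorial_size(3, 4, 2))
```

Setting `p = 0` recovers the full factorial design, since the fraction is then $L^0 = 1$.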

Most factorial experiments use binary factors (i.e. factors with 2 levels, $L = 2$).  Thus,

• if $p = 1$, then the fraction of treatments that is used is $2^{-1} = 1/2$.
• if $p = 2$, then the fraction of treatments that is used is $2^{-2} = 1/4$.

This is why

• a $2^{F-1}$ design is often called a half-fraction design.
• a $2^{F-2}$ design is often called a quarter-fraction design.

However, most sources that I have read do not bother to mention that $L$ can be greater than 2; experiments with 3-level factors are less frequent but still common.  Thus, the terms half-fraction design and quarter-fraction design only apply to binary factors.  If $L = 3$, then

• a $3^{F-1}$ design uses one-third of all possible treatments.
• a $3^{F-2}$ design uses one-ninth of all possible treatments.

Applied Statistics Lesson of the Day – Additive Models vs. Interaction Models in 2-Factor Experimental Designs

In a recent “Machine Learning Lesson of the Day”, I discussed the difference between a supervised learning model in machine learning and a regression model in statistics.  In that lesson, I mentioned that a statistical regression model usually consists of a systematic component and a random component.  Today’s lesson strictly concerns the systematic component.

An additive model is a statistical regression model in which the systematic component is the arithmetic sum of the individual effects of the predictors.  Consider the simple case of an experiment with 2 factors.  If $Y$ is the response and $X_1$ and $X_2$ are the 2 predictors, then an additive linear model for the relationship between the response and the predictors is

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

In other words, the effect of $X_1$ on $Y$ does not depend on the value of $X_2$, and the effect of $X_2$ on $Y$ does not depend on the value of $X_1$.

In contrast, an interaction model is a statistical regression model in which the systematic component is not the arithmetic sum of the individual effects of the predictors.  In other words, the effect of $X_1$ on $Y$ depends on the value of $X_2$, or the effect of $X_2$ on $Y$ depends on the value of $X_1$.  Thus, such a regression model would have 3 effects on the response:

1. $X_1$
2. $X_2$
3. the interaction effect of $X_1$ and $X_2$

A full factorial design with 2 factors uses the 2-factor ANOVA model, which is an example of an interaction model.  It assumes a linear relationship between the response and the above 3 effects:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$
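To make the difference between the 2 models concrete, here is a minimal Python sketch of their systematic components only (the random component $\varepsilon$ is left out).  The coefficient values are made up purely for illustration, and the function names are my own; the factors are assumed to be binary and coded as $-1$ and $+1$.

```python
def additive(x1, x2, b0=10.0, b1=2.0, b2=3.0):
    """Systematic component of the additive model (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2


def interaction(x1, x2, b0=10.0, b1=2.0, b2=3.0, b3=1.5):
    """Systematic component of the interaction model (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def effect_of_x1(model, x2):
    """Change in the systematic component when X1 moves from -1 to +1, with X2 held fixed."""
    return model(+1, x2) - model(-1, x2)


for x2 in (-1, +1):
    print("X2 =", x2,
          "| additive:", effect_of_x1(additive, x2),
          "| interaction:", effect_of_x1(interaction, x2))
```

With these made-up coefficients, the effect of $X_1$ in the additive model is $2\beta_1$ at both levels of $X_2$, whereas in the interaction model it is $2\beta_1 + 2\beta_3 X_2$, so it changes with the level of $X_2$.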

Note that additive models and interaction models are not confined to experimental design; I have merely used experimental design to provide examples for these 2 types of models.

Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between $G$ factors and the response, where $G > 1$.  On first instinct, you may be tempted to conduct $G$ separate experiments, each using the completely randomized design with 1 factor.  Often, however, it is possible to conduct 1 experiment with $G$ factors at the same time.  This is better than the first approach because

• it is faster
• it uses fewer resources to answer the same questions
• the interactions between the $G$ factors can be examined

Such an experiment requires the full factorial design; in this design, the treatments are all possible combinations of all levels of all factors.  After controlling for confounding variables and choosing the appropriate range and number of levels of each factor, the different treatments are applied to the different groups, and data on the resulting responses are collected.

The simplest full factorial experiment consists of 2 factors, each with 2 levels.  Such an experiment would result in $2 \times 2 = 4$ treatments, each being a combination of 1 level from the first factor and 1 level from the second factor.  Since this is a full factorial design, experimental units are independently assigned to all treatments.  The 2-factor ANOVA model is commonly used to analyze data from such designs.
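As a small illustration, the treatments of a full factorial design can be enumerated in Python with `itertools.product`.  The function name `full_factorial` is my own, and coding the 2 levels as $-1$ and $+1$ is an assumption made only for this sketch.

```python
from itertools import product


def full_factorial(levels_per_factor):
    """Return every combination of 1 level from each factor.

    levels_per_factor -- a list with one sequence of levels per factor
    """
    return list(product(*levels_per_factor))


# 2 factors, each with 2 levels coded as -1 and +1 (illustrative coding).
treatments = full_factorial([(-1, +1), (-1, +1)])
for treatment in treatments:
    print(treatment)
print(len(treatments), "treatments")
```

The same function also handles factors with different numbers of levels; for example, a 2-level factor crossed with a 3-level factor gives $2 \times 3 = 6$ treatments.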

In later lessons, I will discuss interactions and 2-factor ANOVA in more detail.