## Applied Statistics Lesson of the Day – Additive Models vs. Interaction Models in 2-Factor Experimental Designs

In a recent “Machine Learning Lesson of the Day“, I discussed the difference between a supervised learning model in machine learning and a regression model in statistics.  In that lesson, I mentioned that a statistical regression model usually consists of a systematic component and a random component.  Today’s lesson strictly concerns the systematic component.

An additive model is a statistical regression model in which the systematic component is the arithmetic sum of the individual effects of the predictors.  Consider the simple case of an experiment with 2 factors.  If $Y$ is the response and $X_1$ and $X_2$ are the 2 predictors, then an additive linear model for the relationship between the response and the predictors is

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$

In other words, the effect of $X_1$ on $Y$ does not depend on the value of $X_2$, and the effect of $X_2$ on $Y$ does not depend on the value of $X_1$.

In contrast, an interaction model is a statistical regression model in which the systematic component is not the arithmetic sum of the individual effects of the predictors.  In other words, the effect of $X_1$ on $Y$ depends on the value of $X_2$, or the effect of $X_2$ on $Y$ depends on the value of $X_1$.  Thus, such a regression model would have 3 effects on the response:

1. $X_1$
2. $X_2$
3. the interaction effect of $X_1$ and $X_2$

full factorial design with 2 factors uses the 2-factor ANOVA model, which is an example of an interaction model.  It assumes a linear relationship between the response and the above 3 effects.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$

Note that additive models and interaction models are not confined to experimental design; I have merely used experimental design to provide examples for these 2 types of models.

## Applied Statistics Lesson of the Day – The Full Factorial Design

An experimenter may seek to determine the causal relationships between $G$ factors and the response, where $G > 1$.  On first instinct, you may be tempted to conduct $G$ separate experiments, each using the completely randomized design with 1 factor.  Often, however, it is possible to conduct 1 experiment with $G$ factors at the same time.  This is better than the first approach because

• it is faster
• it uses less resources to answer the same questions
• the interactions between the $G$ factors can be examined

Such an experiment requires the full factorial design; in this design, the treatments are all possible combinations of all levels of all factors.  After controlling for confounding variables and choosing the appropriate range and number of levels of the factor, the different treatments are applied to the different groups, and data on the resulting responses are collected.

The simplest full factorial experiment consists of 2 factors, each with 2 levels.  Such an experiment would result in $2 \times 2 = 4$ treatments, each being a combination of 1 level from the first factor and 1 level from the second factor.  Since this is a full factorial design, experimental units are independently assigned to all treatments.  The 2-factor ANOVA model is commonly used to analyze data from such designs.

In later lessons, I will discuss interactions and 2-factor ANOVA in more detail.