## Mathematical and Applied Statistics Lesson of the Day – Don’t Use the Terms “Independent Variable” and “Dependent Variable” in Regression

June 20, 2014

In math and science, we learn the equation of a line as

$y = mx + b$,

with $y$ being called the **dependent variable** and $x$ being called the **independent variable**. This terminology holds true for more complicated functions with multiple variables, such as in polynomial regression.

**I highly discourage the use of “independent” and “dependent” in the context of statistics and regression, because these terms have other meanings in statistics.** In probability, 2 random variables $X_1$ and $X_2$ are **independent** if their **joint distribution** is simply the product of their **marginal distributions**, and they are **dependent** otherwise. Thus, the usage of “independent variable” for a regression model with 2 predictors becomes problematic if the model assumes that the predictors are random variables; a **random effects model** is an example with such an assumption. An obvious question for such models is whether or not the independent variables are independent, which is a rather confusing question with 2 uses of the word “independent”. A better way to phrase that question is whether or not the predictors are independent.
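To make the probabilistic meaning of “independent” concrete, here is a minimal sketch in Python that checks whether a discrete joint distribution equals the product of its marginals. The function names `marginals` and `is_independent` and the two example tables are my own illustrative assumptions; they are not part of any library.

```python
import math


def marginals(joint):
    """Return the row and column marginal distributions of a joint pmf table."""
    row = [sum(r) for r in joint]        # marginal of the first variable
    col = [sum(c) for c in zip(*joint)]  # marginal of the second variable
    return row, col


def is_independent(joint, tol=1e-12):
    """Check whether every joint probability equals the product of its marginals."""
    row, col = marginals(joint)
    return all(
        math.isclose(joint[i][j], row[i] * col[j], abs_tol=tol)
        for i in range(len(row))
        for j in range(len(col))
    )


# Hypothetical joint pmf built as the product of marginals (0.4, 0.6) and (0.3, 0.7).
independent_joint = [[0.12, 0.28],
                     [0.18, 0.42]]

# Hypothetical joint pmf with the same kind of marginals (0.5, 0.5) but strong association.
dependent_joint = [[0.4, 0.1],
                   [0.1, 0.4]]

print(is_independent(independent_joint))  # True
print(is_independent(dependent_joint))    # False
```

The second table has uniform marginals, so the marginals alone reveal nothing about the association; independence is a property of the joint distribution.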

Thus, in a statistical regression model, I strongly encourage the use of the terms “response variable” or “target variable” (or just “response” and “target”) for the quantity being modelled, $y$, and the terms “explanatory variables”, “predictor variables”, “predictors”, “covariates”, or “factors” for the quantities used to model it, $x$.

(I have encountered some statisticians who prefer to reserve “covariate” for continuous predictors and “factor” for categorical predictors.)
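The same naming can be carried into code. The following is only a sketch, assuming a simple linear regression with one predictor fitted by ordinary least squares; the function `fit_simple_regression` and the toy data are hypothetical and chosen so that the fitted line is easy to verify by hand.

```python
def fit_simple_regression(predictor, response):
    """Fit response = intercept + slope * predictor by ordinary least squares."""
    n = len(predictor)
    mean_predictor = sum(predictor) / n
    mean_response = sum(response) / n

    # Sum of cross-deviations and sum of squared deviations of the predictor.
    s_xy = sum((x - mean_predictor) * (y - mean_response)
               for x, y in zip(predictor, response))
    s_xx = sum((x - mean_predictor) ** 2 for x in predictor)

    slope = s_xy / s_xx
    intercept = mean_response - slope * mean_predictor
    return intercept, slope


# Hypothetical data lying exactly on the line response = 1 + 2 * predictor.
predictor = [1.0, 2.0, 3.0, 4.0]
response = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_regression(predictor, response)
print(intercept, slope)
```

Naming the arguments `predictor` and `response` rather than “independent” and “dependent” leaves the word “independent” free for its probabilistic meaning.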
