Mathematical and Applied Statistics Lesson of the Day – Don’t Use the Terms “Independent Variable” and “Dependent Variable” in Regression
June 20, 2014
In math and science, we learn the equation of a line as
$y = mx + b$,
with $y$ being called the dependent variable and $x$ being called the independent variable. This terminology carries over to more complicated functions, such as those in polynomial regression and models with multiple variables.
I highly discourage the use of “independent” and “dependent” in the context of statistics and regression, because these terms have other meanings in statistics. In probability, two random variables $X$ and $Y$ are independent if their joint distribution is the product of their marginal distributions, i.e. $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all $x$ and $y$; otherwise, they are dependent. Thus, calling the predictors of a regression model with two predictors “independent variables” becomes problematic if the model assumes that the predictors are random variables; a random effects model is one example of such a model. An obvious question for such a model is whether or not the independent variables are independent, which is confusing because it uses the word “independent” in two different senses. A better way to phrase that question is whether or not the predictors are independent.
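To make the probabilistic meaning of “independent” concrete, here is a minimal Python sketch for discrete random variables with finite support. It assumes the joint probability mass function is given as a dictionary mapping $(x, y)$ pairs to probabilities; the function names `marginals` and `are_independent` are hypothetical, chosen only for illustration. Exact fractions are used so that the equality check is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the marginal pmfs of X and Y from a joint pmf {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0) + p  # sum over y to get P(X = x)
        py[y] = py.get(y, 0) + p  # sum over x to get P(Y = y)
    return px, py


def are_independent(joint):
    """True if P(X=x, Y=y) == P(X=x) * P(Y=y) for every (x, y) pair.

    Pairs missing from the dict are treated as having probability 0.
    """
    px, py = marginals(joint)
    return all(
        joint.get((x, y), 0) == px[x] * py[y]
        for x, y in product(px, py)
    )


# Two fair coin flips: every outcome has probability 1/4.
coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

# Perfectly correlated variables: Y always equals X.
copies = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(are_independent(coins))   # True
print(are_independent(copies))  # False
```

In the second example, both marginals are still uniform, but $P(X = 0, Y = 1) = 0$ while $P(X = 0)\,P(Y = 1) = 1/4$, so the joint distribution does not factor and the variables are dependent.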
Thus, in a statistical regression model, I strongly encourage the use of the terms “response variable” or “target variable” (or just “response” and “target”) for $y$, and the terms “explanatory variables”, “predictor variables”, “predictors”, “covariates”, or “factors” for $x$.
(I have encountered some statisticians who prefer to reserve “covariate” for continuous predictors and “factor” for categorical predictors.)