## Applied Statistics Lesson of the Day – Polynomial Regression is Actually Just Linear Regression

Continuing from my previous Statistics Lesson of the Day on what “linear” really means in “linear regression”, I want to highlight a common example involving this nomenclature that can mislead non-statisticians.  Polynomial regression is a commonly used multiple regression technique; it models the systematic component of the regression model as a $p\text{th}$-order polynomial relationship between the response variable $Y$ and the explanatory variable $x$.

$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon$

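However, this model is still a linear regression model, because the response variable is still a linear combination of the regression coefficients $\beta_0, \beta_1, \dots, \beta_p$; the powers $x, x^2, \dots, x^p$ simply play the role of known explanatory variables.  Consequently, the regression coefficients are still estimated by the method of least squares, using exactly the same linear algebra as in ordinary multiple linear regression.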
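Writing the model for $n$ observations in matrix form makes the linearity in the coefficients explicit. Here $X$ denotes the $n \times (p+1)$ design matrix whose columns are the powers of $x$ (notation introduced for this illustration). The least-squares estimator then has the familiar closed form, provided $X^{T} X$ is invertible, which requires at least $p+1$ distinct values of $x$.

$$
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^p \\
1 & x_2 & x_2^2 & \cdots & x_2^p \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^p
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix},
\qquad
\hat{\beta} = (X^{T} X)^{-1} X^{T} Y.
$$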

Remember: the “linear” in linear regression refers to the linearity between the response variable and the regression coefficients, NOT between the response variable and the explanatory variable(s).
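To see this in practice, here is a minimal sketch (assuming NumPy is available; the function names are hypothetical) that fits a cubic polynomial with an ordinary linear least-squares solver. The curve is nonlinear in $x$, but once the columns $1, x, x^2, x^3$ are built, the fit is just linear regression on that design matrix.

```python
import numpy as np

def polynomial_design_matrix(x, p):
    """Columns 1, x, x^2, ..., x^p (hypothetical helper for illustration).

    Each column is a fixed transformation of x, so the model stays linear in beta.
    """
    return np.vander(np.asarray(x, dtype=float), N=p + 1, increasing=True)

def fit_polynomial_ols(x, y, p):
    """Ordinary least squares on the polynomial design matrix."""
    X = polynomial_design_matrix(x, p)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

# Noiseless cubic data (simulated), so least squares should recover beta up to rounding.
x = np.linspace(-2.0, 2.0, 25)
true_beta = np.array([1.0, -0.5, 2.0, 0.25])
y = polynomial_design_matrix(x, 3) @ true_beta

beta_hat = fit_polynomial_ols(x, y, 3)
print(np.round(beta_hat, 6))
```

NumPy's `np.polyfit` performs the same least-squares fit but returns the coefficients from the highest power down, so `np.polyfit(x, y, 3)[::-1]` should match `beta_hat`.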

## Machine Learning Lesson of the Day – Estimating Coefficients in Linear Gaussian Basis Function Models

Recently, I introduced linear Gaussian basis function models as a suitable modelling technique for supervised learning problems that involve non-linear relationships between the target and the predictors.  Recall that linear basis function models are generalizations of linear regression that regress the target on functions of the predictors, rather than the predictors themselves.  In linear regression, the coefficients are estimated by the method of least squares.  Thus, it is natural that the estimation of the coefficients in linear Gaussian basis function models is an extension of the method of least squares.

The linear Gaussian basis function model is

$Y = \Phi \beta + \varepsilon$,

where $\Phi_{ij} = \phi_j (x_i)$.  In other words, $\Phi$ is the design matrix, and the element in row $i$ and column $j$ of this design matrix is the $j\text{th}$ basis function evaluated at the predictor value of the $i\text{th}$ datum.  (In this case, there is 1 predictor per datum.)
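A minimal sketch of building $\Phi$ with NumPy follows. It assumes the common Gaussian form $\phi_j(x) = \exp\left(-\frac{(x - \mu_j)^2}{2 s^2}\right)$ with hypothetical centres $\mu_j$ and a single shared width $s$; neither the centres nor the width are specified above, so treat them as illustrative choices. Broadcasting a column of predictor values against a row of centres fills in every $\Phi_{ij}$ at once.

```python
import numpy as np

def gaussian_design_matrix(x, centers, width):
    """Phi[i, j] = phi_j(x_i) = exp(-(x_i - mu_j)^2 / (2 * width^2)).

    Assumes one scalar predictor per datum and a common (hypothetical) width
    for every Gaussian basis function.
    """
    x = np.asarray(x, dtype=float).reshape(-1, 1)         # shape (n, 1)
    mu = np.asarray(centers, dtype=float).reshape(1, -1)  # shape (1, m)
    return np.exp(-((x - mu) ** 2) / (2.0 * width ** 2))  # broadcasts to (n, m)

# Three data points, two basis functions centred at 0 and 1.
x = np.array([0.0, 0.5, 1.0])
Phi = gaussian_design_matrix(x, centers=[0.0, 1.0], width=0.5)
print(Phi.shape)  # (3, 2): one row per datum, one column per basis function
```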

Applying the method of least squares, the coefficient vector, $\beta$, can be estimated by

$\hat{\beta} = (\Phi^{T} \Phi)^{-1} \Phi^{T} Y$.
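The estimator can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not a reference implementation: the Gaussian basis form, the 8 evenly spaced centres, the width of 0.75, and the simulated sine data are all hypothetical choices. Rather than forming $(\Phi^{T} \Phi)^{-1}$ explicitly, the sketch solves the normal equations $\Phi^{T} \Phi \, \beta = \Phi^{T} Y$, which gives the same $\hat{\beta}$ with less numerical error.

```python
import numpy as np

def gaussian_design_matrix(x, centers, width):
    """Phi[i, j] = exp(-(x_i - mu_j)^2 / (2 * width^2)) (assumed Gaussian basis)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    mu = np.asarray(centers, dtype=float).reshape(1, -1)
    return np.exp(-((x - mu) ** 2) / (2.0 * width ** 2))

def least_squares_coefficients(Phi, y):
    """beta_hat = (Phi^T Phi)^{-1} Phi^T y, via solving the normal equations."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Simulated data with a non-linear relationship between target and predictor.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 50)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

centers = np.linspace(0.0, 2.0 * np.pi, 8)  # hypothetical basis centres
Phi = gaussian_design_matrix(x, centers, width=0.75)
beta_hat = least_squares_coefficients(Phi, y)
y_hat = Phi @ beta_hat  # fitted values
```

If the basis functions overlap heavily, $\Phi^{T} \Phi$ can become ill-conditioned; in that case `np.linalg.lstsq(Phi, y, rcond=None)`, which works on $\Phi$ directly, is the safer choice and returns the same estimates when the problem is well-posed.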

Note that this looks like the least-squares estimator for the coefficient vector in linear regression, except that the design matrix is not $X$, but $\Phi$.

If you are not familiar with how $\hat{\beta}$ was obtained, I encourage you to review least-squares estimation and the derivation of the estimator of the coefficient vector in linear regression.