Machine Learning Lesson of the Day – Estimating Coefficients in Linear Gaussian Basis Function Models

Recently, I introduced linear Gaussian basis function models as a suitable modelling technique for supervised learning problems that involve non-linear relationships between the target and the predictors.  Recall that linear basis function models are generalizations of linear regression that regress the target on functions of the predictors, rather than the predictors themselves.  In linear regression, the coefficients are estimated by the method of least squares.  Thus, it is natural that the estimation of the coefficients in linear Gaussian basis function models is an extension of the method of least squares.

The linear Gaussian basis function model is

$Y = \Phi \beta + \varepsilon$,

where $\Phi_{ij} = \phi_j (x_i)$.  In other words, $\Phi$ is the design matrix, and the element in row $i$ and column $j$ of this design matrix is the $j\text{th}$ basis function evaluated at the $i\text{th}$ value of the predictor.  (In this case, there is 1 predictor per datum.)

Applying the method of least squares, the coefficient vector, $\beta$, can be estimated by

$\hat{\beta} = (\Phi^{T} \Phi)^{-1} \Phi^{T} Y$.

Note that this looks like the least-squares estimator for the coefficient vector in linear regression, except that the design matrix is not $X$, but $\Phi$.

If you are not familiar with how $\hat{\beta}$ was obtained, I encourage you to review least-squares estimation and the derivation of the estimator of the coefficient vector in linear regression.
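As a minimal sketch of the estimator above, the following Python/NumPy snippet fits a linear Gaussian basis function model by least squares.  The data, the basis centres, and the common width are illustrative assumptions, not values prescribed by the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)

# One predictor per datum, with a non-linear target plus Gaussian noise
# (the sine curve and noise level are assumed for illustration).
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)

# Gaussian basis functions: phi_j(x) = exp(-(x - mu_j)^2 / (2 * s^2)).
mu = np.linspace(0, 1, 9)   # assumed basis centres
s = 0.15                    # assumed common width

# Design matrix Phi: row i, column j holds phi_j(x_i).
Phi = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * s ** 2))

# Least-squares estimate: beta_hat solves (Phi^T Phi) beta = Phi^T y,
# i.e. beta_hat = (Phi^T Phi)^{-1} Phi^T y.
beta_hat = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
y_hat = Phi @ beta_hat
```

In practice, solving the normal equations with `np.linalg.solve` (or using `np.linalg.lstsq` directly on $\Phi$) is numerically preferable to explicitly forming the matrix inverse in the formula.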

Machine Learning Lesson of the Day – Introduction to Linear Basis Function Models

Given a supervised learning problem of using $p$ inputs ($x_1, x_2, ..., x_p$) to predict a continuous target $Y$, the simplest model to use would be linear regression.  However, what if we know that the relationship between the inputs and the target is non-linear, but we are unsure of exactly what form this relationship has?

One way to overcome this problem is to use linear basis function models.  These models assume that the target is a linear combination of a set of $p + 1$ basis functions (counting the constant function $\phi_0(x) = 1$, whose weight is the intercept $w_0$).

$Y_i = w_0 + w_1 \phi_1(x_{i1}) + w_2 \phi_2(x_{i2}) + ... + w_p \phi_p(x_{ip})$

This is a generalization of linear regression that essentially replaces each input with a function of the input.  (A linear basis function model that uses the identity function is just linear regression.)
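The identity-basis special case can be sketched directly: using $\phi_j(x) = x$ for every basis function, the least-squares fit below recovers ordinary linear regression.  The data and the true weights are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: p = 3 inputs, target linear in the inputs plus small noise.
n, p = 100, 3
X = rng.normal(size=(n, p))
true_w = np.array([1.0, -0.5, 0.3])   # assumed weights, for illustration only
y = 2.0 + X @ true_w + rng.normal(scale=0.01, size=n)

# Identity basis: each phi_j is the identity function, so the design matrix
# is just [1, x_1, ..., x_p] -- exactly the linear regression design matrix.
Phi = np.column_stack([np.ones(n), X])

# Least-squares weights (w_0, w_1, ..., w_p).
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
```

With the identity basis, the fitted weights are the ordinary linear regression coefficients; swapping in non-identity functions for the $\phi_j$ changes only the design matrix, not the fitting procedure.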

The type of basis function (i.e. the form of the function $\phi$) is chosen to suitably model the non-linearity in the relationship between the inputs and the target.  It also needs to be chosen so that the computation is efficient.  I will discuss variations of linear basis function models in a later Machine Learning Lesson of the Day.