# Machine Learning Lesson of the Day – K-Nearest Neighbours Regression

February 28, 2014

I recently introduced the K-nearest neighbours classifier. Some slight adjustments to the same algorithm can make it into a regression technique.

Given a training set and a new input *x*, we can predict the target of *x* by

- identifying the K data (the K "neighbours") in the training set that are closest to *x* by **Euclidean distance**
- predicting the target for *x* as a linear combination of the neighbours' targets (effectively a weighted average), in which
  - the K neighbours' targets are the predictors
  - the reciprocals of the neighbours' distances to *x*, rescaled so that they sum to 1, are their respective coefficients (the "weights")
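The procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `knn_regress` and its `weighted` flag are hypothetical, inputs are plain lists of numbers, and the inverse-distance weights are normalized to sum to 1 (standard inverse-distance weighting). Setting `weighted=False` gives the plain average of the K neighbours' targets.

```python
import math
from typing import Sequence


def knn_regress(train_x: Sequence[Sequence[float]], train_y: Sequence[float],
                x_new: Sequence[float], k: int, weighted: bool = True) -> float:
    """Predict the target of x_new from its k nearest training points (hypothetical helper)."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the training-set size")
    # Euclidean distance from x_new to every training input, paired with that input's target
    dists = sorted((math.dist(xi, x_new), yi) for xi, yi in zip(train_x, train_y))
    neighbours = dists[:k]
    if not weighted:
        # Plain KNN regression: the unweighted average of the k nearest targets
        return sum(y for _, y in neighbours) / k
    # If x_new coincides with a training point, 1/d is undefined: average the exact matches
    exact = [y for d, y in neighbours if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    # Inverse-distance weights, normalized (by dividing by their sum) so they add up to 1
    weights = [1.0 / d for d, _ in neighbours]
    return sum(w * y for w, (_, y) in zip(weights, neighbours)) / sum(weights)


if __name__ == "__main__":
    X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(knn_regress(X, y, [2.2], k=2, weighted=False))  # 2.5
    print(knn_regress(X, y, [2.2], k=2))  # approximately 2.2
```

In this toy example the two nearest neighbours of 2.2 are 2.0 and 3.0. The plain average is 2.5, while the weighted prediction is approximately 2.2, because the closer neighbour (distance 0.2) gets four times the weight of the farther one (distance 0.8).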

Validation or cross-validation can be used to choose the best value of K.
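One common form of cross-validation for choosing K is leave-one-out: predict each training point from all the others and keep the K with the smallest mean squared error. The sketch below assumes unweighted KNN averaging, inputs given as lists of coordinate lists, and mean squared error as the criterion; the function names `knn_predict` and `choose_k_loocv` are hypothetical.

```python
import math


def knn_predict(train_x, train_y, x_new, k):
    # Unweighted KNN regression: average the targets of the k nearest points (Euclidean)
    dists = sorted(zip((math.dist(xi, x_new) for xi in train_x), train_y))
    return sum(y for _, y in dists[:k]) / k


def choose_k_loocv(xs, ys, candidate_ks):
    """Pick K by leave-one-out cross-validation; returns (best_k, {k: mean squared error})."""
    n = len(xs)
    scores = {}
    for k in candidate_ks:
        if not 1 <= k <= n - 1:
            continue  # each fold trains on only n - 1 points, so a larger K is impossible
        squared_error = 0.0
        for i in range(n):
            # Hold out point i and predict it from the remaining n - 1 points
            train_x = xs[:i] + xs[i + 1:]
            train_y = ys[:i] + ys[i + 1:]
            squared_error += (ys[i] - knn_predict(train_x, train_y, xs[i], k)) ** 2
        scores[k] = squared_error / n
    if not scores:
        raise ValueError("no candidate K is valid for this data set")
    best_k = min(scores, key=scores.get)  # ties go to the first candidate listed
    return best_k, scores


if __name__ == "__main__":
    xs = [[0.0], [0.0], [10.0], [10.0]]
    ys = [1.0, 1.0, 5.0, 5.0]
    best_k, scores = choose_k_loocv(xs, ys, [1, 2, 3, 4])
    print(best_k)  # 1
```

With two tight clusters of points, K = 1 predicts every held-out point perfectly, while K = 2 and K = 3 average across the clusters and incur error, so K = 1 is chosen. K = 4 is skipped because each fold has only three training points.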

Could you please give an example with R code – that would be awesome!

Yes! I’ve actually been drafting a long tutorial on KNN for the past week! Thanks for reading, Holger – please stay tuned!

Isn't what you described simply local regression (LOESS)? R already gives you this through its plotting functions [check out stat_smooth(method = "loess") in ggplot2]. You could extract the predictions from the plotted values.

I cannot give a confident answer to your question – I need to think about this. LOESS does use neighbouring data to fit a local polynomial that forms the fitted response function, but it also uses a smoothing parameter that takes the degree of the polynomial into account – there is no such smoothing parameter in K-nearest-neighbour regression. Furthermore, K-nearest-neighbour regression does not use a polynomial to estimate the response; it simply uses the average of the K nearest neighbouring responses (or a weighted average, if you wish to add this complexity). Thus, I don’t think that they are the same.

Would anybody else be so kind and knowledgeable to shed light on this very good question?

Hi Anon,

After some more research, I can confidently tell you that LOESS is different from KNN regression, and the reasons that I gave in my last comment support this claim. LOESS fits a low-degree polynomial locally through the data (2nd-degree by default in R's loess() function), and KNN regression does not do that. The smoothing parameter in LOESS, "alpha" (called span in R's loess()), is like the "K" parameter in KNN regression: they both control how much of the neighbouring data to use in the fitting process. However,

– in LOESS, the smoothing parameter "alpha" is the fraction of the data used in each local fit, so it can take on decimal values

– in KNN regression, the “K” parameter can take on only positive integer values, and usually only odd positive integers to avoid dealing with ties
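To make the contrast concrete, the sketch below compares a simplified LOESS-style fit with plain KNN averaging on exactly linear data. It is an illustration under assumptions, not R's loess(): it handles one-dimensional inputs only, fits a local line (degree 1, whereas R's loess() defaults to degree 2), weights the nearest ceil(alpha * n) points with the tricube kernel, and the names `tricube`, `loess_like`, and `knn_average` are hypothetical.

```python
import math


def tricube(u: float) -> float:
    # Tricube kernel: weight falls smoothly from 1 at u = 0 to 0 at u >= 1
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_like(xs, ys, x0, alpha):
    """Simplified local linear fit at x0 using the nearest fraction alpha of the data."""
    n = len(xs)
    q = max(2, min(n, math.ceil(alpha * n)))  # alpha may be a decimal; q is a point count
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:q]
    dmax = max(abs(x - x0) for x, _ in nearest) or 1.0
    w = [tricube(abs(x - x0) / dmax) for x, _ in nearest]
    sw = sum(w)
    if sw == 0.0:
        # Every selected point sits exactly at distance dmax: fall back to equal weights
        w = [1.0] * len(nearest)
        sw = float(len(nearest))
    # Weighted least-squares line through the selected points
    xm = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
    ym = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw
    sxx = sum(wi * (x - xm) ** 2 for wi, (x, _) in zip(w, nearest))
    sxy = sum(wi * (x - xm) * (y - ym) for wi, (x, y) in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0  # all weight on one x value: local constant fit
    return ym + slope * (x0 - xm)


def knn_average(xs, ys, x0, k):
    # Plain KNN regression: the average of the k nearest targets, with no polynomial fit
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2 * x + 1 for x in xs]
    print(loess_like(xs, ys, 0.0, 0.5))  # approximately 1.0
    print(knn_average(xs, ys, 0.0, 5))  # 5.0
```

Here alpha = 0.5 selects the same 5 nearest points as K = 5, yet at the edge of the data the local line returns approximately 1.0 (the true value of 2x + 1 at x = 0), while the KNN average returns 5.0, because an average cannot follow the trend. A decimal alpha such as 0.35 is also accepted and is simply converted into a point count.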

I hope that this explanation is clarifying for you. Thanks for your very good question!