Machine Learning Lesson of the Day – K-Nearest Neighbours Regression

I recently introduced the K-nearest neighbours classifier.  Some slight adjustments to the same algorithm can make it into a regression technique.

Given a training set and a new input X, we can predict the target of the new input by

  1. identifying the K data points (the K “neighbours”) in the training set that are closest to X by Euclidean distance
  2. predicting the target for X as the average of those K neighbours’ targets
  • optionally, a weighted average can be used instead of a plain average
  • a common choice of weights is the reciprocals of the neighbours’ distances to X, so that closer neighbours count more toward the prediction

Validation or cross-validation can be used to determine the best value of K.
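The two steps above can be sketched in a few lines. This is a minimal illustration in Python with NumPy (a full R tutorial is promised in the comments below); the function name `knn_regress` is just a placeholder, not a library routine:

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=3, weighted=True):
    """Predict the target for x_new from its k nearest training points."""
    # Step 1: Euclidean distances from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest neighbours
    nn = np.argsort(dists)[:k]
    if weighted:
        # Step 2 (weighted variant): weight each neighbour's target by the
        # reciprocal of its distance; epsilon guards against division by zero
        w = 1.0 / (dists[nn] + 1e-12)
        return np.sum(w * y_train[nn]) / np.sum(w)
    # Step 2 (plain variant): average of the k nearest targets
    return y_train[nn].mean()

# toy data lying on the line y = 2x
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(knn_regress(X_train, y_train, np.array([2.5]), k=2))  # → 5.0
```

At x = 2.5 the two nearest neighbours (x = 2 and x = 3) are equidistant, so the weighted average of their targets (4 and 6) is simply 5.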

5 Responses to Machine Learning Lesson of the Day – K-Nearest Neighbours Regression

  1. Could you please give an example with R code – that would be awesome!

    • Yes! I’ve actually been drafting a long tutorial on KNN for the past week! Thanks for reading, Holger – please stay tuned!

      • Anon says:

        Isn’t what you described simply local regression (LOESS)? R already gives this to you through its plotting functions [check out stat_smooth(method = "loess") in ggplot2]. You could extract the predictions from the plotted values.

      • I cannot give a confident answer to your question – I need to think about this. LOESS does use neighbouring data to fit a local polynomial that forms the fitted response function, but it also uses a smoothing parameter that takes the degree of the polynomial into account – there is no such smoothing parameter in K-nearest-neighbour regression. Furthermore, K-nearest-neighbour regression does not use a polynomial to estimate the response; it simply uses the average of the K nearest neighbouring responses (or a weighted average, if you wish to add this complexity). Thus, I don’t think that they are the same.

        Would anybody else be so kind and knowledgeable as to shed light on this very good question?

      • Hi Anon,

        After some more research, I can confidently tell you that LOESS is different from KNN regression, and the reasons that I gave in my last comment support this claim. LOESS locally fits a low-degree polynomial (typically 1st or 2nd degree) through the data, and KNN regression does not do that. The smoothing parameter in LOESS, “alpha”, is like the “K” parameter in KNN regression – they both control how much of the neighbouring data to use in the fitting process. However,

        – in LOESS, the smoothing parameter, “alpha”, can take on decimal values
        – in KNN regression, the “K” parameter can take on only positive integer values (the convention of choosing odd K to avoid ties applies to KNN classification, where votes can tie – it is not needed for regression)

        I hope that this explanation clarifies things for you. Thanks for your very good question!
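To make the contrast concrete, here is a rough one-dimensional sketch in Python with NumPy. The `loess_predict` function below is a bare-bones local polynomial fit with tricube weights over the nearest alpha-fraction of the data – an illustration of the idea, not Cleveland’s full LOESS (no robustness iterations) – while `knn_predict` is the plain KNN average:

```python
import numpy as np

def knn_predict(x0, x, y, k):
    # KNN regression: plain average of the k nearest responses
    nn = np.argsort(np.abs(x - x0))[:k]
    return y[nn].mean()

def loess_predict(x0, x, y, alpha, degree=2):
    # LOESS-style local fit (sketch): alpha is a *fraction* of the data,
    # so roughly ceil(alpha * n) points enter each local polynomial fit
    n = len(x)
    m = int(np.ceil(alpha * n))
    nn = np.argsort(np.abs(x - x0))[:m]
    d = np.abs(x[nn] - x0)
    w = (1 - (d / d.max()) ** 3) ** 3        # tricube weights
    # np.polyfit applies its weights to the residuals, so pass sqrt(w)
    # to minimise the tricube-weighted sum of squared residuals
    coeffs = np.polyfit(x[nn], y[nn], degree, w=np.sqrt(w))
    return np.polyval(coeffs, x0)

# noiseless quadratic data: the local polynomial recovers it exactly,
# while the KNN average is only approximately right
x = np.linspace(0, 10, 51)
y = x ** 2
print(loess_predict(5.0, x, y, alpha=0.5))  # ≈ 25
print(knn_predict(5.0, x, y, k=3))          # close to 25, but biased
```

Note how alpha = 0.5 is a fraction (half of the data), whereas K must be a whole count of neighbours – exactly the distinction drawn above.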

Your thoughtful comments are much appreciated!
