K-nearest neighbours | The Chemical Statistician

Machine Learning Lesson of the Day – Memory-Based Learning

March 5, 2014

Memory-based learning (also called instance-based learning) is a type of non-parametric algorithm that compares new test data with training data in order to solve the given machine learning problem. Such algorithms search for the training data that are most similar to the test data and make predictions based on these similarities. (From what I have learned, memory-based learning is used for supervised learning only. Can you think of any memory-based algorithms for unsupervised learning?)

A distinguishing feature of memory-based learning is its storage of the entire training set. This is computationally costly, especially if the training set is large: the storage itself is costly, and the complexity of the model grows with the size of the data set. However, it is advantageous because it makes fewer assumptions than parametric models do, so it is adaptable to problems for which those assumptions may fail and no clear pattern is known ex ante. (In contrast, parametric models like linear regression make generalizations about the training data; after building a model to predict the targets, the training data are discarded, so there is no need to store them.) Thus, I recommend using memory-based learning algorithms when the data set is relatively small and there is no prior knowledge or information about the underlying patterns in the data.

Two classic examples of memory-based learning are K-nearest neighbours classification and K-nearest neighbours regression.
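To make the "store everything" idea concrete, here is a minimal sketch of K-nearest neighbours regression in plain Python. The class name `KNNRegressor`, its methods, and the toy data are my own assumptions for illustration; I also assume Euclidean distance and a simple average of the neighbours' targets, which is one common choice rather than the only one.

```python
import math


class KNNRegressor:
    """Hypothetical minimal memory-based learner: 'training' only stores the data."""

    def __init__(self, k):
        self.k = k
        self.X = []
        self.y = []

    def fit(self, X, y):
        # No parameters are estimated; the whole training set is kept in memory.
        self.X = [tuple(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Rank the stored training points by Euclidean distance to x.
        ranked = sorted(zip(self.X, self.y), key=lambda pair: math.dist(pair[0], x))
        nearest_targets = [target for _, target in ranked[: self.k]]
        # The prediction is the average target of the k nearest neighbours.
        return sum(nearest_targets) / len(nearest_targets)


# Toy data (made up for illustration): one predictor, one target.
model = KNNRegressor(k=2).fit([(1.0,), (2.0,), (3.0,), (10.0,)], [1.0, 2.0, 3.0, 10.0])
prediction = model.predict_one((2.4,))
print(prediction)
```

Notice that `fit` does no work beyond copying the data, while every prediction scans the full training set. That is the storage and computation cost described above.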


Machine Learning Lesson of the Day: The K-Nearest Neighbours Classifier

February 21, 2014

The K-nearest neighbours (KNN) classifier is a non-parametric classification technique that classifies an input $X$ by

1. identifying the K data points (the K “neighbours”) in the training set that are closest to $X$
2. counting the number of “neighbours” that belong to each class of the target variable
3. classifying $X$ by the most common class to which its neighbours belong

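For a two-class problem, K is usually chosen to be an odd number so that the vote cannot end in a tie. (With more than two classes, ties are still possible even when K is odd.)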

The proximity of the neighbours to $X$ is usually defined by Euclidean distance.
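The three steps can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function name `knn_classify`, the toy data, and the tie-breaking behaviour (whichever tied class `Counter` lists first) are not part of the definition above.

```python
import math
from collections import Counter


def knn_classify(x, train_X, train_y, k):
    """Classify x by majority vote among its k nearest training points."""
    # Step 1: find the k training points closest to x (Euclidean distance).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    neighbour_labels = [train_y[i] for i in order[:k]]
    # Step 2: count how many neighbours belong to each class.
    votes = Counter(neighbour_labels)
    # Step 3: return the most common class among the neighbours.
    return votes.most_common(1)[0][0]


# Toy training set (made up for illustration): two well-separated classes.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

label_near_origin = knn_classify((0.5, 0.5), train_X, train_y, k=3)
label_far = knn_classify((5.5, 5.5), train_X, train_y, k=3)
print(label_near_origin, label_far)
```

Other distance measures can be swapped in by changing the `key` function; Euclidean distance is used here only because it is the usual default.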

Validation or cross-validation can be used to determine the best value of K.
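As one possible way to do this, the sketch below scores several candidate values of K by leave-one-out cross-validation: each point is held out in turn, classified using the rest of the data, and the proportion of correct classifications is recorded. The helper names (`knn_classify`, `loocv_accuracy`), the candidate values of K, and the toy data are assumptions made for illustration; other validation schemes (a single hold-out set, or k-fold cross-validation) would follow the same pattern.

```python
import math
from collections import Counter


def knn_classify(x, train_X, train_y, k):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loocv_accuracy(X, y, k):
    """Leave-one-out cross-validation accuracy of the KNN classifier for a given k."""
    correct = 0
    for i in range(len(X)):
        # Hold out point i and classify it using all of the remaining points.
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_classify(X[i], rest_X, rest_y, k) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (made up for illustration): two clusters of four points each.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
     (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
y = ["a"] * 4 + ["b"] * 4

# Score each candidate K and keep one with the highest accuracy.
scores = {k: loocv_accuracy(X, y, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

When several values of K achieve the same accuracy, `max` simply returns the first one it encounters; in practice you may prefer another rule for choosing among them.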

