Machine Learning Lesson of the Day – Memory-Based Learning

Memory-based learning (also called instance-based learning) is a type of non-parametric algorithm that compares new test data with training data in order to solve the given machine learning problem.  Such algorithms search for the training data that are most similar to the test data and make predictions based on these similarities.  (From what I have learned, memory-based learning is used for supervised learning only.  Can you think of any memory-based algorithms for unsupervised learning?)

A distinguishing feature of memory-based learning is its storage of the entire training set.  This is computationally costly, especially if the training set is large – the storage itself is costly, and the complexity of the model grows with a larger data set.  However, it is advantageous because it uses less assumptions than parametric models, so it is adaptable to problems for which the assumptions may fail and no clear pattern is known ex ante.  (In contrast, parametric models like linear regression make generalizations about the training data; after building a model to predict the targets, the training data are discarded, so there is no need to store them.)  Thus, I recommend using memory-based learning algorithms when the data set is relatively small and there is no prior knowledge or information about the underlying patterns in the data.

Two classic examples of memory-based learning are K-nearest neighbours classification and K-nearest neighbours regression.

Machine Learning Lesson of the Day – Parametric vs. Non-Parametric Models

A machine learning algorithm can be classified as either parametric or non-parametric.

A parametric algorithm has a fixed number of parameters.  A parametric algorithm is computationally faster, but makes stronger assumptions about the data; the algorithm may work well if the assumptions turn out to be correct, but it may perform badly if the assumptions are wrong.  A common example of a parametric algorithm is linear regression.

In contrast, a non-parametric algorithm uses a flexible number of parameters, and the number of parameters often grows as it learns from more data.  A non-parametric algorithm is computationally slower, but makes fewer assumptions about the data.  A common example of a non-parametric algorithm is K-nearest neighbour.

To summarize, the trade-offs between parametric and non-parametric algorithms are in computational cost and accuracy.