# Machine Learning Lesson of the Day – The “No Free Lunch” Theorem

January 24, 2014

A **model** is a simplified representation of reality, and the **simplifications** are made to *discard unnecessary detail* and allow us to focus on the aspect of reality that we want to understand. These simplifications are grounded in **assumptions**, which may hold in some situations but not in others. This implies that a model that explains one situation well may fail in another. In both statistics and machine learning, we need to **check our assumptions** before relying on a model.

The **“No Free Lunch” theorem** states that no single model works best for every problem. The assumptions of a great model for one problem may not hold for another problem, so it is common in machine learning to try multiple models and find the one that works best for a particular problem. This is especially true in supervised learning, where validation or cross-validation is commonly used to assess the predictive accuracy of multiple models of varying complexity in order to find the best one. A model that works well can also be trained by different algorithms: for example, **linear regression** can be trained with the **normal equations** or with **gradient descent**.
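As an illustration of that last point, here is a minimal sketch that fits the same linear regression model in both ways. It assumes NumPy and a small synthetic dataset made up for illustration; the learning rate and iteration count are arbitrary, untuned choices.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Least-squares weights by solving (X^T X) w = X^T y directly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def fit_gradient_descent(X, y, lr=0.1, n_iters=5000):
    """Least-squares weights by batch gradient descent on the mean squared error."""
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = (2.0 / n) * X.T @ (X @ w - y)  # gradient of the MSE with respect to w
        w -= lr * grad
    return w

# Synthetic data (illustrative assumption): y = 3 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])  # intercept column + one feature
y = 3.0 + 2.0 * x + rng.normal(scale=0.1, size=200)

w_ne = fit_normal_equations(X, y)
w_gd = fit_gradient_descent(X, y)
print("normal equations:", w_ne)
print("gradient descent:", w_gd)
```

Both routes should land on essentially the same coefficients. The normal equations get there in one step but require solving a p×p linear system, which grows expensive with many features; gradient descent replaces that with many cheap iterations and a learning rate that has to be chosen.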

For each problem, it is important to weigh the trade-offs between **speed**, **accuracy**, and **complexity** of different models and algorithms, and to choose the one that best suits that particular problem.
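A minimal sketch of the model-comparison loop described above, assuming NumPy and a synthetic noisy sine curve (both illustrative choices, not from the original post). It uses 5-fold cross-validation to score polynomial models of increasing degree, with the number of coefficients standing in as a rough measure of complexity.

```python
import numpy as np
from numpy.polynomial import Polynomial

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a degree-`degree` polynomial under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = Polynomial.fit(x[train_idx], y[train_idx], degree)
        residuals = y[val_idx] - model(x[val_idx])
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))

# Synthetic data (illustrative assumption): a noisy sine curve on [0, 1].
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, size=60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=60)

cv_errors = {d: kfold_cv_mse(x, y, d) for d in range(1, 10)}
for d, err in cv_errors.items():
    print(f"degree={d}  params={d + 1}  cv_mse={err:.4f}")

best_degree = min(cv_errors, key=cv_errors.get)
print("lowest CV error at degree", best_degree)
```

Typically the CV error drops sharply once the model is flexible enough to follow the curve, then flattens or creeps up as extra coefficients start fitting noise. The lowest-error model is not automatically the right pick: a slightly simpler model with nearly the same error may be preferable, which is the speed/accuracy/complexity judgment described above.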

Very good! I use Pareto to select an initial model with 20% of the complexity that could represent 80% of the behavior under study. Then iterate in a cycle of improvement of the model while the ROI: %improvement_of_model /%complexity_improvement keeps greater than a prescribed limit. I.e., there comes a point when continuing to try to improve the model becomes unprofitable.

Hi Pablo,

I’m sorry, I don’t understand your comment.

– What do you mean by “Pareto”?

– Can you clarify what you mean by

“Then iterate in a cycle of improvement of the model while the ROI: %improvement_of_model /%complexity_improvement keeps greater than a prescribed limit.”

Thanks,

Eric

Hi Eric,

By “Pareto” I mean the Pareto principle or 80-20 rule: http://en.wikipedia.org/wiki/Pareto_principle.

Following that principle, it is wise to begin with the simplest useful model: with 20% of the (total) complexity, this simple model explains 80% of the dynamics of the system that we are trying to model. If we want to improve it by, let’s say, 10% in accuracy, that will cost us not a 10% increase in complexity but much more, for example 50%. Of course, Pareto’s principle is itself only another simplified model, so it has limits. But the main idea is: “80% of the effects come from 20% of the causes”.

Now we can define the ROI (return on investment) of improving the model as %improvement_of_model / %complexity_increase, and keep improving the model in iterations while ROI > a prescribed value: if ROI is small, stop looking for a better model.

Thanks

Pablo
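Pablo’s stopping rule can be sketched in a few lines. This is an illustrative interpretation, not his code: `roi` and `select_by_roi` are hypothetical helpers, the candidate models and their scores are made-up numbers, and complexity is assumed to be measured in some positive unit (e.g. parameter count) that strictly increases from one candidate to the next.

```python
def roi(old_score, new_score, old_complexity, new_complexity):
    """ROI = % improvement of the model / % increase in complexity."""
    if new_complexity <= old_complexity:
        raise ValueError("complexity must strictly increase between candidates")
    pct_improvement = 100.0 * (new_score - old_score) / old_score
    pct_complexity = 100.0 * (new_complexity - old_complexity) / old_complexity
    return pct_improvement / pct_complexity

def select_by_roi(candidates, min_roi):
    """Walk candidates in order of increasing complexity; keep upgrading while ROI > min_roi.

    candidates: list of (name, score, complexity) tuples, where a higher score is better.
    """
    chosen = candidates[0]
    for nxt in candidates[1:]:
        if roi(chosen[1], nxt[1], chosen[2], nxt[2]) <= min_roi:
            break  # the next improvement is not worth its extra complexity
        chosen = nxt
    return chosen

# Hypothetical candidates: (name, accuracy, complexity units).
candidates = [("simple", 0.80, 20), ("medium", 0.88, 30), ("complex", 0.90, 60)]

# Pablo's example: 10% better accuracy for 50% more complexity gives ROI of about 0.2.
print(roi(0.80, 0.88, 20, 30))
print(select_by_roi(candidates, min_roi=0.1))  # -> ('medium', 0.88, 30)
```

The threshold `min_roi` plays the role of Pablo’s “prescribed limit”; raising it makes the search stop earlier at a simpler model.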
