Machine Learning Lesson of the Day – The “No Free Lunch” Theorem

A model is a simplified representation of reality; the simplifications discard unnecessary detail and let us focus on the aspects of reality that we want to understand.  These simplifications are grounded in assumptions, and assumptions that hold in some situations may not hold in others.  This implies that a model that explains one situation well may fail in another.  In both statistics and machine learning, we need to check our assumptions before relying on a model.
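As one concrete illustration of that last point, here is a minimal sketch (with invented data, assuming NumPy and scikit-learn) that fits a linear regression to data whose true relationship is quadratic and then checks the linearity assumption by looking for structure in the residuals:

```python
# Minimal sketch: checking a modelling assumption before relying on the model.
# The data and the diagnostic below are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)   # the true relationship is quadratic

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

# Under the linearity assumption the residuals should show no systematic pattern;
# a strong correlation with X^2 signals that the model's simplification fails here.
print("corr(residuals, X^2):", np.corrcoef(residuals, X[:, 0] ** 2)[0, 1])
```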

The “No Free Lunch” theorem states that no single model works best for every problem.  The assumptions of a great model for one problem may not hold for another, so it is common in machine learning to try multiple models and find the one that works best for the particular problem at hand.  This is especially true in supervised learning, where validation or cross-validation is commonly used to compare the predictive accuracy of multiple models of varying complexity and to select the best one.  A model that works well can also be trained by multiple algorithms: for example, linear regression can be fitted by the normal equations or by gradient descent.
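A minimal sketch of that model-comparison workflow, assuming scikit-learn (the data set and the candidate models are chosen purely for illustration):

```python
# Compare models of different complexity with cross-validation and keep the one
# that works best on this particular problem; no single model wins everywhere.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name:20s} mean accuracy = {scores.mean():.3f}")
```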

Depending on the problem, it is important to assess the trade-offs among the speed, accuracy, and complexity of different models and algorithms, and to find the model that works best for that particular problem.
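To make the algorithmic side of that trade-off concrete, here is a small sketch (synthetic data; the learning rate and iteration count are invented for the example) that fits the same linear-regression model with the normal equations and with batch gradient descent:

```python
# The same model, two training algorithms: an exact one-shot solve versus an
# iterative method that scales better when forming X^T X is too expensive.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=500)

# Normal equations: exact, but costs O(d^3) in the number of features d.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent: approximate and iterative, but cheap per step.
w_gd = np.zeros(3)
learning_rate = 0.01
for _ in range(2000):
    gradient = X.T @ (X @ w_gd - y) / len(y)
    w_gd -= learning_rate * gradient

print("normal equations:", np.round(w_normal, 3))
print("gradient descent:", np.round(w_gd, 3))
```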

16 Responses to Machine Learning Lesson of the Day – The “No Free Lunch” Theorem

  1. Very good! I use Pareto to select an initial model with 20% of the complexity that could represent 80% of the behavior under study. Then iterate in a cycle of improvement of the model while the ROI: %improvement_of_model /%complexity_improvement keeps greater than a prescribed limit. I.e., there comes a point at which continuing to improve the model becomes unprofitable.

    • Hi Pablo,

      I’m sorry, I don’t understand your comment.
      – What do you mean by “Pareto”?
      – Can you clarify what you mean by

      “Then iterate in a cycle of improvement of the model while the ROI: %improvement_of_model /%complexity_improvement keeps greater than a prescribed limit.”

      Thanks,

      Eric

      • Hi Eric,

        By “Pareto” I mean the Pareto principle or 80-20 rule: http://en.wikipedia.org/wiki/Pareto_principle.

        Following that principle, it is wise to begin with the simplest useful model: with 20% of the (total) complexity, this simple model explains 80% of the dynamics of the system that we intend to model. If we then want to improve its accuracy by, say, 10%, that will cost us not a 10% increase in complexity but much more, for example, 50%. Of course, the Pareto principle is only another simplified model in itself, so it has its limits. But the main idea is: “80% of the effects come from 20% of the causes”.

        Now we can define the ROI (return on investment) of improving the model as %improvement_of_model / %complexity_increase, and proceed to improve the model in iterations while the ROI stays above a prescribed value: if the ROI is small, stop looking for a better model. (A sketch of this rule appears below this thread.)

        Thanks
        Pablo
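As an aside, here is a hypothetical sketch of the stopping rule Pablo describes above; the candidate list, its accuracy and complexity numbers, and the roi_limit threshold are all invented for illustration:

```python
# Keep accepting more complex models only while the "ROI"
# (%improvement_of_model / %complexity_increase) stays above a prescribed limit.
def select_model(candidates, roi_limit=0.5):
    """candidates: (accuracy, complexity) pairs, ordered by increasing complexity."""
    best_acc, best_cx = candidates[0]          # start from the simplest useful model
    for acc, cx in candidates[1:]:
        pct_improvement = (acc - best_acc) / best_acc * 100
        pct_complexity = (cx - best_cx) / best_cx * 100
        if pct_improvement / pct_complexity < roi_limit:
            break                              # the gain no longer pays for its complexity
        best_acc, best_cx = acc, cx
    return best_acc, best_cx

# Hypothetical numbers: accuracy gains shrink while complexity keeps growing.
print(select_model([(0.60, 10), (0.80, 14), (0.85, 25), (0.86, 60)]))
```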

  2. Pingback: The No Free Lunch Theorem - Kriss Jessop

  3. Pingback: “No Free Lunch” Theorem – Matthew Clark

  4. Pingback: No Free Lunch Theorem – dmarcoweb

  5. Pingback: No Free Lunch Theorems | naturalcom

  6. Pingback: Natural computing week 1

  7. Pingback: Why I’m not sold on machine learning in autonomous security – TOP CYBER NEWS

  8. Pingback: Why I’m not sold on machine learning in autonomous security

  9. Pingback: Why Im not sold on machine learning in autonomous security – Breaching News – Better safe than sorry

  10. Pingback: IDG Contributor Network: Why I’m not sold on machine learning in autonomous security - f1tym1

  11. Pingback: Why I’m not sold on machine learning in autonomous security – Hacking & Cyber Security

  12. Pingback: Why I’m not Sold on Machine Learning in Autonomous Security: Some Hard Realities on the Limitations of Machine Learning in Autonomous netsec – The Duchstein Blog

  13. Pingback: The Gradient Boosters I: The Math Heavy Primer to Gradient Boosting Algorithm – Deep & Shallow

  14. Pingback: The Good Old Gradient Boosting – Data Science Austria

Your thoughtful comments are much appreciated!