Machine Learning Lesson of the Day – Overfitting and Underfitting

March 19, 2014

Overfitting occurs when a statistical model or machine learning algorithm captures the noise in the data rather than only the underlying trend. Intuitively, the model or the algorithm fits the training data too well. Specifically, an overfitted model or algorithm shows low bias but high variance. Overfitting is often the result of an excessively complicated model, and it can be guarded against by fitting multiple models and using validation or cross-validation to compare their predictive accuracies on data that were not used to fit them.
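To make this concrete, here is a minimal sketch in Python with NumPy. It is not taken from any particular analysis: the sine-shaped trend, the noise level, the sample sizes, and the polynomial degrees are all assumptions chosen only for illustration. It fits polynomials of increasing degree to a small noisy training set and compares the error on the training data with the error on held-out validation data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def true_trend(x):
    # Hypothetical underlying trend, assumed only for this illustration.
    return np.sin(2.0 * np.pi * x)


# A small noisy training set and a larger held-out validation set.
x_train = rng.uniform(0.0, 1.0, size=30)
y_train = true_trend(x_train) + rng.normal(0.0, 0.3, size=30)
x_valid = rng.uniform(0.0, 1.0, size=200)
y_valid = true_trend(x_valid) + rng.normal(0.0, 0.3, size=200)


def mse(model, x, y):
    # Mean squared prediction error of a fitted model on the points (x, y).
    return float(np.mean((model(x) - y) ** 2))


# Map each degree to (training MSE, validation MSE).
results = {}
for degree in (1, 3, 9):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(model, x_train, y_train),
                       mse(model, x_valid, y_valid))

for degree, (train_err, valid_err) in results.items():
    print(f"degree {degree}: train MSE = {train_err:.3f}, "
          f"validation MSE = {valid_err:.3f}")
```

Because the models are nested, the training error can only go down as the degree goes up, so a low training error on its own says nothing about overfitting. The validation error is the number to watch: a model that is overfitting will typically show a widening gap between the two.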

Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, the model or the algorithm does not fit the data well enough. Specifically, an underfitted model or algorithm shows low variance but high bias. Underfitting is often the result of an excessively simple model.

Both overfitting and underfitting lead to poor predictions on new data sets.
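One way to see why both failure modes hurt prediction is the standard bias-variance decomposition of expected squared error. The notation here is my own assumption for illustration: the response is y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², f̂ is the fitted model, and the expectation is taken over training sets and the noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2
  + \mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]
  + \sigma^2
```

The first term is the squared bias, which dominates when a model underfits. The second term is the variance, which dominates when a model overfits. The last term is noise that no model can remove.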

In my experience with statistics and machine learning, I don’t encounter underfitting very often. Data sets that are used for predictive modelling nowadays often come with too many predictors, not too few. Nonetheless, when building any predictive model, use validation or cross-validation to assess its predictive accuracy, whether you are trying to avoid overfitting or underfitting.
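For readers who have not used cross-validation before, the following is a minimal sketch of k-fold cross-validation written directly in NumPy. The function name `k_fold_mse`, the choice of polynomial regression as the model, and the simulated data are assumptions made for illustration; in practice you would more likely use an existing library routine.

```python
import numpy as np
from numpy.polynomial import Polynomial


def k_fold_mse(x, y, degree, k=5, seed=0):
    """Estimate the out-of-sample MSE of a polynomial fit by k-fold CV."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Shuffle the row indices once, then split them into k disjoint folds.
    indices = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(indices, k)
    fold_errors = []
    for i, held_out in enumerate(folds):
        # Fit on every fold except the i-th, then score on the i-th.
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = Polynomial.fit(x[train], y[train], degree)
        fold_errors.append(np.mean((model(x[held_out]) - y[held_out]) ** 2))
    return float(np.mean(fold_errors))


# Simulated data with a quadratic trend (assumed for illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.2, size=60)

for degree in (0, 1, 2, 5):
    print(f"degree {degree}: cross-validated MSE = {k_fold_mse(x, y, degree):.3f}")
```

Every observation is held out exactly once, so the averaged error estimates how the model predicts data it has not seen. That makes it useful for spotting both a model that is too simple and one that is too complicated.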

Filed under Machine Learning, Machine Learning Lesson of the Day, Predictive Modelling, Statistics Tagged with bias, cross-validation, machine learning, overfitting, predictive modelling, statistics, underfitting, validation, variance

5 Responses to Machine Learning Lesson of the Day – Overfitting and Underfitting

Anslem Manoka says:

July 18, 2020 at 2:03 am

How can we determine the degree of a polynomial regression?
How can I know that a certain degree is the best fit?

- Eric Cai - The Chemical Statistician says:
  
  July 20, 2020 at 3:54 pm
  
  Hi Anslem – You should use multiple models with different degrees, and use validation or cross-validation to compare the predictive accuracy. If 2 models have the same predictive accuracy, then choose the simpler model.
  
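The selection rule in the reply above (compare degrees by validated accuracy, and prefer the simpler model on a tie) can be sketched as follows. The function name `pick_simplest_degree`, the optional `tolerance` argument, and the error values are hypothetical and made up for illustration. The errors would normally come from validation or cross-validation.

```python
def pick_simplest_degree(cv_errors, tolerance=0.0):
    """Return the lowest degree whose error is within `tolerance` of the best.

    `cv_errors` maps polynomial degree -> validated prediction error.
    """
    if not cv_errors:
        raise ValueError("cv_errors must not be empty")
    best = min(cv_errors.values())
    # Keep every degree that predicts about as well as the best one.
    candidates = [d for d, err in cv_errors.items() if err <= best + tolerance]
    # Among those, prefer the simplest model.
    return min(candidates)


# Hypothetical cross-validated errors, invented for this example.
cv_errors = {1: 0.50, 2: 0.20, 3: 0.20, 4: 0.21}

print(pick_simplest_degree(cv_errors))                  # prints 2
print(pick_simplest_degree(cv_errors, tolerance=0.35))  # prints 1
```

With `tolerance=0.0` only exact ties are broken in favour of the simpler model. A small positive tolerance treats nearly equal errors as ties, which is a judgment call rather than part of the original advice.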
