Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

January 6, 2014

I struggle to categorize unsupervised learning. It is not an easily defined field, and it is also hard to find a categorization of its techniques that is exhaustive and mutually exclusive.

Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques. I learned this categorization from Mathematical Monk, who posted a great set of videos on machine learning on YouTube.

Clustering: Group the observations so that those within the same group maximize some similarity criterion, or, equivalently, minimize some dissimilarity criterion.
- Example: K-Means Clustering
Density Estimation: Use statistical models to find an underlying probability distribution that gives rise to the observed variables.
- Example: Kernel Density Estimation (Theory and Application)
- Example: Mixture models. Normal (or Gaussian) mixture models are especially popular.
Dimensionality Reduction: Find a smaller set of variables that captures the essential variations or patterns of the observed variables. This smaller set of variables may be just a subset of the observed variables, or it may be a set of new variables that better capture the underlying variation of the observed variables.
- Example: Principal component analysis
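To make the three categories concrete, here is a minimal sketch of one example technique from each, written in plain Python with only the standard library. The function names (`kmeans_1d`, `gaussian_kde`, `first_principal_component`), the toy data, the starting centers, and the bandwidth are all assumptions made for illustration; they do not come from any particular library, and in practice a tested package would be the better choice.

```python
import math

def kmeans_1d(xs, centers, iters=20):
    """K-means (Lloyd's algorithm) for 1-D data, starting from given centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            # Assign each observation to its nearest center.
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def gaussian_kde(xs, h):
    """Return a density estimate built from Gaussian kernels with bandwidth h."""
    norm = len(xs) * h * math.sqrt(2 * math.pi)
    def density(t):
        return sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in xs) / norm
    return density

def first_principal_component(points):
    """Unit vector along the direction of greatest variance for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    if sxy == 0:
        # Covariance matrix is already diagonal: pick the axis with more variance.
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    # Largest eigenvalue of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    lam = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    # A matching eigenvector is (sxy, lam - sxx); normalize it to unit length.
    vx, vy = sxy, lam - sxx
    length = math.hypot(vx, vy)
    return (vx / length, vy / length)

# Toy data (made up): two well-separated groups on the real line.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(kmeans_1d(data, centers=[0.0, 6.0]))   # one center per group

density = gaussian_kde(data, h=0.5)
print(density(1.0), density(3.0))            # high near a group, low in the gap

# Toy 2-D data lying exactly on the line y = 2x.
print(first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)]))
```

Each function takes only the observations themselves, with no target variable, which is what makes all three techniques unsupervised.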

Are there any other categories that you can think of? How would you categorize hidden Markov models? Your input is welcomed and appreciated in the comments!


6 Responses to Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

armaneshaghi says:

January 8, 2014 at 12:38 am

Thanks for the great blog. You might want to add independent components analysis as well in “dimensionality reduction”.

- Eric Cai - The Chemical Statistician says:
  
  January 8, 2014 at 11:24 pm
  
  Thanks for the positive feedback and suggestion, Arman! I don’t know anything about independent component analysis, so I’ll research that and write a post about it.
  
armaneshaghi says:

January 8, 2014 at 12:40 am

Reblogged this on MS-neuroimager and commented:
This blog touches on machine learning and data mining methods. I have been thinking a lot about writing my PhD thesis with my supervisor on redefining MS subtypes, and machine learning is a field that seems really promising. More posts on this soon.

Srinath Perera says:

September 21, 2015 at 3:34 am

Two other groups of algorithms that we consider unsupervised learning are:
1) Sequence mining, association rules, and finding frequent item sets
2) Finding similar items (e.g., recommendations)

- Eric Cai - The Chemical Statistician says:
  
  September 21, 2015 at 11:23 am
  
  Hi Srinath,
  
  I don’t want to become too rigid about definitions, but, just to provide some food for thought, I think that recommender systems are supervised learning. They use ratings from past users to recommend items for a new user with similar preferences. The recommended items for the new user are the targets. That sounds like supervised learning.
  
  Anyway, there is no need to debate this too deeply; it’s way more fun to explore what recommender systems are and how they function.
  
  Thanks for commenting!
  
  - Srinath Perera says:
    
    September 21, 2015 at 7:09 pm
    
    Agreed, from the use-case point of view (I guess I picked the wrong use case). However, the techniques used, such as LSH and SVD, are more unsupervised-style methods.
