# Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

January 6, 2014

I struggle to categorize **unsupervised learning**. It is not an easily defined field, and it is hard to find a categorization of its techniques that is exhaustive and mutually exclusive.

Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques. I learned this categorization from Mathematical Monk, who posted a great set of videos on machine learning on YouTube.

**Clustering**: Categorize the observed variables into groups that maximize some similarity criterion or, equivalently, minimize some dissimilarity criterion.

- Example: K-Means Clustering
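To make the clustering idea concrete, here is a minimal sketch of k-means on one-dimensional data in plain Python. The function name and toy data are my own for illustration; a real analysis would use a library implementation and handle multi-dimensional data.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal 1-D k-means: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # keep the old centroid if a cluster ends up empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

points = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]
print(kmeans(points, 2))  # two centroids, one near 1 and one near 10
```

The "similarity criterion" here is simply distance to the nearest centroid; minimizing total within-cluster distance is what the assign-then-update loop does.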

**Density Estimation**: Use statistical models to find an underlying probability distribution that gives rise to the observed variables.

- Example: Kernel Density Estimation (Theory and Application)
- Example: Mixture models. Normal (or Gaussian) mixture models are especially popular.
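As a sketch of the density-estimation idea, here is a bare-bones Gaussian kernel density estimator in plain Python (function name and sample data are my own): the estimated density at a point is the average of Gaussian "bumps" centred on each observation.

```python
import math

def gaussian_kde(data, x, bandwidth):
    """Kernel density estimate at x: the average of Gaussian
    kernels centred on each data point, scaled by the bandwidth."""
    n = len(data)
    return sum(
        math.exp(-0.5 * ((x - xi) / bandwidth) ** 2)
        / (bandwidth * math.sqrt(2 * math.pi))
        for xi in data
    ) / n

sample = [-1.1, -0.9, 0.0, 0.9, 1.1]
density_near_mode = gaussian_kde(sample, 1.0, bandwidth=0.5)
density_far_away = gaussian_kde(sample, 5.0, bandwidth=0.5)
```

The estimate is high near where the data cluster and near zero far from the data, which is exactly the "underlying distribution that gives rise to the observed variables" being recovered. The bandwidth controls how smooth the estimate is; choosing it well is the hard part in practice.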

**Dimensionality Reduction**: Find a smaller set of variables that captures the essential variation or patterns in the observed variables. This smaller set may be a subset of the observed variables, or it may be a set of new variables that better capture their underlying variation.

- Example: Principal component analysis
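To illustrate, here is a sketch of finding the first principal component of 2-D data in plain Python, using power iteration on the covariance matrix (the function name and toy data are my own; real PCA would use a linear-algebra library):

```python
def first_principal_component(data, iters=100):
    """Power iteration on the 2x2 covariance matrix to find the
    direction of greatest variance: the first principal component."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centred = [(x - mx, y - my) for x, y in data]
    # entries of the 2x2 covariance matrix
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    # repeatedly multiply a vector by the covariance matrix and
    # renormalize; it converges to the dominant eigenvector
    vx, vy = 1.0, 0.0
    for _ in range(iters):
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm
    return vx, vy

# toy points scattered along the line y = x
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9)]
pc = first_principal_component(data)
```

Because the points lie almost exactly on the line y = x, the recovered direction is close to (1, 1) normalized: projecting onto this single new variable keeps nearly all the variation in the original two.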

Are there any other categories that you can think of? How would you categorize hidden Markov models? Your input is welcomed and appreciated in the comments!

Thanks for the great blog. You might want to add independent components analysis as well in “dimensionality reduction”.

Thanks for the positive feedback and suggestion, Arman! I don’t know anything about independent component analysis, so I’ll research that and write a post about it.

Reblogged this on MS-neuroimager and commented:

This blog touches on machine learning and data mining methods. I have been thinking seriously about writing my PhD thesis with my supervisor on redefining MS subtypes, and machine learning is a field that seems really promising. More posts on this soon.

Two other families of techniques that we consider unsupervised learning are:

1) Sequence mining, association rules, and finding frequent item sets

2) Finding similar items (e.g. recommendations)

Hi Srinath,

I don’t want to become too rigid about definitions, but, just to provide some food for thought, I think that recommender systems are supervised learning. They use ratings from past users to recommend items to a new user with similar preferences; the recommended items for the new user are the targets. That sounds like supervised learning.

Anyway, there is no need to debate this too deeply; it’s way more fun to explore what recommender systems are and how they function.

Thanks for commenting!

Agree, from the use-case point of view (I guess I picked a wrong use case). However, the underlying techniques, such as LSH and SVD, are more unsupervised-style methods.