Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

I struggle to categorize unsupervised learning.  It is not an easily defined field, and it is also hard to find generalizations of techniques that are exhaustive and mutually exclusive.

Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques.  I learned this categorization from Mathematical Monk, who posted a great set of videos on machine learning on YouTube.

  • Clustering: Group the observations of the variables X_1, X_2, ..., X_p so that observations in the same group are as similar as possible under some similarity criterion, or, equivalently, as little dissimilar as possible under some dissimilarity criterion.
  • Density Estimation: Use statistical models to find an underlying probability distribution that gives rise to the observed variables.
  • Dimensionality Reduction: Find a smaller set of variables that captures the essential variations or patterns of the observed variables.  This smaller set of variables may be just a subset of the observed variables, or it may be a set of new variables that better capture the underlying variation of the observed variables.
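
As a rough illustration of the three categories, the sketch below applies one simple representative of each to the same toy data set, using only the Python standard library. Everything in it is an assumption made for illustration rather than something taken from the videos: the two-blob data, the function names, the deterministic k-means initialization, and the kernel bandwidth h. K-means, Gaussian kernel density estimation, and the first principal component (found here by power iteration) are just one common choice per category.

```python
import math
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(ps):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(ps) for c in zip(*ps))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, iters=10):
    """Clustering: Lloyd's k-means algorithm."""
    # Assumption: evenly spaced data points serve as initial centers,
    # which keeps the example deterministic (not a recommended default).
    centers = [points[j * len(points) // k] for j in range(k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean_point(members)
    return centers, [nearest(p, centers) for p in points]

def gaussian_kde(points, query, h=0.5):
    """Density estimation: Gaussian kernel density estimate at `query`."""
    d = len(query)
    norm = len(points) * (2 * math.pi * h * h) ** (d / 2)
    return sum(math.exp(-sq_dist(p, query) / (2 * h * h)) for p in points) / norm

def first_principal_component(points, iters=100):
    """Dimensionality reduction: project onto the leading principal axis."""
    n, d = len(points), len(points[0])
    mu = mean_point(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalize.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(x * y for x, y in zip(r, v)) for r in centered]
    return v, scores

# Toy data (an assumption): two Gaussian blobs in the plane.
rng = random.Random(0)
blob_a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
blob_b = [(rng.gauss(6, 0.5), rng.gauss(6, 0.5)) for _ in range(100)]
points = blob_a + blob_b

centers, labels = kmeans(points, k=2)
direction, scores = first_principal_component(points)

print("cluster centers:", centers)
print("density near a blob:", gaussian_kde(points, (0, 0)))
print("density between blobs:", gaussian_kde(points, (3, 3)))
print("leading principal direction:", direction)
```

Here the clustering step returns a group label for each observation, the density estimate returns a number that is high near the blobs and low between them, and the dimensionality reduction step replaces each two-dimensional point with a single score along a new variable. Note that this illustrates the "new variables" form of dimensionality reduction, not the selection of a subset of the original variables.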

Are there any other categories that you can think of?  How would you categorize hidden Markov models?  Your input is welcome and appreciated in the comments!

6 Responses to Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

  1. armaneshaghi says:

    Thanks for the great blog. You might want to add independent components analysis as well in “dimensionality reduction”.

  2. armaneshaghi says:

    Reblogged this on MS-neuroimager and commented:
    This blog touches on machine learning and data mining methods. I have been thinking a lot about writing my PhD thesis with my supervisor on redefining MS subtypes, and machine learning is a field that seems really promising. More posts on this soon.

  3. Two other families of algorithms that we consider unsupervised learning are:
    1) Sequence mining, association rules, and finding frequent item sets
    2) Finding similar items (e.g. recommendations)

    • Hi Srinath,

      I don’t want to become too rigid about definitions, but, just to provide some food for thought, I think that recommender systems are supervised learning. They use ratings from past users to recommend items for a new user with similar preferences. The items recommended to the new user are the targets. That sounds like supervised learning.

      Anyway, there is no need to debate this too deeply; it’s way more fun to explore what recommender systems are and how they function.

      Thanks for commenting!
