Machine Learning Lesson of the Day: Clustering, Density Estimation and Dimensionality Reduction

I struggle to categorize unsupervised learning.  It is not an easily defined field, and it is also hard to find generalizations of techniques that are exhaustive and mutually exclusive.

Nonetheless, here are some categories of unsupervised learning that cover many of its commonly used techniques.  I learned this categorization from Mathematical Monk, who posted a great set of videos on machine learning on Youtube.

  • Clustering: Categorize the observed variables X_1, X_2, ..., X_p into groups that maximize some similarity criterion, or, equivalently, minimize some dissimilarity criterion.
  • Density Estimation: Use statistical models to find an underlying probability distribution that gives rise to the observed variables.
  • Dimensionality Reduction: Find a smaller set of variables that captures the essential variations or patterns of the observed variables.  This smaller set of variables may be just a subset of the observed variables, or it may be a set of new variables that better capture the underlying variation of the observed variables.

Are there any other categories that you can think of?  How would you categorize hidden Markov models?  Your input is welcomed and appreciated in the comments!

Advertisements