Machine Learning Lesson of the Day – Babies and Non-Statisticians Practice Unsupervised Learning All the Time!

My recent lesson on unsupervised learning may make it seem like a rather esoteric field, with attempts to categorize it using terms like “clustering”, “density estimation”, and “dimensionality reduction”.  However, unsupervised learning is actually how we as human beings often learn about the world that we live in – whether you are a baby learning what to eat or someone reading this blog.

  • Babies use their mouths and their sense of taste to explore the world, and they probably determine quite quickly what satisfies their hunger and what doesn’t.  As they expose themselves to different objects – a formula bottle, a pacifier, a mother’s breast, their own fingers – their taste and digestive systems recognize these inputs and detect patterns of what satisfies their hunger and what doesn’t.  All of this happens before they fully understand what “food” or “hunger” means, and probably before someone says “This is food” to them and they have the language capacity to know what those 3 words mean.
    • When a baby finally realizes what hunger feels like and develops the initiative to find something to eat, that becomes a supervised learning problem: What attributes of an object will help me to determine whether it is food or not?
  • I recently wrote a page called “About this Blog” to categorize the different types of posts that I have written on this blog so far.  I did not aim to predict anything about any blog post; I simply wanted to organize the 50-plus blog posts into a few categories and make it easier for you to find them.  I ultimately clustered my blog posts into 4 mutually exclusive categories (now with some overlaps).  You can think of each blog post as a vector-valued input, and I chose 2 elements of each vector – the length and the topic – to group the posts into classes that are very similar in length and topic within each class and very different in length and topic between the classes.  (In other words, I used those 2 elements – or features – to maximize the similarities within each category and to minimize the dissimilarities between the 4 categories; a small code sketch of this idea follows below.)  There were other features that I could have used – whether a post had an image (binary feature), the number of colours of the fonts (integer-valued feature), the time of publication of the post (continuous feature) – but length and topic were sufficient for me to arrive at the 4 categories of “Tutorials”, “Lessons”, “Advice”, and “Notifications about Presentations and Appearances at Upcoming Events”.
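To make the idea concrete, here is a minimal sketch of this kind of clustering in Python.  It is not the procedure that I actually used for my blog posts: the feature values, the numeric encoding of “topic”, and the choice of k-means from scikit-learn are all hypothetical illustrations of grouping posts by length and topic into 4 clusters.

# A toy sketch of the kind of clustering described above, not the author's
# actual method: hypothetical feature values, k-means with k = 4 clusters.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical feature matrix: each row is a blog post described by
# (word count, a numerically encoded topic score).
posts = np.array([
    [1800, 0.90],   # long, tutorial-like
    [1650, 0.80],
    [400,  0.30],   # short lesson
    [450,  0.35],
    [700,  0.60],   # advice
    [650,  0.55],
    [150,  0.10],   # event notification
    [120,  0.05],
])

# Standardize the features so that word count does not dominate the distances.
X = StandardScaler().fit_transform(posts)

# k-means minimizes within-cluster variance, which loosely mirrors
# "maximize the similarities within each category" described above.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
print(labels)  # each post is assigned to one of the 4 clusters

The key point is that no category labels are supplied to the algorithm; it discovers the groups from the features alone, which is what makes this unsupervised learning.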

Presentation Slides: Machine Learning, Predictive Modelling, and Pattern Recognition in Business Analytics

I recently delivered a presentation entitled “Using Advanced Predictive Modelling and Pattern Recognition in Business Analytics” at the Statistical Society of Canada’s (SSC’s) Southern Ontario Regional Association (SORA) Business Analytics Seminar Series.  In this presentation, I

– discussed how traditional statistical techniques often fail in analyzing large data sets

– defined and described machine learning, supervised learning, unsupervised learning, and the many classes of techniques within these fields, as well as common examples in business analytics to illustrate these concepts

– introduced partial least squares regression and bootstrap forest (or random forest) as two examples of supervised learning (or predictive modelling) techniques that can effectively overcome the common failures of traditional statistical techniques and can be easily implemented in JMP (a brief code sketch of both techniques follows this list)

– illustrated how partial least squares regression and bootstrap forest were successfully used to solve some major problems for 2 different clients at Predictum, where I currently work as a statistician
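
For readers who prefer code to slides, here is a minimal sketch of the two technique families named above.  It uses scikit-learn on synthetic data rather than JMP and the clients’ data, and the hyperparameters are illustrative assumptions, not values from the presentation.

# A minimal sketch of partial least squares regression and random forest
# ("bootstrap forest" in JMP terminology) on synthetic data with
# near-collinear predictors, a setting where ordinary least squares struggles.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 200 observations, 50 predictors, two of them nearly collinear.
n, p = 200, 50
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)   # near-collinear columns
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Partial least squares regression: projects the predictors onto a few
# latent components chosen to covary with the response.
pls = PLSRegression(n_components=3).fit(X_train, y_train)
print("PLS R^2:", pls.score(X_test, y_test))

# Random forest: an ensemble of decision trees, each fit to a bootstrap
# sample of the training data, with predictions averaged across trees.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Random forest R^2:", rf.score(X_test, y_test))

Both models cope with the correlated predictors in their own way: partial least squares by compressing them into latent components, and the random forest by averaging many trees fit to bootstrap samples.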
