Presentation Slides – Finding Patterns in Data with K-Means Clustering in JMP and SAS

February 19, 2013

My slides on K-means clustering at the Toronto Area SAS Society (TASS) meeting on December 14, 2012, can be found here.

This image is slightly enhanced from an image created by Weston.pace from Wikimedia Commons.

My Presentation on K-Means Clustering

I was very pleased to be invited for the second time by the Toronto Area SAS Society (TASS) to deliver a presentation on machine learning. (I previously presented on partial least squares regression.) At its meeting on December 14, 2012, I introduced an unsupervised learning technique called K-means clustering.

I first defined clustering as a set of techniques for identifying groups of objects by maximizing a similarity criterion or, equivalently, minimizing a dissimilarity criterion. I then defined K-means clustering specifically as a clustering technique that uses Euclidean proximity to a group mean as its similarity criterion. I illustrated how this technique works with a simple 2-dimensional example; you can follow along with this example in the slides by watching the sequence of images as the clusters move toward convergence. As with many other machine learning techniques, some arbitrary decisions need to be made to initiate the algorithm for K-means clustering:

How many clusters should there be?
What is the initial mean of each cluster?

I provided some guidelines on how to make these decisions in these slides.
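For readers who want to see the iteration outside of the slides, here is a minimal sketch in Python (not the SAS or JMP code from my presentation). It assumes the standard two-step procedure: assign each point to the nearest mean by Euclidean distance, then recompute each mean from its members, and repeat until the assignments stop changing. The function name `kmeans`, the toy data, and the choice of randomly sampled observations as initial means are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    # Arbitrary starting decision: use k randomly chosen observations
    # as the initial cluster means (one common choice among several).
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose mean is
        # nearest in Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each mean to the centroid of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


# Hypothetical 2-dimensional data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
means, labels = kmeans(points, k=2)
print(means)
print(labels)
```

Both arbitrary decisions appear as explicit inputs here: `k` is the number of clusters, and `seed` controls which observations serve as the initial means. With less cleanly separated data, different seeds can lead to different final clusters.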

K-means clustering has its limitations, and I raised cautions about when this technique is most appropriate. Finally, I illustrated how this technique can be implemented in SAS and JMP. JMP has 2 particularly good features for K-means clustering:

it uses a quantitative measure called the cubic clustering criterion (CCC) to compare different numbers of clusters (overcoming one of the two long-standing questions about K-means clustering with an objective, albeit imperfect, criterion)
users can compare the performances of multiple numbers of clusters at once using the CCC
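To give a feel for what comparing several numbers of clusters at once looks like, here is a rough Python sketch. It does not compute the CCC; as a simpler stand-in, it uses the within-cluster sum of squares (WCSS), which is a different criterion. The 1-dimensional toy data, the function names `kmeans_1d` and `wcss`, and the evenly spaced initial means are all assumptions made for illustration.

```python
def kmeans_1d(values, k, max_iter=100):
    """K-means on a list of numbers; returns (means, clusters)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed initialization: evenly spaced order statistics of the data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each value in the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - means[j]))
            clusters[nearest].append(v)
        # Update step: recompute each mean; an empty cluster keeps its old mean.
        new_means = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:
            break  # means unchanged, so the algorithm has converged
        means = new_means
    return means, clusters


def wcss(means, clusters):
    """Within-cluster sum of squared deviations from each cluster mean."""
    return sum((v - m) ** 2 for m, c in zip(means, clusters) for v in c)


# Hypothetical data with three visible groups.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.0, 9.1, 8.8]

# Fit several numbers of clusters and tabulate the criterion for each.
scores = {k: wcss(*kmeans_1d(values, k)) for k in range(1, 5)}
for k, score in scores.items():
    print(k, round(score, 3))
```

WCSS generally keeps shrinking as clusters are added, so one looks for the number of clusters after which the improvement levels off rather than for the smallest value. On this toy data the drop is steep up to 3 clusters and slight afterwards.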

As always, I encourage everybody to attend their local SAS Users Group meetings to learn from and network with other analytics professionals! To everybody in Toronto: See you at the next TASS meeting!

