What Cluster Are You From?

This week I had the pleasure of presenting to the Division of Rheumatology Research Rounds – University of Toronto. They were a fantastic audience who asked questions and appeared to be very engaged. Shout out to the Rheumatology gang!


So, I was asked to talk about a statistical methodology called Cluster Analysis. I thought I would start a short series on the topic for you guys. Don’t worry, I will keep the stats to a minimum as I always do!


Complex information is often best recognized as patterns. The first picture below on the left certainly helps you realize that it is not a simple task to recognize someone at a glance.





Now, I guess it doesn’t help that many of you have never met me either! However, you can appreciate that things get a little easier when the same portrait is presented in the usual manner – upright! 












This is an interesting example where the information is identical; however, our ability to intuitively recognize a pattern (me!) appears to be restricted to situations that we are familiar with.








This intuition often fails miserably when abstract magnitudes (numbers!) are involved. I am certain most of us can relate to that. 


The good news is that with the advent of crazy powerful personal computers, we can benefit from complex and resource-intensive mathematical procedures to help us make sense of large, scary-looking data sets.




So, when would you use this kind of methodology, you ask? I’ll tell you…


1 – Detection of subgroups/clusters of entities (e.g., items, subjects, users…) within your data set.


2 – Discovery of useful, possibly unexpected, patterns in data.
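
For the curious, here is a minimal sketch of what "detecting subgroups" can look like in practice. It uses k-means, which is just one of many clustering methods and is my choice for illustration only. The data are made up (six imaginary subjects measured on two variables), and the function name `kmeans` and its simple starting rule are assumptions for this example rather than any library's API.

```python
import math

def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with a bare-bones k-means."""
    # Deterministic start (an illustrative choice): take the first point,
    # then keep adding the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Step 1: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Made-up data: six hypothetical subjects, two measurements each.
data = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.0), (8.0, 8.2), (7.7, 8.1), (8.3, 7.9)]
labels, centroids = kmeans(data, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The first three subjects land in one cluster and the last three in another, without us ever telling the computer which subject belongs where. Real data are rarely this tidy, and choosing the number of clusters is a question of its own.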




OK, time for some homework. Try to think of times when you could apply this kind of analysis. 


I’ll start you off with an example that you can relate to. Every time you go to YouTube and search for your favorite movie trailer, you get a long list of other items on the right that YouTube thinks may be of interest to you. How do you think they do that? By taking into account things like keywords, popularity, and user browsing history (and many, many more variables) and using cluster analysis, of course! You and your interests belong to a cluster. Cool!


In this series, we will delve into this fun world of working with patterns in data. 




Now that you have peace of mind, listen to The Grapes of Wrath






See you in the blogosphere,


Pascal Tyrrell