MiWORD of the Day is… Silhouette Score!

Silhouette score… is that some sort of way to measure whose silhouette looks better? Or how identifiable the silhouettes are? Well… kind of! It turns out that in statistics, silhouette score is a measure of how “good” a clustering is. It considers two factors: cohesion and separation. Specifically, how compact is each cluster? And how well separated is it from the other clusters?

Let’s say you asked your friend to group a bunch of cats into 3 clusters based on where they were sitting on the floor, because you wanted to know whether the cats sit in groups or if they just sit randomly. How can we determine how well your friend clustered them? Let’s zoom in on one specific cat who happens to be placed in Cluster 1. We first look at the mean intra-cluster distance, which is the mean distance from this cat to all the other cats in Cluster 1. We then take the mean nearest-cluster distance: for each cluster the cat is not a part of (Cluster 2 and Cluster 3, in this case), we compute the mean distance from the cat to all the cats in that cluster, and we keep the smaller of the two values.

To have a “good” clustering, we want the intra-cluster distance to be small and the mean nearest-cluster distance to be large. The silhouette score for one cat combines the two: take the mean nearest-cluster distance, subtract the intra-cluster distance, and divide by the larger of the two. Then, we can repeat this for each cat and average the scores of all cats to get the overall silhouette score. Silhouette score ranges from -1 to +1, and the higher the score, the better! A high score indicates that the cats are generally close to the other cats in their clusters and far from the cats in other clusters. A score near 0 means that clusters are overlapping, and a negative score suggests that a cat sits closer to a neighbouring cluster than to its own. So, if it turns out that the cats were sitting in distinct groups and your friend is good at clustering, we’d expect a high silhouette score.
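
If you'd like to see the arithmetic, here is a minimal sketch in plain Python. It assumes the usual per-point definition s = (b - a) / max(a, b), where a is the mean intra-cluster distance and b is the mean nearest-cluster distance, and it uses straight-line (Euclidean) distance on the floor. The cat positions, the labels, and the function name `silhouette_scores` are all made up for illustration.

```python
import math

def silhouette_scores(points, labels):
    """Return one silhouette value per point (hypothetical helper)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # Common convention: a cat alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the OTHER cats in the same cluster.
        # The sum includes the cat itself (distance 0), so divide by n - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the cats of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Made-up floor positions: three tight groups of three cats each.
cats = [(0, 0), (0, 1), (1, 0),
        (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]

# Your friend's clustering, which matches the groups.
good_labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
# A careless clustering that mixes cats from every group.
bad_labels = [1, 2, 3, 1, 2, 3, 1, 2, 3]

good = sum(silhouette_scores(cats, good_labels)) / len(cats)
bad = sum(silhouette_scores(cats, bad_labels)) / len(cats)
print(f"good clustering: {good:.2f}, careless clustering: {bad:.2f}")
```

The clustering that matches where the cats actually sit should land close to +1, while the careless one should score far lower.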

Now, to use it in a sentence!

Serious: I am unsure of how many clusters I should group my data into for k-means clustering… it seems like choosing 3 or 4 will give me the same silhouette score of 0.38!

Less serious (suggested to me by ChatGPT): I tried sorting my sock drawer by color, but it’s a bit tricky with all those shades of grey. I mean, I can’t even tell the difference between dark grey and mid grey. My sock drawer’s silhouette score is so low!

See you in the blogosphere!
Lucie Yang