K-Means Parameter Selection
In my last post, I provided a 101 on the K-means clustering algorithm. In case you haven't had a chance to read it, this link will take you to it. One question you might be wondering about after reading that post is how to determine the optimal number of clusters for K-means. That's exactly what this post is all about. By the end of it, you will have two techniques in your back pocket that you can use to determine the optimal value of k in K-means.
Elbow Method
The first technique is the simplest of the two and is called the Elbow method. The Elbow method involves the following steps:
- Perform K-means clustering for a range of values of k.
- After each clustering, compute the sum of squared error (or SSE for short) of the result -- more on that in a bit.
- Create a line plot of the SSE for each value of k. This plot is called the elbow plot.
- Look at the elbow plot and find the value of k beyond which the SSE only decreases slowly and roughly linearly. That value is the optimal number of clusters for your dataset.
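If you are working in Python, the steps above can be sketched with scikit-learn, whose KMeans estimator exposes the SSE of a fitted model as the `inertia_` attribute (SSE itself is explained in the next section). The synthetic blob data and parameter values here are illustrative assumptions, not something from a real dataset:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative synthetic data: three well-separated clusters
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 8], [0, 8]],
                  cluster_std=1.0, random_state=42)

# Steps 1 and 2: fit K-means for a range of k and record the SSE of each fit
sse = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sse[k] = km.inertia_  # inertia_ is scikit-learn's name for the SSE

# Step 3: these (k, SSE) pairs are what you plot to get the elbow plot
for k in sse:
    print(f"k={k}: SSE={sse[k]:.1f}")
```

Plotting `list(sse.keys())` against `list(sse.values())` as a line chart gives the elbow plot; with well-separated data like this, the SSE should drop sharply up to k = 3 and flatten afterwards.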
So, what is the Sum of Squared Error?
The sum of squared error is the total squared distance between each data point and the centroid of its assigned cluster. Calculating it involves the following steps:
- For each cluster and centroid produced by K-means:
- Calculate the difference between each data point and its cluster centroid, then square it. We'll call each result e.
- Compute the sum of these squared differences for the cluster. Let's call the result t.
- Finally, add up the totals t computed for each cluster. The result of this calculation is the sum of squared error.
It is best summarized with this formula:

$$SSE = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

where $C_i$ is the set of points in cluster $i$ and $\mu_i$ is its centroid.
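As a quick check of the steps above, here is a small NumPy sketch that computes the SSE by hand for a toy clustering (the points and cluster labels are made-up illustrative values):

```python
import numpy as np

# Toy data: four points already assigned to two clusters (illustrative values)
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
labels = np.array([0, 0, 1, 1])

sse = 0.0
for c in (0, 1):
    cluster = points[labels == c]
    centroid = cluster.mean(axis=0)
    # Squared differences between each point and its centroid, summed for
    # the cluster (t), then accumulated across clusters to give the SSE
    sse += np.sum((cluster - centroid) ** 2)

print(round(float(sse), 4))  # prints 2.25
```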
The Intuition of the Elbow Plot
When you plot the SSE for each value of k, you will immediately see why this plot is called the elbow plot. It's because the plot looks like an elbow!

As the number of clusters increases, the sum of squared error decreases. Our optimal number of clusters for K-means will be one where the SSE is small. If we push the SSE too low, though, we will end up with too many clusters; if we leave it too high, we will end up with too few clusters to make sense of the data we have. The purpose of the elbow plot is to help us find the value of k where we start to see diminishing returns. This point is our optimal value for k, and it just so happens to sit at the elbow part of the arm.
As you can see from the plot above, this elbow appears to occur at k = 3, making this the optimal number of clusters.
But what happens when you have a plot with a fairly smooth or linear curve and no obvious elbow point, such as the one below?

This is where the next method I will show you comes in handy.
Introducing the Silhouette Method
The Silhouette method involves computing a coefficient for each point in the dataset. These coefficients measure how similar a point is to its assigned cluster compared to other clusters. Each coefficient ranges between -1 and 1. Coefficients at the high end of the range indicate that the data point is well matched to its assigned cluster and poorly matched to other clusters. Coefficients at the low end of the range indicate that the data point is poorly matched to its cluster compared to other clusters.
Let's go through how to compute a silhouette coefficient for a single data point:
- First, we calculate the average distance between the data point and all the other points in its cluster. We'll call this number a.
- Next, we calculate the average distance between that data point and the points in the cluster closest to its assigned cluster. Let's call this number b.
- Finally, we take the two measurements we computed and calculate the coefficient by subtracting a from b and dividing by the larger of the two numbers.
This process is best summarized with this formula:

$$s = \frac{b - a}{\max(a, b)}$$
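Here is a small NumPy sketch of those three steps for a single point; the two clusters are made-up illustrative values:

```python
import numpy as np

# The point's own cluster and the nearest neighboring cluster (illustrative)
own_cluster = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
nearest_cluster = np.array([[6.0, 6.0], [7.0, 6.0]])
point = own_cluster[0]

# a: average distance to the other points in the point's own cluster
a = np.mean([np.linalg.norm(point - p) for p in own_cluster[1:]])
# b: average distance to the points in the nearest neighboring cluster
b = np.mean([np.linalg.norm(point - p) for p in nearest_cluster])

# s = (b - a) / max(a, b)
s = (b - a) / max(a, b)
print(round(float(s), 3))  # prints 0.866
```

In practice you would not do this by hand: scikit-learn's `sklearn.metrics.silhouette_samples` computes the coefficient for every point in a dataset.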
If most of the data points have high coefficient values, then the number of clusters used is optimal. If, on the other hand, many data points have low coefficient values, this indicates that there are either too many or too few clusters.
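One way to turn this into a selection rule is to compute the average coefficient over the whole dataset for each candidate k and pick the k with the highest average. Here is a sketch using scikit-learn's `silhouette_score` (the synthetic blob data is an illustrative assumption):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Illustrative synthetic data: three well-separated clusters
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 8], [0, 8]],
                  cluster_std=1.0, random_state=42)

scores = {}
for k in range(2, 7):  # the silhouette is only defined for k >= 2
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # average coefficient over all points

best_k = max(scores, key=scores.get)
print(best_k)
```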
A silhouette plot can be used to visually inspect how close each point in one cluster is to points in neighboring clusters.

The x-axis in the plot displays the silhouette coefficient values. The y-axis displays the labels for each cluster. The red dashed line is the average silhouette coefficient, computed by averaging all the coefficients in the dataset.
Here are some rules of thumb to follow when analyzing a silhouette plot:
- If all the cluster plots extend beyond the average silhouette score, have mostly uniform thickness, and do not fluctuate widely in size, that's a good indication that the number of clusters used is optimal.

- The number of clusters is suboptimal if you observe either of the following:
- The plot for a cluster falls below the average coefficient, such as the plot at this link: https://vitalflux.com/wp-content/uploads/2020/09/Screenshot-2020-09-16-at-11.59.52-AM.png
- There are wide fluctuations in the size and thickness of the cluster plots.

You have just learned two methods for choosing the optimal number of clusters for K-means. If you enjoyed this post, subscribe to my blog to keep abreast of more posts like this.