Step-by-step visualization of K-Means

K-means clustering is a popular unsupervised learning algorithm that partitions a dataset into K distinct clusters. Visualizing the K-means process can help us understand how the algorithm works and how it converges to a solution.

Understanding the K-Means Algorithm

Before we delve into the visualization, let’s recap the key steps of the K-means algorithm:

  1. Initialization: Randomly select K data points as initial centroids.
  2. Assignment: Assign each data point to the nearest centroid based on Euclidean distance.  
  3. Update: Calculate the new centroids as the mean of all data points assigned to each cluster.
  4. Repeat: Iterate steps 2 and 3 until the centroids converge or a maximum number of iterations is reached.

Visualization Steps

To visualize the K-means process, we can plot the data points and the centroids at each iteration. This will help us observe how the centroids move and how the clusters are formed.

Step 1: Initialization

  • Randomly select K data points as initial centroids.
  • Plot the data points and the initial centroids on a scatter plot.

Step 2: Assignment

  • Calculate the Euclidean distance between each data point and each centroid.
  • Assign each data point to the nearest centroid.
  • Color-code the data points based on their assigned clusters.

Step 3: Update

  • Calculate the new centroids as the mean of all data points assigned to each cluster.
  • Plot the updated centroids on the scatter plot.

Step 4: Repeat

  • Repeat steps 2 and 3 until convergence or a maximum number of iterations is reached.
  • At each iteration, update the plot to show the new cluster assignments and centroid positions.

Example Visualization

Consider a dataset of two-dimensional points:

Data points: (2, 3), (4, 5), (6, 7), (8, 9), (1, 2)
K = 2
  1. Initialization: Randomly select two data points as initial centroids: (2, 3) and (8, 9).
  2. Assignment:
    • (2, 3) is closer to (2, 3) than (8, 9), so it is assigned to cluster 1.
    • (4, 5) is closer to (2, 3) than (8, 9), so it is assigned to cluster 1.
    • (6, 7) is closer to (8, 9) than (2, 3), so it is assigned to cluster 2.
    • (8, 9) is closer to (8, 9) than (2, 3), so it is assigned to cluster 2.
    • (1, 2) is closer to (2, 3) than (8, 9), so it is assigned to cluster 1.
  3. Update:
    • Centroid of cluster 1: (2 + 4 + 1) / 3 = 2.33, (3 + 5 + 2) / 3 = 3.33
    • Centroid of cluster 2: (6 + 8) / 2 = 7, (7 + 9) / 2 = 8
  4. Repeat: Continue iterating steps 2 and 3, updating the plot at each iteration, until convergence or a maximum number of iterations is reached.

By visualizing the K-means process in this way, we can gain a better understanding of how the algorithm converges to a solution and how the clusters are formed.

Tips on pacing yourself during K-Means implementation
Scenarios where K-Means might not work

Get industry recognized certification – Contact us

keyboard_arrow_up
Open chat
Need help?
Hello 👋
Can we help you?