K-Means Clustering

K-Means Clustering in R is an unsupervised learning approach that organises data into groups of similar observations. Specifically, it divides the observations into a predetermined number of clusters, so that each observation is assigned to exactly one cluster. Because the method is unsupervised, it works on raw, unlabelled data, and deciding whether the resulting clusters are meaningful still requires manual review. It is used in a number of industries, including banking, healthcare, retail, and media.
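Formally, for a chosen number of clusters k, k-means looks for a partition of the observations into clusters C_1 to C_k that makes the total within-cluster sum of squared Euclidean distances as small as possible. Here μ_j is the mean (centroid) of cluster C_j. R's kmeans() reports this quantity as tot.withinss, and its algorithms reach a local minimum of it rather than a guaranteed global one.

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```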

K-Means Clustering in R

K-means clustering is a partitioning method that aims to divide data into a pre-specified number of clusters, denoted as “k.” Each data point belongs to the cluster with the nearest mean (centroid).
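The nearest-centroid rule above is usually reached by alternating two steps: assign every point to its closest centroid, then move each centroid to the mean of its points. The sketch below illustrates this loop (Lloyd's algorithm) in base R. The function name lloyd_kmeans is hypothetical and for illustration only; R's own kmeans() uses the Hartigan-Wong algorithm by default, although kmeans(..., algorithm = "Lloyd") is also available.

```r
# Hypothetical helper: a from-scratch sketch of Lloyd's k-means iteration.
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Initialise the centroids with k distinct observations chosen at random
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to each centroid
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no point changes cluster
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Update step: each centroid moves to the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# Usage: three clusters of cars on standardised mpg and hp
set.seed(1)
fit <- lloyd_kmeans(scale(mtcars[, c("mpg", "hp")]), k = 3)
table(fit$cluster)
```

An empty cluster simply keeps its previous centroid here; production implementations handle that case more carefully and also run several random starts.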

# Load the required library (kmeans() itself is part of base R's stats package)
library(ggplot2)
 
# Load the "mtcars" dataset
data(mtcars)
head(mtcars)


Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Plot the clustering

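# Perform k-means clustering with k = 3 clusters on mpg and hp.
# kmeans() starts from random centroids, so fix the seed for reproducibility
# and use several random starts (nstart) to avoid a poor local optimum.
set.seed(42)
# Standardise both columns first; otherwise hp (roughly 50 to 335) dominates
# the Euclidean distance over mpg (roughly 10 to 34).
kmeans_model <- kmeans(scale(mtcars[, c("mpg", "hp")]), centers = 3, nstart = 25)
mtcars$cluster <- factor(kmeans_model$cluster)

# Create a scatterplot to visualize the clusters
ggplot(data = mtcars, aes(x = mpg, y = hp, color = cluster)) +
  geom_point(size = 4) +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(title = "Car Segmentation Based on MPG and Horsepower",
       x = "Miles per Gallon (mpg)",
       y = "Horsepower (hp)",
       color = "Cluster") +
  theme_minimal()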


Output:

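[Figure: K-Means Clustering using R. Scatterplot of horsepower (hp) against miles per gallon (mpg), points coloured by cluster.]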

We use ggplot2 to create a scatterplot.

  • The points are colored according to their cluster assignments.
  • The colors are set manually with scale_color_manual() so the clusters are easy to tell apart.
  • A title and axis labels make the plot self-explanatory.
  • theme_minimal() gives the plot a clean appearance.

This code creates a scatterplot that segments the cars in mtcars by fuel efficiency (mpg) and horsepower (hp).
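Beyond the plot, the object returned by kmeans() carries numeric summaries of the clustering. The sketch below refits an mpg/hp model and inspects them; standardising with scale(), the seed 42 and nstart = 25 are choices made for this example.

```r
# Refit k-means on standardised mpg and hp with a fixed seed for reproducibility
set.seed(42)
features <- scale(mtcars[, c("mpg", "hp")])
fit <- kmeans(features, centers = 3, nstart = 25)

fit$size                      # number of cars in each cluster
fit$centers                   # cluster means, in standardised units
fit$tot.withinss / fit$totss  # share of total variation left within clusters

# Cluster means expressed in the original units (mpg, hp)
aggregate(mtcars[, c("mpg", "hp")], by = list(cluster = fit$cluster), FUN = mean)
```

A smaller ratio of tot.withinss to totss means the clusters are more compact relative to the overall spread of the data.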

Cluster Graph in R

A cluster graph, a plot showing which observations fall into which cluster, is a useful tool for visualizing data and spotting patterns within it. Cluster graphs are frequently used in disciplines such as biology, the social sciences, and data analysis to group related data points. In this section, we’ll demonstrate how to display a cluster graph in R by combining the ggplot2 package for data visualization with the ggraph package for graph visualization.


Cluster graph on the USArrests dataset
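USArrests is a dataset bundled with base R (arrest rates per 100,000 residents for Murder, Assault and Rape, plus UrbanPop, for the 50 US states). The following is a minimal base-R sketch of a cluster graph for it, not necessarily the approach this article intends: the choice of k = 4, the seed, and projecting the data onto its first two principal components with prcomp() so the clusters can be drawn in two dimensions are all assumptions of this example.

```r
# Cluster the 50 states in the built-in USArrests data
data(USArrests)
set.seed(123)
arrests <- scale(USArrests)  # standardise: Assault is on a much larger scale
km <- kmeans(arrests, centers = 4, nstart = 25)

# Project the standardised data onto its first two principal components
pca <- prcomp(arrests)
scores <- pca$x[, 1:2]

# Cluster graph: one point per state, coloured and labelled by cluster
plot(scores, col = km$cluster, pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "K-means clusters of USArrests (k = 4)")
text(scores, labels = rownames(USArrests), pos = 3, cex = 0.6, col = km$cluster)
```

Clustering uses all four standardised variables; the principal-component projection only affects how the result is drawn.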


Conclusion

...