K-Means Clustering
K-means clustering, shown here in R, is an unsupervised learning method that groups observations by similarity. Specifically, it divides the observations into a predetermined number of groups, called clusters, so that each observation is assigned to exactly one cluster. Because the method is unsupervised, it works directly on raw, unlabeled data, although judging whether the resulting clusters are relevant can require costly manual review. It is used in a number of industries, including banking, healthcare, retail, and media.
K-Means Clustering in R
K-means clustering is a partitioning method that aims to divide data into a pre-specified number of clusters, denoted as “k.” Each data point belongs to the cluster with the nearest mean (centroid).
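The nearest-centroid rule can be illustrated with a minimal base-R sketch of one assignment step followed by one centroid update. The toy points, the starting centroids, and the helper names `assign_clusters` and `update_centroids` are assumptions made for illustration. This is a simplified Lloyd-style iteration, not the Hartigan-Wong algorithm that R's built-in `kmeans()` uses by default.

```r
# Toy data: six points in two dimensions (illustrative values, not from mtcars)
pts <- matrix(c(1.0, 1.0,
                1.5, 2.0,
                1.0, 1.5,
                8.0, 8.0,
                9.0, 8.5,
                8.5, 9.0),
              ncol = 2, byrow = TRUE)

# Hypothetical starting centroids for k = 2
centroids <- matrix(c(1, 1,
                      9, 9),
                    ncol = 2, byrow = TRUE)

# Assignment step: index of the nearest centroid for each point
assign_clusters <- function(pts, centroids) {
  apply(pts, 1, function(p) {
    sq_dist <- rowSums(sweep(centroids, 2, p)^2)  # squared Euclidean distances
    which.min(sq_dist)
  })
}

# Update step: each centroid becomes the mean of the points assigned to it
# (this sketch does not handle a cluster that ends up with no points)
update_centroids <- function(pts, clusters, k) {
  t(sapply(seq_len(k), function(j) colMeans(pts[clusters == j, , drop = FALSE])))
}

clusters  <- assign_clusters(pts, centroids)
centroids <- update_centroids(pts, clusters, k = 2)

print(clusters)
print(centroids)
```

A full run would repeat these two steps until the assignments stop changing.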
R
# Load the required libraries
library(ggplot2)
library(cluster)

# Load the "mtcars" dataset
data(mtcars)
head(mtcars)
Output:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Plot the clustering
R
# Perform k-means clustering (e.g., k = 3 clusters)
# kmeans() starts from random centers, so fix the seed for repeatable results
set.seed(123)
kmeans_model <- kmeans(mtcars[, c("mpg", "hp")], centers = 3)

# Create a scatterplot to visualize the clusters
ggplot(data = mtcars, aes(x = mpg, y = hp, color = as.factor(kmeans_model$cluster))) +
  geom_point(size = 4) +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(title = "Car Segmentation Based on MPG and Horsepower",
       x = "Miles per Gallon (mpg)",
       y = "Horsepower (hp)",
       color = "Cluster") +
  theme_minimal()
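Output: a scatterplot of horsepower (hp) against miles per gallon (mpg), with each point colored by its assigned cluster.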
In this code:
- We use ggplot2 to create a scatterplot.
- The points are colored according to the cluster assignments.
- The color scheme is customized to make the plot more attractive.
- We add a title and axis labels to improve the plot’s informativeness.
- We use a minimal theme for a clean and attractive appearance.
This code creates a scatterplot that visualizes car segments based on miles per gallon and horsepower.
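Beyond the plot, the fitted object can be inspected directly. The sketch below refits the same two-column model and reads a few of the components that `kmeans()` returns. The object name `fit`, the seed value, and the choice of `nstart = 25` are assumptions made for illustration.

```r
# Refit with a fixed seed so the result is repeatable (the seed value is arbitrary)
set.seed(42)
fit <- kmeans(mtcars[, c("mpg", "hp")], centers = 3, nstart = 25)

print(fit$centers)       # one row per cluster: mean mpg and mean hp
print(fit$size)          # number of cars in each cluster
print(fit$tot.withinss)  # total within-cluster sum of squares (lower means tighter clusters)

# Car names grouped by their assigned cluster
cars_by_cluster <- split(rownames(mtcars), fit$cluster)
print(cars_by_cluster)
```

Because hp spans a much larger numeric range than mpg, the distances here are dominated by horsepower. Passing `scale(mtcars[, c("mpg", "hp")])` to `kmeans()` is a common adjustment when both variables should carry similar weight.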
Cluster Graph in R
Cluster graphs in R are a useful tool for visualizing data and revealing patterns within it. In fields such as biology, the social sciences, and data analysis, cluster graphs are frequently used to group related data points. In this article, we’ll demonstrate how to draw a cluster graph in R by combining the ggplot2 package for general plotting with the ggraph package for graph visualization.