Cluster-Based Anomaly Detection
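- Cluster-based methods group similar data points into clusters and flag as anomalies the points that fit no cluster well, lie far from the center of their cluster, or fall into very small clusters.
- The kmeans() function from the stats package (loaded by default in R) or the clustering functions in the cluster package can be used for cluster-based anomaly detection. Note that k-means assigns every point to some cluster, so a rule based on cluster size or distance is needed to decide which points count as anomalous.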
```r
# Generate some example data
set.seed(123)
data <- matrix(rnorm(200), ncol = 2)

# Perform k-means clustering
kmeans_result <- kmeans(data, centers = 3)

# Print the clustering result
print(kmeans_result)

# Identify anomalies based on cluster membership.
# Cluster 1 is picked here purely for illustration: k-means labels are
# arbitrary, so in practice choose the cluster by its size or spread.
anomalies <- which(kmeans_result$cluster == 1)

# Print the indices of potential anomalies
print(anomalies)
```
Output:
```
K-means clustering with 3 clusters of sizes 38, 29, 33

Cluster means:
         [,1]       [,2]
1 -0.66333772 -0.6219885
2 -0.02025692  1.0093022
3  1.05560227 -0.4966328

Clustering vector:
 [1] 1 2 3 1 1 3 3 1 1 2 3 2 3 1 2 3 3 1 3 1 1 1 1 1 2 1 3 2 1 3 2 2 3 3 3 2 3 2 2 1
[41] 2 1 1 3 3 1 1 2 2 1 2 2 2 3 1 3 1 3 2 3 2 1 1 2 1 2 2 1 3 3 1 1 3 2 1 3 1 1 2 1
[81] 1 2 1 3 1 3 2 3 2 3 3 3 2 1 3 2 3 3 1 1

Within cluster sum of squares by cluster:
[1] 23.92627 22.26036 24.96196
 (between_SS / total_SS = 59.4 %)

Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"

 [1]   1   4   5   8   9  14  18  20  21  22  23  24  26  29  40  42  43  46  47  50
[21]  55  57  62  63  65  68  71  72  75  77  78  80  81  83  85  94  99 100
```
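The example above treats every member of one cluster as a potential anomaly, which on this data flags 38 of the 100 points. A common alternative is to score each point by its distance to the center of its own cluster and to check separately for unusually small clusters. The following is a minimal sketch of that idea, not a definitive method: the 95% distance quantile and the 10% cluster-size cutoff are assumed thresholds chosen only for illustration, and the variable names are hypothetical.

```r
# Same illustrative data as in the example above
set.seed(123)
data <- matrix(rnorm(200), ncol = 2)

kmeans_result <- kmeans(data, centers = 3)

# Look up, for every point, the center of the cluster it was assigned to
assigned_centers <- kmeans_result$centers[kmeans_result$cluster, , drop = FALSE]

# Euclidean distance from each point to its own cluster center
dist_to_center <- sqrt(rowSums((data - assigned_centers)^2))

# Assumed rule: flag points whose distance exceeds the 95th percentile
threshold <- quantile(dist_to_center, 0.95)
distance_anomalies <- which(dist_to_center > threshold)

# Assumed rule: flag all members of clusters holding under 10% of the points
small_clusters <- which(kmeans_result$size < 0.10 * nrow(data))
small_cluster_anomalies <- which(kmeans_result$cluster %in% small_clusters)

print(distance_anomalies)
print(small_cluster_anomalies)
```

Both thresholds should be tuned to the data at hand. The distance rule always flags a fixed share of the points, so it ranks candidates rather than proving that any of them is anomalous, and the size rule returns an empty result whenever all clusters are of comparable size.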
Anomaly Detection Using R
Anomaly detection is a critical part of data analysis: it identifies unusual patterns, outliers, or abnormalities within datasets. It plays a pivotal role in domains such as finance, cybersecurity, and healthcare.