Cluster-Based Anomaly Detection

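Cluster-based methods group similar data points into clusters. They flag as anomalies the points that fit poorly into every cluster, such as points far from their nearest cluster center, and the points that fall into very small clusters.
The kmeans() function from R's built-in stats package can be used for cluster-based anomaly detection. The clustering functions in the cluster package, such as pam() and clara(), can be used in the same way.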
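The small-cluster criterion can be sketched with kmeans() alone. This is a minimal illustration built on assumptions that do not come from the original article. It uses synthetic data with two dense groups and three planted isolated points, k = 3 with nstart = 10, and a hypothetical cut-off that treats any cluster holding less than 5% of the points as anomalous.

```r
# Minimal sketch: flag members of unusually small k-means clusters.
# Assumptions (illustrative only): synthetic data, k = 3, nstart = 10,
# and a hypothetical "small cluster" cut-off of 5% of all points.
set.seed(42)

# Two dense groups of 50 points each, plus 3 isolated points
normal_a <- matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2)
normal_b <- matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
isolated <- matrix(c(30, 31, 29, -30, -31, -29), ncol = 2)
x <- rbind(normal_a, normal_b, isolated)

# Several random starts; kmeans() keeps the run with the lowest
# total within-cluster sum of squares
km <- kmeans(x, centers = 3, nstart = 10)

# Clusters holding less than 5% of all points are treated as anomalous
min_share <- 0.05
small_clusters <- which(km$size / nrow(x) < min_share)

# Row indices of the points that fall into one of those small clusters
small_cluster_members <- which(km$cluster %in% small_clusters)
print(small_cluster_members)
```

Setting nstart above 1 makes it less likely that a poor random start merges the isolated points into one of the large clusters. The 5% cut-off is arbitrary and would need tuning for real data.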

R

# Generate some example data
set.seed(123)
data <- matrix(rnorm(200), ncol = 2)
 
# Perform k-means clustering
kmeans_result <- kmeans(data, centers = 3)
 
# Print the clustering result
print(kmeans_result)
 
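# Simplified rule for illustration: treat every member of cluster 1 as a
# potential anomaly. Cluster labels are arbitrary, so a practical rule would
# flag members of very small clusters or points far from their centroid.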
anomalies <- which(kmeans_result$cluster == 1)
 
# Print the indices of potential anomalies
print(anomalies)

Output:

K-means clustering with 3 clusters of sizes 38, 29, 33
Cluster means:
         [,1]       [,2]
1 -0.66333772 -0.6219885
2 -0.02025692  1.0093022
3  1.05560227 -0.4966328
Clustering vector:
  [1] 1 2 3 1 1 3 3 1 1 2 3 2 3 1 2 3 3 1 3 1 1 1 1 1 2 1 3 2 1 3 2 2 3 3 3 2 3 2 2 1
 [41] 2 1 1 3 3 1 1 2 2 1 2 2 2 3 1 3 1 3 2 3 2 1 1 2 1 2 2 1 3 3 1 1 3 2 1 3 1 1 2 1
 [81] 1 2 1 3 1 3 2 3 2 3 3 3 2 1 3 2 3 3 1 1
Within cluster sum of squares by cluster:
[1] 23.92627 22.26036 24.96196
 (between_SS / total_SS =  59.4 %)
Available components:
[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
[6] "betweenss"    "size"         "iter"         "ifault"      
 [1]   1   4   5   8   9  14  18  20  21  22  23  24  26  29  40  42  43  46  47  50
[21]  55  57  62  63  65  68  71  72  75  77  78  80  81  83  85  94  99 100
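Labelling a whole cluster as anomalous flags 38 of the 100 points here, because cluster 1 is simply the largest group. A more targeted variant scores each point by its distance to its own centroid. The sketch below reuses the example data and kmeans() call from above. The 95th-percentile cut-off is an assumption chosen for illustration and is not part of the original article.

```r
# Minimal sketch: distance-to-centroid anomaly scores from k-means.
# Assumption (illustrative only): flag points beyond the 95th percentile.
set.seed(123)
data <- matrix(rnorm(200), ncol = 2)
kmeans_result <- kmeans(data, centers = 3)

# Centroid assigned to each row (one row of centers per data point)
assigned_centers <- kmeans_result$centers[kmeans_result$cluster, ]

# Euclidean distance from every point to its assigned centroid
# (unname() drops the cluster-label row names carried over from centers)
dist_to_center <- unname(sqrt(rowSums((data - assigned_centers)^2)))

# Points beyond the 95th percentile of those distances are flagged
threshold <- quantile(dist_to_center, probs = 0.95)
anomalies <- which(dist_to_center > threshold)

# Indices and distances of the flagged points
print(anomalies)
print(round(dist_to_center[anomalies], 3))
```

Because the threshold is a percentile, this rule always flags roughly 5% of the points. A fixed distance threshold, or one based on each cluster's own spread, would let the number of flagged points vary with the data.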

Anomaly Detection Using R

Anomaly detection is a critical aspect of data analysis, allowing us to identify unusual patterns, outliers, or abnormalities within datasets. It plays a pivotal role across various domains such as finance, cybersecurity, healthcare, and more.
