KMeans Clustering with Iris Dataset

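K-means clustering is an unsupervised machine learning algorithm that partitions a dataset into k groups by assigning each point to its nearest cluster centroid. It works as follows: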

1. Choose the number of clusters, k.
2. Randomly select k points from the dataset as the initial centroids.
3. Assign every point to its closest centroid.
4. Recompute each centroid as the mean of the points assigned to it.
5. Repeat steps 3 and 4 until the centroids converge.
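
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scikit-learn implementation used in the rest of the article: the function name `kmeans_sketch`, its parameters, and the toy data are all hypothetical, and it uses plain random initialisation rather than k-means++.

```python
import numpy as np


def kmeans_sketch(points, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        distances = np.linalg.norm(
            points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy blobs (made-up data, not the Iris dataset).
toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
centroids, labels = kmeans_sketch(toy, k=2)
print(centroids)
print(labels)
```

Which blob ends up as cluster 0 depends on the random initial centroids, so only the grouping is meaningful, not the cluster numbers.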

Python3

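import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: the article does not show how `iris` and `X` are created.
# One setup consistent with the later snippets is a DataFrame holding
# the four feature columns plus a `target` column.
iris = load_iris(as_frame=True).frame
X = iris.drop(columns='target')

# Within-cluster sum of squares (inertia) for k = 1..10
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters=i,
                    init='k-means++',
                    max_iter=300,
                    n_init=10,
                    random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# The elbow method applied to the wcss values above
# gives the number of clusters to use (3 here).
kmeans = KMeans(n_clusters=3,
                init='k-means++',
                max_iter=300,
                n_init=10,
                random_state=0)
y_kmeans = kmeans.fit_predict(X)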

The loop above records the within-cluster sum of squares (WCSS) for k = 1 to 10, which is the input to the elbow method for choosing k. Plotting WCSS against k shows the curve bending at k = 3, so we use 3 clusters.
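
The article does not show the plot itself, so here is one way it could be drawn. This sketch assumes the features come from scikit-learn's bundled copy of Iris; the names `features`, `ks`, and the output file `elbow.png` are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled Iris measurements as the features.
features = load_iris().data

# Fit one model per candidate k and record its inertia (WCSS).
ks = list(range(1, 11))
wcss = []
for k in ks:
    model = KMeans(n_clusters=k, init="k-means++", max_iter=300,
                   n_init=10, random_state=0)
    model.fit(features)
    wcss.append(model.inertia_)

# Plot WCSS against k; the "elbow" is where the curve flattens out.
fig, ax = plt.subplots()
ax.plot(ks, wcss, marker="o")
ax.set_xlabel("Number of clusters (k)")
ax.set_ylabel("WCSS (inertia)")
ax.set_title("Elbow method")
fig.savefig("elbow.png")  # or plt.show() in an interactive session
```

The elbow is read off by eye: WCSS drops sharply up to the bend and only slowly afterwards, so adding clusters beyond that point buys little.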

Visualizing the Clusters

Python3

import matplotlib.pyplot as plt

# Visualising the clusters on the first two feature columns.
# Cluster IDs are arbitrary, so check the species names in the
# legend against the true labels before relying on them.
cols = iris.columns
plt.scatter(X.loc[y_kmeans == 0, cols[0]],
            X.loc[y_kmeans == 0, cols[1]],
            s=100, c='purple',
            label='Iris-setosa')
plt.scatter(X.loc[y_kmeans == 1, cols[0]],
            X.loc[y_kmeans == 1, cols[1]],
            s=100, c='orange',
            label='Iris-versicolour')
plt.scatter(X.loc[y_kmeans == 2, cols[0]],
            X.loc[y_kmeans == 2, cols[1]],
            s=100, c='green',
            label='Iris-virginica')

# Plotting the centroids of the clusters (same two features)
plt.scatter(kmeans.cluster_centers_[:, 0],
            kmeans.cluster_centers_[:, 1],
            s=100, c='red',
            label='Centroids')

plt.xlabel(cols[0])
plt.ylabel(cols[1])
plt.legend()
plt.show()

Output:

Clusters obtained by using the K-means algorithm

Accuracy and Performance of Model

Now let’s check the performance of the model.

Python3

pd.crosstab(iris.target, y_kmeans)

Output:

Because K-means is an unsupervised algorithm, there is no held-out test set to score it on; instead we compare the cluster assignments with the known species labels. Setosa is clustered perfectly, and Versicolor has only 2 misclassified points. Virginica overlaps with Versicolor, which accounts for 14 misclassifications.
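
To turn a cross-tabulation like this into a single number, one option is to map each cluster to its dominant species and count the matches, or to use a mapping-free score such as the adjusted Rand index. The sketch below assumes scikit-learn's bundled Iris data; the variable names are illustrative, and the exact counts can vary slightly between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

data = load_iris()
species = pd.Series(data.target_names[data.target], name="species")

clusters = KMeans(n_clusters=3, init="k-means++", max_iter=300,
                  n_init=10, random_state=0).fit_predict(data.data)

# Rows are true species, columns are cluster IDs (which are arbitrary).
table = pd.crosstab(species, pd.Series(clusters, name="cluster"))
print(table)

# Map each cluster to the species that dominates it, then count matches.
majority = table.idxmax(axis=0)  # cluster ID -> species name
predicted = pd.Series(clusters).map(majority)
matched_accuracy = (predicted.to_numpy() == species.to_numpy()).mean()

# The adjusted Rand index compares the two partitions without any mapping.
ari = adjusted_rand_score(data.target, clusters)
print(f"matched accuracy: {matched_accuracy:.3f}, ARI: {ari:.3f}")
```

The majority-vote mapping is only a convenience for reading the table; the adjusted Rand index is the safer summary because it does not depend on how cluster IDs are matched to species.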

Analyzing Decision Tree and K-means Clustering using Iris dataset

The Iris dataset is one of the best-known datasets in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.

Attribute Information:

Sepal Length in cm
Sepal Width in cm
Petal Length in cm
Petal Width in cm
Class:
- Iris Setosa
- Iris Versicolour
- Iris Virginica
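
The description above can be checked directly against scikit-learn's bundled copy of the dataset. This is a small sketch under that assumption; the 2.5 cm petal-length cut-off is an illustrative threshold chosen here, not a value from the article.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
print(data.feature_names)  # the four measurements, in cm
print(data.target_names)   # the three species

# Each class should contribute 50 of the 150 rows.
class_counts = np.bincount(data.target)
print(class_counts)

# Setosa is the linearly separable class: a single threshold on petal
# length (column 2) is enough to isolate it from the other two species.
petal_length = data.data[:, 2]
is_setosa = data.target == 0
threshold_matches = np.array_equal(petal_length < 2.5, is_setosa)
print(threshold_matches)
```

No such single threshold separates Versicolor from Virginica, which is why the clustering results for those two classes overlap.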

Let’s perform exploratory data analysis on the dataset as an initial investigation.
