Decision Tree Algorithm with Iris Dataset
A Decision Tree is a popular supervised machine learning algorithm used for classification and prediction tasks. The algorithm works as follows:
- It begins with the full set of training examples E as the root node.
- On each iteration, it evaluates every unused attribute of the set E and computes its Entropy (H) or Gini Impurity, along with the Information Gain (IG) of splitting on that attribute.
- It then selects the attribute with the smallest Gini Impurity or the largest Information Gain.
- The set E is split on the selected attribute to produce subsets of the data.
- The algorithm recurses on each subset, considering only attributes never selected before.
- Growth along a branch stops once every leaf node is pure, i.e. contains samples of a single class.
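The attribute-selection step above can be sketched with a small helper. This is a minimal illustration of Entropy and Information Gain on a toy split; the function names and the example labels are hypothetical, not from the article's code:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent_labels, subsets):
    """IG = H(parent) minus the weighted average H of the child subsets."""
    total = len(parent_labels)
    weighted = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(parent_labels) - weighted

# Toy example: a hypothetical binary attribute that separates the classes perfectly
parent = ['setosa', 'setosa', 'versicolor', 'versicolor']
left, right = ['setosa', 'setosa'], ['versicolor', 'versicolor']
print(information_gain(parent, [left, right]))  # a perfect split: IG = 1.0
```

The attribute whose split yields the largest IG (equivalently, the lowest weighted child impurity) is chosen at each node.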
Python3

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# iris here is the DataFrame prepared earlier in the article
X = iris.iloc[:, :-2]
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

treemodel = DecisionTreeClassifier()
treemodel.fit(X_train, y_train)
Now let’s check the performance of the Decision tree model.
Python3

import matplotlib.pyplot as plt
from sklearn import tree
from sklearn.metrics import accuracy_score, classification_report

# Visualise the fitted tree and evaluate it on the test set
plt.figure(figsize=(15, 10))
tree.plot_tree(treemodel, filled=True)

ypred = treemodel.predict(X_test)
score = accuracy_score(ypred, y_test)
print(score)
print(classification_report(ypred, y_test))
Output:
0.98
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.94      0.97        16
           2       0.94      1.00      0.97        15

    accuracy                           0.98        50
   macro avg       0.98      0.98      0.98        50
weighted avg       0.98      0.98      0.98        50
One advantage of decision trees over other models is that they are highly interpretable and perform feature selection automatically, so the fitted model can be analysed directly. From the tree above we can interpret the following:
- If petal_length < 2.45, the predicted class is always setosa.
- Beyond depth = 2 the splits become unnecessary: they only increase the variance of the model and lead to overfitting.
- At depth = 2, the majority class on the left-hand side of the tree is Versicolor, while on the right-hand side the majority is Virginica.
- Hence there is no need to split beyond depth = 2, as doing so would just overfit the model.
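The observation above suggests capping the tree depth. A minimal sketch of this kind of pruning, assuming the dataset is loaded via scikit-learn's load_iris (the article's own loading code may differ):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris data as plain arrays (assumption: load_iris, not the article's DataFrame)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# max_depth=2 stops the tree before the noisy, variance-increasing splits
pruned = DecisionTreeClassifier(max_depth=2, random_state=42)
pruned.fit(X_train, y_train)
print(accuracy_score(y_test, pruned.predict(X_test)))
```

The shallower tree generalises comparably to the fully grown one while being far easier to read; `ccp_alpha` (cost-complexity pruning) is an alternative to a hard depth cap.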
Analyzing Decision Tree and K-means Clustering using Iris dataset
The Iris dataset is one of the best-known datasets in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other.
Attribute Information:
- Sepal Length in cm
- Sepal Width in cm
- Petal Length in cm
- Petal Width in cm
- Class:
- Iris Setosa
- Iris Versicolour
- Iris Virginica
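The class balance described above is easy to verify after loading the data. A short sketch, assuming scikit-learn's load_iris with `as_frame=True` (the article's own loading code may differ):

```python
from sklearn.datasets import load_iris

data = load_iris(as_frame=True)
iris = data.frame  # 4 feature columns plus a numeric 'target' column

print(iris.shape)                    # (150, 5): 150 instances, 4 features + target
print(data.target_names)             # ['setosa' 'versicolor' 'virginica']
print(iris['target'].value_counts()) # 50 instances per class
```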
Let’s perform exploratory data analysis on the dataset to guide our initial investigation.