Creating a Decision Tree
Before we can extract information from decision rules, we need to create a decision tree. For this example, we will use the iris dataset.
Install and Load Necessary Libraries
Ensure you have the rpart and rpart.plot packages installed and loaded.
install.packages("rpart")
install.packages("rpart.plot")
library(rpart)
library(rpart.plot)
Load the Dataset
Load the built-in iris dataset that ships with R.
data(iris)
Create a Decision Tree Model
Create a decision tree model using the rpart function.
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")
Plot the Decision Tree
Visualize the decision tree using the rpart.plot function.
rpart.plot(tree_model)
Output:
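Since the plotted tree is rendered as an image, a quick in-sample prediction is a useful sanity check on the fitted model before extracting any rules. This is a minimal sketch that refits the same model so it runs on its own:

```r
library(rpart)

data(iris)
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes on the training data and cross-tabulate against the truth
pred <- predict(tree_model, iris, type = "class")
table(Predicted = pred, Actual = iris$Species)

# In-sample accuracy; this tree misclassifies 6 of 150 observations (0.96)
mean(pred == iris$Species)
```

The 6 misclassifications here match the relative error of 0.06 reported by printcp in the next section.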
Extracting Information from Decision Rules
To understand and extract the decision rules from the tree model, we can use various functions and methods.
Print the Detailed Summary of the Tree
The printcp function provides a detailed summary of the decision tree, including the complexity parameter and error rates.
printcp(tree_model)
Output:
Classification tree:
rpart(formula = Species ~ ., data = iris, method = "class")
Variables actually used in tree construction:
[1] Petal.Length Petal.Width
Root node error: 100/150 = 0.66667
n= 150
CP nsplit rel error xerror xstd
1 0.50 0 1.00 1.20 0.048990
2 0.44 1 0.50 0.76 0.061232
3 0.01 2 0.06 0.07 0.025833
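The CP table shown above is also available programmatically as tree_model$cptable, and prune() accepts any cp value from it. A minimal sketch of selecting the cp with the lowest cross-validated error (the xerror column) and pruning; for a tree this small, pruning typically leaves it unchanged:

```r
library(rpart)

data(iris)
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")

# Pick the CP value with the lowest cross-validated error
best_cp <- tree_model$cptable[which.min(tree_model$cptable[, "xerror"]), "CP"]

# Prune the tree at that complexity parameter
pruned_model <- prune(tree_model, cp = best_cp)
printcp(pruned_model)
```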
Extract the Rules
The rpart package allows you to extract decision rules using the path.rpart function or by directly parsing the model.
# Node numbers actually present in the tree (the row names of the frame)
nodes <- as.numeric(rownames(tree_model$frame))

# Extract rules from the tree model
rules <- path.rpart(tree_model, node = nodes, print.it = FALSE)

# Print the extracted rules
for (i in seq_along(rules)) {
  cat("Rule for Node", names(rules)[i], ":\n")
  cat(paste(rules[[i]], collapse = "\n"), "\n\n")
}
Output:
Rule for Node 1 :
root
Rule for Node 2 :
root
Petal.Length< 2.45
Rule for Node 3 :
root
Petal.Length>=2.45
Rule for Node 6 :
root
Petal.Length>=2.45
Petal.Width< 1.75
Rule for Node 7 :
root
Petal.Length>=2.45
Petal.Width>=1.75
Detailed Node Information
You can also extract detailed information about each node, including the split condition, number of observations, and predicted class.
# Extract detailed node information
tree_details <- as.data.frame(tree_model$frame)
# Display node details
print(tree_details)
Output:
var n wt dev yval complexity ncompete nsurrogate yval2.V1
1 Petal.Length 150 150 100 1 0.50 3 3 1.00000000
2 <leaf> 50 50 0 1 0.01 0 0 1.00000000
3 Petal.Width 100 100 50 2 0.44 3 3 2.00000000
6 <leaf> 54 54 5 2 0.00 0 0 2.00000000
7 <leaf> 46 46 1 3 0.01 0 0 3.00000000
yval2.V2 yval2.V3 yval2.V4 yval2.V5 yval2.V6 yval2.V7
1 50.00000000 50.00000000 50.00000000 0.33333333 0.33333333 0.33333333
2 50.00000000 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000
3 0.00000000 50.00000000 50.00000000 0.00000000 0.50000000 0.50000000
6 0.00000000 49.00000000 5.00000000 0.00000000 0.90740741 0.09259259
7 0.00000000 1.00000000 45.00000000 0.00000000 0.02173913 0.97826087
yval2.nodeprob
1 1.00000000
2 0.33333333
3 0.66666667
6 0.36000000
7 0.30666667
tree_model$frame contains detailed information about each node in the decision tree, including variables used for splitting, number of observations, and more.
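Note that the yval column in tree_model$frame stores the predicted class as an integer code. One way to translate those codes back to the factor labels, a short sketch using the ylevels attribute that rpart stores on classification fits:

```r
library(rpart)

data(iris)
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")

frame <- tree_model$frame

# Keep only the leaf nodes and map their integer class codes to species names
leaves <- frame[frame$var == "<leaf>", ]
leaf_classes <- attr(tree_model, "ylevels")[leaves$yval]

data.frame(node = rownames(leaves), n = leaves$n, predicted = leaf_classes)
```

For this tree, the three leaves (nodes 2, 6, and 7) predict setosa, versicolor, and virginica respectively.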
Visualize Important Splits
Plotting the variable importance can help you understand which variables are most influential in the decision-making process.
# Extract and plot variable importance
importance <- tree_model$variable.importance
barplot(importance, main = "Variable Importance", col = "lightblue", las = 2)
Output:
Convert the Tree to Rules
The rattle package can convert the decision tree into readable rules.
install.packages("rattle")
library(rattle)
# Convert the decision tree to rules
asRules(tree_model)
Output:
Rule number: 2 [Species=setosa cover=50 (33%) prob=1.00]
Petal.Length< 2.45
Rule number: 7 [Species=virginica cover=46 (31%) prob=0.00]
Petal.Length>=2.45
Petal.Width>=1.75
Rule number: 6 [Species=versicolor cover=54 (36%) prob=0.00]
Petal.Length>=2.45
Petal.Width< 1.75
The rattle package simplifies the decision tree into readable rules, facilitating easier interpretation.
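If you already have rpart.plot loaded, its rpart.rules() function provides a similar rule view without the extra rattle dependency; it returns the rules as a data frame with one row per leaf:

```r
library(rpart)
library(rpart.plot)

data(iris)
set.seed(123) # Set seed for reproducibility
tree_model <- rpart(Species ~ ., data = iris, method = "class")

# One row per leaf, with the split conditions spelled out as text
rules_df <- rpart.rules(tree_model)
print(rules_df)
```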
Conclusion
The rpart package in R is widely used for creating decision tree models, and decision trees are valuable precisely because they yield a clear, interpretable set of rules for making predictions. Extracting those rules reveals how the model makes its decisions and which features are most influential. Using printcp, path.rpart, tree_model$frame, the variable importance scores, and rattle's asRules, you can inspect a fitted rpart tree from several complementary angles.