Implementing Hierarchical Linear Regression in R
In this article, we will use different real-world examples to understand our topic in detail, using different packages for analysis and visualization. We can perform this analysis in six steps:
- Step 1: Creating/Loading a Dataset
- Step 2: Loading packages
- Step 3: Model Fitting
- Step 4: Running an Analysis of Variance (ANOVA) and Assessing Model Fit
- Step 5: Making a prediction
- Step 6: Interpretation and Visualization
We will try to understand the relationship between teaching method and student performance, i.e., the marks obtained by students. We will create a fictional dataset, calling the set.seed() function first so that the randomly generated data is reproducible.
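As a quick aside, set.seed() itself does not generate data; it fixes the state of R's random number generator so that subsequent random draws come out the same on every run. A minimal sketch:

```r
# Fixing the seed makes random draws reproducible
set.seed(123)
first_draw <- rnorm(3)    # three random normal values

set.seed(123)             # reset to the same seed
second_draw <- rnorm(3)

identical(first_draw, second_draw)   # TRUE: same seed, same values
```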
Hierarchical linear regression using the mtcars dataset in R
In this example we will use a built-in dataset in the R programming language called “mtcars”. This dataset contains information about different types of cars, such as weight, horsepower, quarter-mile time, and miles per gallon (mpg). We will try to understand how these attributes affect each other at different levels.
Step 1: Loading the Dataset
R
# Loading the dataset
data(mtcars)
We can also view our data using functions such as head(), tail(), or summary(). The head() function returns the first six rows of the dataset; similarly, tail() returns the last six rows.
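For example, a minimal sketch of these inspection functions:

```r
# Quick looks at the loaded data
head(mtcars)     # first 6 rows
tail(mtcars)     # last 6 rows
summary(mtcars)  # min, quartiles, median, mean, and max of each column
```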
Step 2: Model Specification
In this step, we’ll specify the hierarchical linear regression models. We’ll consider two levels of hierarchy, car features and car make, and try to understand the influence of car make (brand) on mpg.
R
# Car features level
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Car make level
model_by_make <- lm(mpg ~ disp + drat + wt, data = mtcars)

# Summary of both models
summary(model)
summary(model_by_make)
Output:
Call:
lm(formula = mpg ~ wt + hp + qsec, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.8591 -1.6418 -0.4636 1.1940 5.6092
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.61053 8.41993 3.279 0.00278 **
wt -4.35880 0.75270 -5.791 3.22e-06 ***
hp -0.01782 0.01498 -1.190 0.24418
qsec 0.51083 0.43922 1.163 0.25463
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.578 on 28 degrees of freedom
Multiple R-squared: 0.8348, Adjusted R-squared: 0.8171
F-statistic: 47.15 on 3 and 28 DF, p-value: 4.506e-11
summary(model_by_make)
Call:
lm(formula = mpg ~ disp + drat + wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.2342 -2.3719 -0.3148 1.6315 6.2820
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.043257 7.099792 4.372 0.000154 ***
disp -0.016389 0.009578 -1.711 0.098127 .
drat 0.843965 1.455051 0.580 0.566537
wt -3.172482 1.217157 -2.606 0.014495 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.951 on 28 degrees of freedom
Multiple R-squared: 0.7835, Adjusted R-squared: 0.7603
F-statistic: 33.78 on 3 and 28 DF, p-value: 1.92e-09
In the above model specification, the first model, “model”, examines how car features (weight, horsepower, and quarter-mile time) influence the miles per gallon (mpg) performance of the cars. The second model, “model_by_make”, focuses on how a different set of features (displacement, rear axle ratio, and weight) affects mpg, which we use to reflect differences between car makes in the dataset. Together, these two levels form the hierarchical structure of the analysis.
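As a side note, classical hierarchical (sequential) regression is usually carried out by comparing nested models, adding predictors in blocks; a minimal sketch of that workflow on mtcars (the model names m1 and m2 are our own):

```r
# Block 1: weight only
m1 <- lm(mpg ~ wt, data = mtcars)

# Block 2: weight plus horsepower and quarter-mile time
m2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# F-test for the improvement contributed by the second block
anova(m1, m2)

# Incremental R-squared of the second block
summary(m2)$r.squared - summary(m1)$r.squared
```

Because m1 is nested inside m2, anova(m1, m2) tests whether the added block significantly improves the fit.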
Step 3: Running an Analysis of Variance (ANOVA) and Assessing Model Fit
As discussed in the previous example, this step is important for understanding the significance of the model fit.
R
# Conducting Analysis of Variance (ANOVA) for both models
anova_model <- anova(model)
anova_model_by_make <- anova(model_by_make)

# Printing ANOVA results for both models
cat("ANOVA for model:\n")
print(anova_model)
cat("\nANOVA for model by make:\n")
print(anova_model_by_make)
Output:
ANOVA for model:
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
wt 1 847.73 847.73 127.5739 6.131e-12 ***
hp 1 83.27 83.27 12.5319 0.00142 **
qsec 1 8.99 8.99 1.3527 0.25463
Residuals 28 186.06 6.64
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ANOVA for model by make:
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
disp 1 808.89 808.89 92.9171 2.152e-10 ***
drat 1 14.26 14.26 1.6383 0.2111
wt 1 59.14 59.14 6.7937 0.0145 *
Residuals 28 243.75 8.71
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The incremental R-squared of 0.05123624, computed later as the difference between the two models’ R-squared values, shows that “model” (weight, horsepower, quarter-mile time) explains about 5.12 percentage points more of the variance in mpg than “model_by_make”.
Overall, the results indicate that the predictors in “model” capture the variance in the response variable somewhat better than those in “model_by_make”.
Assessing Sums of Squares for both models
R
# Assessing sums of squares for both models.
# In an ANOVA table, the last row holds the residual (error) sum of
# squares; the remaining rows are the sequential sums of squares of
# the predictors, which together give the regression sum of squares.
SSE <- tail(anova_model$`Sum Sq`, 1)          # error SS
SSR <- sum(head(anova_model$`Sum Sq`, -1))    # regression SS
SST <- SSR + SSE                              # total SS

SSE_model_by_make <- tail(anova_model_by_make$`Sum Sq`, 1)
SSR_model_by_make <- sum(head(anova_model_by_make$`Sum Sq`, -1))
SST_model_by_make <- SSR_model_by_make + SSE_model_by_make

cat("\nSums of Squares:\n")
cat("SST:", round(SST, 2), "\n")
cat("SSR:", round(SSR, 2), "\n")
cat("SSE:", round(SSE, 2), "\n")
Output:
Sums of Squares:
SST: 1126.05
SSR: 939.99
SSE: 186.06
Calculating Difference in Sum of Squares for both models
R
# Difference between total and error sums of squares
# (this equals the regression sum of squares of each model)
SSR_diff <- SST - SSE
SSR_diff_model_by_make <- SST_model_by_make - SSE_model_by_make

# Extracting the F-statistic and p-value of the first ANOVA row
# of each model
F_statistic <- anova_model$`F value`[1]
p_value <- anova_model$`Pr(>F)`[1]
F_statistic_model_by_make <- anova_model_by_make$`F value`[1]
p_value_model_by_make <- anova_model_by_make$`Pr(>F)`[1]

cat("\nDifference in Sum of Squares for model by make (SSR_diff_model_by_make):",
    round(SSR_diff_model_by_make, 1), "\n")
cat("\nF-Statistic for model by make:", F_statistic_model_by_make, "\n")
cat("P-Value for model by make:", p_value_model_by_make, "\n")
Output:
Difference in Sum of Squares for model by make (SSR_diff_model_by_make): 882.3
F-Statistic for model by make: 92.91705
P-Value for model by make: 2.151799e-10
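To make the connection explicit, the F-statistic and p-value in the first ANOVA row can be reproduced by hand from the mean squares, using pf() for the tail probability of the F-distribution (a sketch, refitting the same model so the snippet is self-contained):

```r
model_by_make <- lm(mpg ~ disp + drat + wt, data = mtcars)
tab <- anova(model_by_make)

ms_disp <- tab$`Sum Sq`[1] / tab$Df[1]   # mean square for disp
ms_res  <- tab$`Sum Sq`[4] / tab$Df[4]   # residual mean square

F_manual <- ms_disp / ms_res             # matches the F value above (~92.92)
p_manual <- pf(F_manual, tab$Df[1], tab$Df[4], lower.tail = FALSE)
```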
Computing Incremental R-Squared
R
# Computing incremental R-squared: the difference between the
# R-squared values of the two models
incremental_R_squared <- summary(model)$r.squared - summary(model_by_make)$r.squared
cat("\nIncremental R-Squared:", incremental_R_squared, "\n")
Output:
Incremental R-Squared: 0.05123624
Step 4: Making Predictions
We’ll predict the mpg for a new, hypothetical car using “model_by_make”. Here we assign values for the car’s displacement (disp), rear axle ratio (drat), and weight in thousands of pounds (wt).
R
# Predicting mpg for a new observation
new_data <- data.frame(disp = 120, drat = 3.9, wt = 2.8)
predicted_mpg <- predict(model_by_make, newdata = new_data)
cat("Predicted MPG:", predicted_mpg, "\n")
Output:
Predicted MPG: 23.48507
The value 23.48507 is the estimated miles per gallon (mpg) for this car, based on the fitted “model_by_make” model.
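predict() can also quantify the uncertainty around this point estimate; a sketch using the same hypothetical car (assuming the model_by_make fit from Step 2):

```r
new_data <- data.frame(disp = 120, drat = 3.9, wt = 2.8)

# 95% confidence interval for the mean mpg at these feature values
predict(model_by_make, newdata = new_data, interval = "confidence")

# Wider 95% prediction interval for a single new car
predict(model_by_make, newdata = new_data, interval = "prediction")
```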
Step 5: Interpretation and Visualization
R
library(ggplot2)

# Adding a new variable 'make' derived from the row names of mtcars
mtcars$make <- rownames(mtcars)

# 'make' on the x-axis and 'mpg' on the y-axis
ggplot(mtcars, aes(x = make, y = mpg)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(title = "MPG by Car Make", x = "Car Make", y = "MPG") +
  # Angle the x-axis text for better readability
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Output:
The barplot shows the mpg (miles per gallon) of each car in the dataset, labelled by make and model. The graph provides a clear visualization of mpg performance across the cars, allowing for easy comparison.
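Beyond the barplot, it is worth checking the regression assumptions visually; a minimal sketch of base-R residual diagnostics for model_by_make (assuming the fit from Step 2):

```r
# Four standard diagnostic plots: residuals vs fitted values,
# normal Q-Q, scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(model_by_make)
par(mfrow = c(1, 1))   # restore the default plotting layout
```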
In this example, we used the built-in R dataset “mtcars” to perform hierarchical regression. We specified two models for different levels of our dataset and predicted mpg values with the predict() function, which applies the patterns and trends the model learned from the data. We also plotted mpg by car make (brand) for better understanding.
Conclusion:
This article focused on different real-life examples. With their help, we saw how hierarchical linear regression works, how it uses multilevel data to understand the relationship between dependent and independent variables, and how those relationships drive prediction. We also saw the role that patterns and trends in historical data play in prediction.
Hierarchical linear regression using R
A linear regression model establishes a relationship between two or more variables, which are either dependent or independent. Linear regression in the R programming language is used to make predictions from the given data about a particular topic; it provides valuable insights and conclusions that support future decisions. Hierarchical linear regression is an extension of standard linear regression that allows for the analysis of hierarchical data (grouped data, or data at different levels).