psych Package in R
The “psych” package is an R package that provides functions for psychological research and data analysis. It contains tools for descriptive statistics, data visualization, factor analysis, reliability analysis, correlation analysis, and more. In this article, we will discuss the basics of using the “psych” package in the R programming language.
Introduction to Psych Package
Before we proceed to the steps, it is important to understand some key concepts related to the “psych” package:
- Factor Analysis: This is a statistical technique used to identify underlying (latent) factors in a set of observed variables. The “psych” package provides “fa()” for fitting factor models and “fa.parallel()” for deciding how many factors to retain.
- Reliability Analysis: This involves testing the consistency and stability of a set of measurements or items. The “psych” package provides functions for calculating various types of reliability coefficients, including Cronbach’s alpha, Guttman’s lambda, and more.
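- Principal Component Analysis: This is a related technique that summarizes a set of variables with a smaller number of components that capture most of their variance. The “psych” package provides “principal()” for performing PCA, with optional rotation of the components, and plotting functions such as “fa.diagram()” for visualizing the results.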
- Cluster Analysis: This is a technique used to group objects based on their similarities or differences. The “psych” package provides “iclust()”, which hierarchically clusters items (variables) rather than individuals; k-means clustering is available in base R through “kmeans()”.
- Correlation Analysis: This involves examining the relationship between two or more variables. The “psych” package provides functions for calculating various types of correlation coefficients, including Pearson’s r, Spearman’s rho, and Kendall’s tau.
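To make the idea of choosing the number of dimensions more concrete, here is a minimal base R sketch of the logic behind parallel analysis: the eigenvalues of the observed correlation matrix are compared with the average eigenvalues of random data of the same shape. It uses the built-in attitude data set and a hypothetical helper named simple_parallel(). It is an illustration under those assumptions, not the algorithm that fa.parallel() implements, which offers more options.

```r
# Minimal sketch of PCA-based parallel analysis using base R only.
# simple_parallel() is a hypothetical helper written for this illustration.
simple_parallel <- function(x, n_iter = 100, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  p <- ncol(x)

  # Eigenvalues of the observed correlation matrix (principal components)
  observed <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values

  # Eigenvalues from uncorrelated normal data of the same dimensions
  random <- replicate(n_iter, {
    sim <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(sim), symmetric = TRUE, only.values = TRUE)$values
  })
  random_mean <- rowMeans(random)

  # Retain the leading components whose eigenvalue exceeds the random average
  above <- observed > random_mean
  n_keep <- if (all(above)) p else which(!above)[1] - 1

  list(observed = observed, random_mean = random_mean, n_components = n_keep)
}

result <- simple_parallel(attitude)
print(round(result$observed, 2))
print(result$n_components)
```

Components whose observed eigenvalue is no larger than what random data produce are treated as noise, which is the intuition behind retaining only the leading ones.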
Some Common Functions from Psych Package
Function | Description |
---|---|
describe() | Provides descriptive statistics for a dataset. |
corr.test() | Computes correlation coefficients between variables, together with significance tests. |
fa() | Performs exploratory factor analysis on a dataset. |
alpha() | Calculates Cronbach’s alpha to measure internal consistency. |
principal() | Conducts principal component analysis (PCA) on raw data or on a correlation or covariance matrix, optionally providing a rotated solution. |
iclust() | Performs hierarchical cluster analysis of items. |
tetrachoric() | Estimates tetrachoric correlations. |
omega() | Computes McDonald’s omega (hierarchical and total) reliability coefficients for a factor structure. |
pairs.panels() | Generates pairwise scatterplots with correlations. |
nfactors() | Compares several criteria for choosing the number of factors in factor analysis. |
cortest.bartlett() | Tests the hypothesis of sphericity for a correlation matrix. |
sim.hierarchical() | Simulates data from a hierarchical factor model. |
irt.fa() | Estimates item response theory (IRT) parameters through factor analysis. |
fa.parallel() | Conducts parallel analysis to determine the number of factors to retain in exploratory factor analysis (EFA). |
factor.scores() | Computes factor scores from the results of a factor analysis. |
Descriptive Statistics using Psych Package
Descriptive statistics give us a feel for the data and its distribution. The describe() function in the psych package returns common descriptive measures for the dataset at hand, including the mean, standard deviation, median, trimmed mean, median absolute deviation, range, skew, kurtosis, and standard error.
R
# Load the psych package
library(psych)

# Create a vector of numeric values
data <- c(3, 5, 2, 7, 6, 4)

# Calculate descriptive statistics
desc_stats <- describe(data)

# Print the descriptive statistics
print(desc_stats)
Output:
   vars n mean   sd median trimmed  mad min max range skew kurtosis   se
X1    1 6  4.5 1.87    4.5     4.5 2.22   2   7     5    0     -1.8 0.76
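As a cross-check on what these columns mean, the sketch below recomputes several of them with base R for the same vector. It assumes describe() is left at its defaults (a 10% trimmed mean, and moment-based skew and kurtosis that divide by the sample standard deviation); if those defaults differ in your version of psych, the last two values may not match.

```r
# Recompute several describe() columns for the example vector with base R
x <- c(3, 5, 2, 7, 6, 4)
n <- length(x)
d <- x - mean(x)  # deviations from the mean

manual_stats <- c(
  mean     = mean(x),
  sd       = sd(x),
  median   = median(x),
  trimmed  = mean(x, trim = 0.1),          # 10% trimmed mean
  mad      = mad(x),                       # scaled median absolute deviation
  se       = sd(x) / sqrt(n),              # standard error of the mean
  skew     = (sum(d^3) / n) / sd(x)^3,     # assumed default skew formula
  kurtosis = (sum(d^4) / n) / sd(x)^4 - 3  # assumed default kurtosis formula
)

print(round(manual_stats, 2))
```

The rounded values agree with the describe() output above: 4.5, 1.87, 4.5, 4.5, 2.22, 0.76, 0, and -1.8.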
Correlation Test using Psych Package
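A correlation test measures the strength and direction of the relationship between two variables and checks whether it differs significantly from zero. It can also show how two features of a machine learning dataset are related to each other. The corr.test() function in the psych package reports the correlation matrix together with the corresponding p-values.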
R
# Load the psych package
library(psych)

# Create a data frame with two variables
data <- data.frame(
  var1 = c(3, 5, 2, 7, 6, 4),
  var2 = c(1, 4, 3, 6, 5, 2)
)

# Perform a correlation test
corr_result <- corr.test(data)

# Print the correlation test results
print(corr_result)
Output:
The printed result contains the correlation matrix, the sample size, and a matrix of p-values. For this data, the correlation between var1 and var2 is approximately 0.83, with a p-value of approximately 0.04.
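For a quick sanity check of the correlation and its p-value, the same quantities can be computed with base R's cor() and cor.test(). This is a sketch under the assumption that corr.test() is left at its default Pearson method; with only two variables there is a single test, so no multiple-comparison adjustment comes into play.

```r
# Base R cross-check of the correlation between var1 and var2
data <- data.frame(
  var1 = c(3, 5, 2, 7, 6, 4),
  var2 = c(1, 4, 3, 6, 5, 2)
)

# Pearson correlation coefficient
r <- cor(data$var1, data$var2)

# Two-sided test of the null hypothesis that the correlation is zero
test <- cor.test(data$var1, data$var2, method = "pearson")

print(round(r, 2))             # correlation coefficient
print(round(test$p.value, 2))  # p-value of the test
```

In psych, the corresponding matrices are stored in the result object as corr_result$r and corr_result$p.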
Cronbach’s Alpha using Psych Package
Cronbach’s alpha measures the internal consistency of a set of items, that is, how closely the items are related as a group. A higher value indicates that the items measure the same underlying construct more consistently, which makes a composite score built from them more reliable.
R
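# Load the psych package
library(psych)

# Create a data frame with three variables
data <- data.frame(
  var1 = c(3, 5, 2, 7, 6, 4),
  var2 = c(1, 4, 3, 6, 5, 2),
  var3 = c(2, 6, 4, 3, 5, 1)
)

# Calculate Cronbach's alpha
alpha_result <- alpha(data)

# Print the reliability obtained when each item is dropped in turn
# (the overall alpha for all items is stored in alpha_result$total)
print(alpha_result$alpha.drop)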
Output:
     raw_alpha std.alpha   G6(smc) average_r       S/N   alpha se var.r
var1 0.7037037 0.7037037 0.5428571 0.5428571 2.3750000 0.24192491    NA
var2 0.4090909 0.4090909 0.2571429 0.2571429 0.6923077 0.48247525    NA
var3 0.9062500 0.9062500 0.8285714 0.8285714 9.6666667 0.07654655    NA
         med.r
var1 0.5428571
var2 0.2571429
var3 0.8285714
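The raw_alpha values in this table can be reproduced from the textbook formula for Cronbach’s alpha: alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The sketch below uses a hypothetical helper, cronbach_alpha(), written only for illustration; it computes raw alpha and leaves out the extras that psych's alpha() provides, such as standardized alpha and item statistics.

```r
# Raw Cronbach's alpha from the classical formula (illustrative helper)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_vars <- sapply(items, var)   # variance of each item
  total_var <- var(rowSums(items))  # variance of the summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

data <- data.frame(
  var1 = c(3, 5, 2, 7, 6, 4),
  var2 = c(1, 4, 3, 6, 5, 2),
  var3 = c(2, 6, 4, 3, 5, 1)
)

# Alpha for the full three-item scale
overall <- cronbach_alpha(data)

# Alpha when each item is dropped in turn (compare with raw_alpha above)
dropped <- sapply(names(data), function(v) {
  cronbach_alpha(data[, setdiff(names(data), v), drop = FALSE])
})

print(round(overall, 2))
print(round(dropped, 2))
```

The drop-one values come out at approximately 0.70, 0.41, and 0.91, matching the raw_alpha column, and the overall alpha for the three items is about 0.78. Dropping var3 raises alpha, which suggests it is the item least consistent with the other two.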
Factor Analysis using Psych Package
Factor analysis models a set of observed variables as combinations of a smaller number of unobserved (latent) factors. It can also be used to reduce the dimensionality of the data. In a machine learning workflow, the resulting factors can make a model easier to interpret and can reduce noise, although any gain in predictive accuracy depends on the data.
R
# Load the psych package
library(psych)

# Simulate 200 observations of six variables driven by two latent factors
# (a factor model needs more observations than variables, and columns
# that are not exact linear functions of one another)
set.seed(123)
f1 <- rnorm(200)
f2 <- rnorm(200)
data <- data.frame(
  var1 = f1 + rnorm(200, sd = 0.5),
  var2 = f1 + rnorm(200, sd = 0.5),
  var3 = f1 + rnorm(200, sd = 0.5),
  var4 = f2 + rnorm(200, sd = 0.5),
  var5 = f2 + rnorm(200, sd = 0.5),
  var6 = f2 + rnorm(200, sd = 0.5)
)

# Perform factor analysis (the default oblimin rotation uses the GPArotation package)
fa_result <- fa(data, nfactors = 2)

# Print factor analysis results
print(fa_result)
Output:
The printed summary lists the standardized loadings of each variable on the two factors (labelled MR1 and MR2 for the default minres method), together with each variable's communality (h2), uniqueness (u2), and complexity (com). Below the loadings it reports the variance accounted for by each factor, the correlation between the factors, the test of the hypothesis that 2 factors are sufficient, and fit statistics such as the root mean square of the residuals (RMSR), the Tucker Lewis Index, and RMSEA. The exact values depend on the simulated data.
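The h2, u2, and com columns reported by fa() are simple functions of the loadings. The sketch below computes them in base R for a made-up loading matrix (the numbers are hypothetical and not taken from any analysis in this article). It assumes uncorrelated factors, in which case the communality is the row sum of squared loadings; with correlated factors, the factor correlations also enter the calculation. Item complexity is computed with Hofmann's index.

```r
# Hypothetical loadings of four items on two uncorrelated factors
loadings <- matrix(
  c(0.8, 0.1,
    0.7, 0.2,
    0.1, 0.9,
    0.2, 0.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("item", 1:4), c("F1", "F2"))
)

h2  <- rowSums(loadings^2)                          # communality
u2  <- 1 - h2                                       # uniqueness
com <- rowSums(loadings^2)^2 / rowSums(loadings^4)  # Hofmann's complexity index

print(round(cbind(loadings, h2, u2, com), 2))
```

A complexity close to 1 means an item loads mainly on a single factor, which is what a clean simple-structure solution looks like.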