How to Calculate Correlation in R with Missing Values

By default, the cor() function in the R Programming Language does not drop missing values: its use argument defaults to "everything", so a coefficient comes back as NA whenever either of the variables involved contains a missing value. To get a numeric result, we have to tell cor() how to treat the NAs, for example with use = "complete.obs" (drop every row that has a missing value in any column) or use = "pairwise.complete.obs" (for each pair of variables, use the rows where both are present), or clean the data ourselves first. In this article, we will learn about different approaches by which we can calculate correlation in R with missing values.
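As a quick check of what happens when no use argument is given, the following minimal sketch calls cor() on two short vectors with missing values. The vectors x and y are illustrative values chosen for this sketch.

```r
# Two illustrative vectors, each with one missing value
x <- c(1, 2, 3, NA, 5)
y <- c(4, NA, 6, 7, 8)

# With no `use` argument, cor() uses use = "everything",
# so the NA values propagate to the result.
default_result <- cor(x, y)
print(default_result)  # NA

# Keeping only the pairs where both values are present gives a number.
pairwise_result <- cor(x, y, use = "pairwise.complete.obs")
print(pairwise_result)
```

The second call works on the three pairs (1, 4), (3, 6) and (5, 8), which is why it returns a numeric coefficient instead of NA.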

Below are some of the ways to calculate correlation in R with missing values:

Using cor() with complete.obs
Using cor() with pairwise.complete.obs
Handling Missing Values Manually
Using the cov() and cor() Functions with Imputation

Calculate Correlation with Missing Values Using cor() with complete.obs

In this example, we use the cor() function to calculate the correlation matrix for the columns A, B and C. Specifying use = "complete.obs" makes cor() drop every row that contains a missing value in any column, so the coefficients are computed from complete observations only. The resulting correlation matrix is then printed to the console.

R

# Sample dataset with missing values
data <- data.frame(
  A = c(1, 2, 3, NA, 5),
  B = c(5, NA, 7, 8, 9),
  C = c(10, 11, 12, 13, NA)
)
 
# Calculate correlation with missing values using cor() with complete.obs
correlation_matrix <- cor(data, use = "complete.obs")
 
# Print the correlation matrix
print(correlation_matrix)

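Output:

  A B C
A 1 1 1
B 1 1 1
C 1 1 1

Only rows 1 and 3 have no missing value in any column. Two points always lie on a straight line, so every coefficient here is exactly 1.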
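To see which rows use = "complete.obs" actually keeps, one option is to inspect the data with complete.cases(). This is a minimal sketch on the same sample data; the names is_complete and n_complete are chosen here for illustration.

```r
# Same sample dataset as in the example above
data <- data.frame(
  A = c(1, 2, 3, NA, 5),
  B = c(5, NA, 7, 8, 9),
  C = c(10, 11, 12, 13, NA)
)

# TRUE for rows that have no NA in any column
is_complete <- complete.cases(data)

# Row numbers that survive listwise deletion (rows 1 and 3)
print(which(is_complete))

# Number of observations the correlation is based on (2)
n_complete <- sum(is_complete)
print(n_complete)
```

Checking this count before trusting the coefficients is worthwhile, because a correlation based on very few rows carries little information.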

Calculate Correlation with Missing Values Using cor() with pairwise.complete.obs

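In this example, we use the cor() function again, this time specifying use = 'pairwise.complete.obs'. Each coefficient is then computed from the rows where both variables of that pair are present, so different pairs can be based on different subsets of rows. Here the three complete pairs (1, 4), (3, 6) and (5, 8) lie exactly on a line, which is why the off-diagonal entries are 1. The resulting correlation matrix is then printed to the console.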

R

# Create sample data frame with missing values
df <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(4, NA, 6, 7, 8)
)
 
# Calculate correlation matrix
correlation_matrix <- cor(df, use = 'pairwise.complete.obs')
print(correlation_matrix)

Output:

  x y
x 1 1
y 1 1
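Because pairwise deletion can use a different number of rows for each pair of columns, it can help to count those rows explicitly. The sketch below does this for the three-column sample data from the first example; present and pair_n are names introduced here for illustration.

```r
# Three-column sample data with one NA per column
data <- data.frame(
  A = c(1, 2, 3, NA, 5),
  B = c(5, NA, 7, 8, 9),
  C = c(10, 11, 12, 13, NA)
)

# 1 where a value is present, 0 where it is missing
present <- (!is.na(data)) * 1

# Entry [i, j] counts the rows where columns i and j are both present
pair_n <- crossprod(present)
print(pair_n)

# For comparison: number of rows that "complete.obs" would keep
print(sum(complete.cases(data)))
```

For this data each pair of columns shares three rows, while only two rows are complete across all three columns, so pairwise deletion keeps more data than listwise deletion.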

Calculate Correlation with Missing Values by Handling Missing Values Manually

In this approach, missing values are handled manually: na.omit() removes every row that contains a missing value before the correlation matrix is calculated. This ensures that only complete rows are used in the correlation calculation.

R

# Example data with missing values
data <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(3, NA, 4, 5, 6)
)
 
# Remove rows with missing values
complete_data <- na.omit(data)
 
# Calculate correlation matrix with complete data
correlation_matrix <- cor(complete_data)
# View the correlation matrix
correlation_matrix

Output:

          x         y
x 1.0000000 0.9819805
y 0.9819805 1.0000000
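Removing incomplete rows by hand should give the same matrix as letting cor() do the row deletion. The following sketch checks this on the same data; manual and builtin are names chosen here for illustration.

```r
# Same example data as above
data <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(3, NA, 4, 5, 6)
)

# Row deletion done by hand, then cor() on the cleaned data
manual <- cor(na.omit(data))

# Row deletion done inside cor()
builtin <- cor(data, use = "complete.obs")

# Compare the two matrices up to numerical tolerance
print(all.equal(manual, builtin))
```

The manual route is mainly useful when the cleaned data frame is needed for other steps as well, such as plotting or model fitting.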

Calculate Correlation with Missing Values Using the `cov()` and `cor()` Functions with Imputation

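In this method, we replace each missing value with the mean of its column and then calculate the covariance and correlation matrices on the filled-in data, so all five rows are used. Mean imputation tends to pull the correlation toward zero: the coefficient below is about 0.89, compared with about 0.98 when the incomplete rows are removed from the same data.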

R

# Example data with missing values
data <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(3, NA, 4, 5, 6)
)
 
# Replace each NA with its column mean (apply() returns a numeric matrix)
imputed_data <- apply(data, 2, function(x) ifelse(is.na(x), mean(x, na.rm = TRUE), x))
 
# Calculate covariance matrix (stored for reference; it is not printed below)
covariance_matrix <- cov(imputed_data)
 
# Calculate correlation matrix
correlation_matrix <- cor(imputed_data)
# View the correlation matrix
correlation_matrix

Output:

          x         y
x 1.0000000 0.8882165
y 0.8882165 1.0000000
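The covariance and correlation matrices of the imputed data are directly related: rescaling the covariance matrix with cov2cor() should reproduce the output of cor(). The sketch below checks this and also compares the imputed coefficient with the one obtained by row deletion. The helper fill_mean is a hypothetical name introduced for this sketch.

```r
# Same example data as above
data <- data.frame(
  x = c(1, 2, 3, NA, 5),
  y = c(3, NA, 4, 5, 6)
)

# Hypothetical helper: replace NA values in a vector with the vector's mean
fill_mean <- function(v) ifelse(is.na(v), mean(v, na.rm = TRUE), v)
imputed_data <- apply(data, 2, fill_mean)

# cov2cor() rescales a covariance matrix into a correlation matrix
from_cov <- cov2cor(cov(imputed_data))
direct <- cor(imputed_data)
print(all.equal(from_cov, direct))

# Compare mean imputation with row deletion on the same data
r_imputed <- direct["x", "y"]
r_complete <- cor(data, use = "complete.obs")["x", "y"]
print(c(imputed = r_imputed, complete = r_complete))
```

In this data the imputed coefficient is the smaller of the two, which is worth keeping in mind before treating mean imputation as a neutral way to fill gaps.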

Conclusion

In this article, we looked at how to calculate correlation coefficients in R when the data contain missing values. Row deletion with use = "complete.obs" or na.omit(), pairwise deletion with use = "pairwise.complete.obs", and mean imputation each let cor() return a numeric result. They use different subsets of the data, however, and can give different coefficients, as the 0.98 and 0.89 results for the same x and y show. Which method is appropriate depends on how much data is missing and why.
