Impute the entire dataset

This can be done by replacing the NA values in each column with that column's median, computed using the apply() function.

Syntax: 

apply(X, MARGIN, FUN, …)

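Parameters:

  • X – an array, including a matrix (a data frame is coerced to a matrix)
  • MARGIN – a vector giving the dimensions over which the function is applied; for a matrix, 1 means rows and 2 means columns
  • FUN – the function to be applied
  • … – optional arguments passed on to FUN, such as na.rm = TRUE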
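To make the MARGIN argument concrete, here is a minimal sketch on a small matrix. The matrix `m` and the result names are illustrative assumptions and are not part of the article's dataset.

```r
# Illustrative 2 x 3 matrix (filled column by column):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# MARGIN = 1 applies the function to each row
row_sums <- apply(m, 1, sum)

# MARGIN = 2 applies the function to each column
col_medians <- apply(m, 2, median)

print(row_sums)     # 9 12
print(col_medians)  # 1.5 3.5 5.5
```

With MARGIN = 2 the result has one value per column, which is the form used when imputing a dataset column by column.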

Example: Impute the entire dataset 

R




# create a data frame
data <- data.frame(marks1 = c(NA, 22, NA, 49, 75),
                   marks2 = c(81, 14, NA, 61, 12),
                   marks3 = c(78.5, 19.325, NA, 28, 
                              48.002))

# compute the median of each column using apply(), ignoring NA values
all_column_median <- apply(data, 2, median, na.rm = TRUE)

# replace the NA values in each column with that column's median
for (i in colnames(data)) {
  data[, i][is.na(data[, i])] <- all_column_median[i]
}

# print the imputed data frame
data


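Output:

  marks1 marks2 marks3
1     49   81.0 78.500
2     22   14.0 19.325
3     49   37.5 38.001
4     49   61.0 28.000
5     75   12.0 48.002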



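Note that apply() coerces a data frame to a matrix, so the approach above suits data frames whose columns are all numeric. As a hedged alternative, the sketch below imputes only the numeric columns of a mixed-type data frame using sapply() and lapply(). The data frame `df` and its column names are hypothetical and do not come from the article.

```r
# Hypothetical data frame with one non-numeric column
df <- data.frame(name  = c("a", "b", "c", "d"),
                 score = c(10, NA, 30, 50),
                 age   = c(NA, 21, 25, 23))

# Flag the numeric columns so the text column is left untouched
num_cols <- sapply(df, is.numeric)

# Replace the NA values in each numeric column with that column's median
df[num_cols] <- lapply(df[num_cols], function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

df
```

Here the missing score becomes 30 and the missing age becomes 23, while the name column is unchanged.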
How to Impute Missing Values in R?

In this article, we will discuss how to impute missing values in the R programming language.

Most datasets contain missing values, either because the values were never entered or because of some error. Replacing these missing values with another value is known as data imputation. There are several ways to impute. Common ones include replacing a missing value with the average, minimum, or maximum value of that column/feature. Different datasets and features call for different imputation methods. For example, in a dataset of a company's sales performance, if the feature loss has missing values, it would be more logical to replace them with the minimum value.
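As a rough illustration of choosing a different statistic for each feature, the sketch below fills one column with its mean and another with its minimum. The `sales` data frame, its column names, and its values are assumptions made up for this example.

```r
# Hypothetical sales data; names and values are illustrative only
sales <- data.frame(revenue = c(120, NA, 95, 140),
                    loss    = c(10, 4, NA, 7))

# Replace the missing revenue with the column mean
sales$revenue[is.na(sales$revenue)] <- mean(sales$revenue, na.rm = TRUE)

# Replace the missing loss with the column minimum
sales$loss[is.na(sales$loss)] <- min(sales$loss, na.rm = TRUE)

sales
```

The pattern is the same in both cases: select the NA positions with is.na() and assign a summary statistic computed with na.rm = TRUE.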

Dataset in use:


Impute One Column

Method 1: Imputing manually with Mean value...

Impute the entire dataset:

...