What is the unite() function in R

The unite() function is a data manipulation tool in the R programming language, used mainly with data frames. It is part of the tidyr package, which provides a suite of functions designed to tidy data.


The unite() function allows you to combine multiple columns into a single column, making it easier to manage and analyze your data. This article will explain the usage of unite(), its parameters, and provide practical examples to demonstrate its functionality.

The basic syntax of the unite() function is as follows:

Syntax:

unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)

  • data: The data frame containing the columns you want to unite.
  • col: The name of the new column to be created.
  • ...: The columns to combine into the new column.
  • sep: A string to separate the values in the new column (default is "_").
  • remove: A logical value indicating whether to remove the input columns from the result (default is TRUE).
  • na.rm: A logical value indicating whether to drop missing values (NA) before the values are pasted together (default is FALSE).
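As a quick illustration of the remove parameter, the sketch below keeps the input columns alongside the new one by setting remove = FALSE. The data frame people and its values are made up for this example.

```r
library(tidyr)

# Hypothetical sample data, used only for illustration
people <- data.frame(
  first_name = c("John", "Jane"),
  last_name = c("Doe", "Smith")
)

# remove = FALSE keeps first_name and last_name in the result
people_kept <- unite(people, col = "full_name", first_name, last_name,
                     sep = " ", remove = FALSE)
print(people_kept)
```

With remove = TRUE (the default), only full_name would remain.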

Basic Usage of unite()

Consider a data frame with separate columns for first and last names. We want to combine these into a single column called full_name.

R
# Install tidyr once if it is not already installed
# install.packages("tidyr")
library(tidyr)

# Sample data frame
df <- data.frame(
  first_name = c("John", "Jane", "Doe"),
  last_name = c("Doe", "Smith", "Johnson")
)

# Print the original data frame
df

# Using unite() to combine first and last names
df_united <- unite(df, col = "full_name", first_name, last_name, sep = " ")
print(df_united)

Output:

  first_name last_name
1       John       Doe
2       Jane     Smith
3        Doe   Johnson

    full_name
1    John Doe
2  Jane Smith
3 Doe Johnson

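In this example, the first_name and last_name columns are combined into a single full_name column, with a space as the separator. Because remove is TRUE by default, the original first_name and last_name columns no longer appear in the result.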

Changing the Separator using unite()

You can change the separator to any string you prefer. Here, we use a comma followed by a space:

R
# Using unite() with a different separator
df_united_comma <- unite(df, col = "full_name", first_name, last_name, sep = ", ")
print(df_united_comma)

Output:

     full_name
1    John, Doe
2  Jane, Smith
3 Doe, Johnson
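The same idea extends to more than two columns. The sketch below joins three columns with "-" as the separator. The data frame dates and its column names are hypothetical and chosen only for this illustration.

```r
library(tidyr)

# Hypothetical date parts stored as character columns
dates <- data.frame(
  year = c("2023", "2024"),
  month = c("01", "12"),
  day = c("15", "31")
)

# Combine three columns into one, separated by "-"
dates_united <- unite(dates, col = "date", year, month, day, sep = "-")
print(dates_united)
```

The separator is placed between every pair of adjacent values, so each row of date holds year, month, and day joined by two hyphens.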

Handling Missing Values using unite() function

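By default (na.rm = FALSE), unite() pastes missing values into the combined column as the literal text "NA", so a row can end up as "NA Smith". Setting na.rm = TRUE drops missing values, along with the separator that would have joined them, before the columns are combined.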

R
# Sample data frame with missing values
df_na <- data.frame(
  first_name = c("John", NA, "Doe"),
  last_name = c("Doe", "Smith", NA)
)
# Print the original data frame
df_na

# Using unite() and removing NA values
df_united_na <- unite(df_na, col = "full_name", first_name, last_name, 
                      sep = " ", na.rm = TRUE)
print(df_united_na)

Output:

  first_name last_name
1       John       Doe
2       <NA>     Smith
3        Doe      <NA>

  full_name
1  John Doe
2     Smith
3       Doe
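For comparison, the sketch below runs the same kind of data with the default na.rm = FALSE. The data frame is redefined under the hypothetical name df_na_demo so the snippet runs on its own.

```r
library(tidyr)

# Same values as df_na, redefined so this snippet is self-contained
df_na_demo <- data.frame(
  first_name = c("John", NA, "Doe"),
  last_name = c("Doe", "Smith", NA)
)

# Default na.rm = FALSE: missing values show up as the text "NA"
df_default <- unite(df_na_demo, col = "full_name", first_name, last_name,
                    sep = " ")
print(df_default)
```

Here the second and third rows become "NA Smith" and "Doe NA", which is usually not what you want in a name column and is the reason to set na.rm = TRUE.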

Conclusion

The unite() function in R combines multiple columns into one, which simplifies a common data manipulation step. By adjusting parameters such as sep, remove, and na.rm, you can control the separator, whether the input columns are kept, and how missing values are handled. Whether you’re tidying data or preparing it for analysis, unite() keeps column combination to a single function call.