How to use the c_across() function in R

c_across() is a function in the dplyr package that is designed to work together with rowwise(). Inside a row-wise mutate() or summarise(), it takes a tidy-select specification (the set of columns you want to use) and combines the values of those columns in the current row into a single vector. c_across() does not apply a function itself; you pass the vector it returns to a function such as sum() or mean() to compute a per-row result.

R

library(dplyr)
  
# create sample data frame
df <- tibble(id = 1:3, a = c(1, 2, 3),
             b = c(4, 5, 6), c = c(7, 8, 9))
  
# use rowwise() and c_across() 
# to get sum of selected columns
df %>% 
  rowwise() %>% 
  mutate(
    sum_cols = sum(c_across(c(a, c)))
  )


Output:

# A tibble: 3 × 5
# Rowwise: 
    id     a     b     c sum_cols
 <int> <dbl> <dbl> <dbl>    <dbl>
1     1     1     4     7        8
2     2     2     5     8       10
3     3     3     6     9       12

In the above example, c_across(c(a, c)) collects the values of columns ‘a’ and ‘c’ for the current row, and rowwise() makes sum() run once per row instead of once over the whole column. The mutate() function creates a new column named sum_cols, which contains the per-row sum of the values in columns ‘a’ and ‘c’.
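
As a further illustration, the sketch below reuses the same sample data and selects a range of adjacent columns with a:c instead of listing them. The column names total, row_mean, and n_vals are made up for this example. It also calls ungroup() at the end, on the assumption that later steps should not stay row-wise.

```r
library(dplyr)

# Same sample data as in the example above.
df <- tibble(id = 1:3, a = c(1, 2, 3),
             b = c(4, 5, 6), c = c(7, 8, 9))

row_stats <- df %>%
  rowwise() %>%
  mutate(
    total    = sum(c_across(a:c)),     # a:c selects columns a, b and c
    row_mean = mean(c_across(a:c)),    # any vector function can be used
    n_vals   = length(c_across(a:c))   # one value per selected column
  ) %>%
  ungroup()                            # drop the row-wise grouping again

row_stats
```

Because c_across() only gathers values, the same selection can feed sum(), mean(), length(), or any other function that accepts a vector.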

Apply a Function (or functions) across Multiple Columns using dplyr in R

Data processing and manipulation are among the core tasks in data science and machine learning. R is one of the most widely used programming languages for data science, and dplyr is one of its most popular packages for data manipulation. In this article, we will learn how to apply a function (or functions) across multiple columns in R using the dplyr package.

What is dplyr?

dplyr is a powerful and efficient data manipulation package in R. It provides a set of functions for filtering, grouping, and transforming data. The functions in dplyr are designed to be simple and intuitive, making it easy to perform complex data manipulations with a few lines of code.

Prerequisites

Before we start, make sure that you have the dplyr package installed on your system. If not, install it by running the following code:

install.packages("dplyr")

Once you have dplyr installed, you can load it into your R environment by running the following code:

library(dplyr)

Applying a Function to a Single Column

Let’s start by applying a function to a single column. For this, we will use the built-in mtcars data set, which contains information about various car models, including their miles per gallon (mpg) ratings. Let’s say we want to calculate the natural logarithm of the mpg column. We can do this using the mutate function from the dplyr package. The following code loads the data set and adds the new column:

R

data("mtcars")
  
mtcars_log_mpg <- mtcars %>% 
 mutate(log_mpg = log(mpg))


The mutate function takes the data frame mtcars as input and adds a new column log_mpg with the logarithm of the mpg column. The %>% operator is the pipe operator, which passes the output of the previous operation as the first argument to the next operation.
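
To make the role of the pipe concrete, the following sketch writes the same step with and without %>% and checks that both forms give the same result. The names piped and direct are illustrative only.

```r
library(dplyr)

# With the pipe: mtcars becomes the first argument of mutate().
piped <- mtcars %>% mutate(log_mpg = log(mpg))

# Without the pipe: the data frame is passed explicitly.
direct <- mutate(mtcars, log_mpg = log(mpg))

# Both calls should produce the same data frame.
identical(piped, direct)

# The original columns are kept; log_mpg is appended as the last column.
names(piped)[ncol(piped)]
```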

Let’s visualize the changes brought by this transformation using a bar plot:

R

par(mfrow=c(1,2))
barplot(mtcars$mpg, main="Original mpg")
barplot(mtcars_log_mpg$log_mpg, main="log(mpg)")


Output:

Bar plots of the original mpg values and of log(mpg)

This bar plot shows the original mpg column and its logarithm side by side, which helps us understand the changes brought by the logarithm function.

As we can see, the logarithm function reduces the range of values, which can be useful in some cases where the original values have a large range. In this case, the logarithm function brings the values of the mpg column closer to each other, which can make it easier to see patterns and relationships in the data.
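
The effect on the range can also be checked numerically. This is a small sketch using base R only; the variable names are chosen for illustration.

```r
# Smallest and largest value before and after the transformation.
mpg_range <- range(mtcars$mpg)
log_range <- range(log(mtcars$mpg))

# Width of each range: the log scale spans a much narrower interval.
diff(mpg_range)
diff(log_range)

# On the log scale, the width equals the log of the max/min ratio.
log(max(mtcars$mpg) / min(mtcars$mpg))
```

The last line shows why the logarithm compresses the data: differences on the log scale reflect ratios between the original values rather than absolute differences.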

Applying a Function to Multiple Columns

In the previous section, we learned how to apply a function to a single column. But what if we want to apply the same function to multiple columns in a data frame? For this, we can use the mutate_all function from the dplyr package. The mutate_all function takes a data frame as input and applies a function to all columns. (Since dplyr 1.0.0, mutate_all is superseded by across(), but it continues to work.) Let’s say we have a data frame df with three numeric columns, and we want to apply the logarithm function to all of them.

The mutate_all function applies the logarithm function to all columns in the data frame and returns a new data frame with the same number of columns, but with the logarithm of each column. To visually represent the changes brought by applying the logarithm function to all columns, we can plot the original data and the transformed data side by side:

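R

# three columns of uniform random values in (0, 1); results vary per run
df <- data.frame(col1 = runif(10),
                 col2 = runif(10),
                 col3 = runif(10))

# apply log() to every column
df_log <- df %>% mutate_all(~ log(.))

# one row of plots per column: original on the left, log on the right
par(mfrow = c(3, 2))
for (i in seq_along(df)) {
  barplot(df[, i], main = colnames(df)[i])
  barplot(df_log[, i],
          main = paste0("log(", colnames(df)[i], ")"))
}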


Output:

Barplot for the data after applying log transformations

In this example, the original data is plotted in the first column of each row, and the transformed data is plotted in the second column of each row. The plots show how the logarithm function changes the distribution of each column.
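
In current dplyr, the same transformation is usually written with across() inside mutate(). The sketch below is one way to do that, not the only one. It uses a separate data frame, demo_df, with an arbitrary seed so the result is reproducible; both the name and the seed are assumptions made for this example. It also shows how a named list applies several functions to the same columns.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only for reproducibility
demo_df <- data.frame(col1 = runif(10),
                      col2 = runif(10),
                      col3 = runif(10))

# across(everything(), log) applies log() to every column.
log_across <- demo_df %>% mutate(across(everything(), log))

# The superseded form used above should give the same result.
log_all <- demo_df %>% mutate_all(~ log(.))
all.equal(log_across, log_all)

# A named list applies several functions at once; the new columns are
# named <column>_<function>, for example col1_log and col1_sqrt.
both <- demo_df %>%
  mutate(across(everything(), list(log = log, sqrt = sqrt)))
names(both)
```

With a single function, across() overwrites the selected columns in place. With a named list, it keeps the originals and adds one new column per column and function.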

Applying Different Functions to Different Columns

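Sometimes, we may want to apply different functions to different columns. For this, we can use the mutate_at function from the dplyr package. Besides the data frame, mutate_at takes two main arguments: the first (.vars) selects the columns, given as a character vector of column names, a numeric vector of column positions, or a vars() selection; the second (.funs) is the function to apply, written either as a function name or as a formula such as ~ log(.). Like mutate_all, mutate_at is superseded by across() in current dplyr but continues to work.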

Let’s say we want to apply the logarithm function to the first and third columns, and the square root function to the second column. Here, the mutate_at function is used twice, once for applying the logarithm function to columns 1 and 3, and once for applying the square root function to column 2. We can then plot each original column next to its transformed version:

R

# log() for columns 1 and 3, sqrt() for column 2
df_log_sqrt <- df %>%
  mutate_at(c(1, 3), ~ log(.)) %>%
  mutate_at(2, ~ sqrt(.))

# two plots per column (original, transformed), so use a 3 x 2 grid
par(mfrow = c(3, 2))
for (i in seq_along(df)) {
  fn <- if (i %in% c(1, 3)) "log" else "sqrt"
  barplot(df[, i], main = colnames(df)[i])
  barplot(df_log_sqrt[, i],
          main = paste0(fn, "(", colnames(df)[i], ")"))
}


Output:

Barplot for the data after applying log and square root transformations

As we can see, the first and third columns are transformed by the logarithm function, while the second column is transformed by the square root function.
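
The same result can be sketched with across(), selecting the columns by name rather than by position. As before, demo_df and the seed are assumptions introduced for this example, and the comparison with mutate_at is included only as a sanity check.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
demo_df <- data.frame(col1 = runif(10),
                      col2 = runif(10),
                      col3 = runif(10))

# One across() call per function, each with its own column selection.
by_name <- demo_df %>%
  mutate(
    across(c(col1, col3), log),
    across(col2, sqrt)
  )

# The mutate_at() version from above, selecting by position.
by_position <- demo_df %>%
  mutate_at(c(1, 3), ~ log(.)) %>%
  mutate_at(2, ~ sqrt(.))

all.equal(by_name, by_position)
```

Selecting by name tends to be more robust than selecting by position, because the code keeps working if columns are later reordered or new ones are added.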
