How to use starts_with(), ends_with() In R Language

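starts_with() is a selection helper that matches the columns whose names start with a given prefix. It is meant to be used inside selecting functions such as select() or across(), not called on its own.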

R




# Load dplyr, which also re-exports tibble()
library(dplyr)

# Example dataset
df <- tibble(a_col = 1:3,
             b_col = 4:6, c_col = 7:9)
  
# Select columns that start with 'a'
df_starts_with_a <- df %>%
  select(starts_with('a'))
  
print(df_starts_with_a)


Output:

# A tibble: 3 × 1
  a_col
  <int>
1     1
2     2
3     3

ends_with() is the matching selection helper for suffixes: it selects the columns whose names end with a given string.

R




# Example dataset
df <- tibble(a_col_1 = 1:3,
             b_col_0 = 4:6, c_col_1 = 7:9)
  
# Select columns that end with '_1'
df_ends_with_1 <- df %>%
 select(ends_with('_1'))
  
print(df_ends_with_1)


Output:

# A tibble: 3 × 2
  a_col_1 c_col_1
    <int>   <int>
1       1       7
2       2       8
3       3       9
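The two helpers can also be combined, and they work inside across() as well as select(). The following is a minimal sketch under stated assumptions: the scores tibble and its column names are made up for illustration and do not come from the examples above.

```r
library(dplyr)

# Hypothetical example data; the column names are invented for this sketch
scores <- tibble(id = 1:3,
                 test_math = c(80, 90, 70),
                 test_art = c(60, 75, 85),
                 final_1 = c(1, 2, 3))

# Combine helpers: keep columns that start with "test_" OR end with "_1"
selected <- scores %>%
  select(starts_with("test_") | ends_with("_1"))

# Use a helper inside across() to transform only the matching columns
scaled <- scores %>%
  mutate(across(starts_with("test_"), ~ .x / 100))

print(names(selected))
print(scaled)
```

Here selected keeps test_math, test_art and final_1, while scaled leaves id and final_1 untouched and rescales only the two test_ columns.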

Apply a Function (or functions) across Multiple Columns using dplyr in R

Data processing and manipulation are core tasks in data science and machine learning. R is one of the most widely used programming languages for data science, and the dplyr package is one of the most popular R packages for data manipulation. In this article, we will learn how to apply a function (or functions) across multiple columns in R using the dplyr package.

What is dplyr?

dplyr is a powerful and efficient data manipulation package in R. It provides a set of functions for filtering, grouping, and transforming data. The functions in dplyr are designed to be simple and intuitive, making it easy to perform complex data manipulations with a few lines of code.
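As a quick illustration of the filtering, grouping, and transforming mentioned above, here is a small sketch on the built-in mtcars data set. The threshold of 100 horsepower and the name summary_by_cyl are arbitrary choices made for this example.

```r
library(dplyr)

# Keep cars with more than 100 horsepower (arbitrary threshold),
# group them by cylinder count, and compute the mean mpg per group
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg))

print(summary_by_cyl)
```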

Prerequisites

Before we start, make sure that you have the dplyr package installed on your system. If not, install it by running the following code:

install.packages("dplyr")

Once you have dplyr installed, you can load it into your R environment by running the following code:

library(dplyr)

Applying a Function to a Single Column

Let's start by applying a function to a single column. For this, we will use the built-in mtcars data set, which contains information about various car models, including their miles per gallon (mpg) ratings. Let's say we want to calculate the logarithm of the mpg column. We can do this using the mutate function from the dplyr package; the code below loads the data set and adds the new column.

R




data("mtcars")
  
mtcars_log_mpg <- mtcars %>% 
 mutate(log_mpg = log(mpg))


The mutate function takes the data frame mtcars as input and adds a new column log_mpg with the logarithm of the mpg column. The %>% operator is the pipe operator, which passes the output of the previous operation as the first argument to the next operation.
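To make the role of the pipe concrete, this short sketch writes the same transformation with and without %>%. The variable names piped and direct are chosen only for this illustration.

```r
library(dplyr)

# Piped form: mtcars is passed as the first argument of mutate()
piped <- mtcars %>% mutate(log_mpg = log(mpg))

# Equivalent direct call without the pipe
direct <- mutate(mtcars, log_mpg = log(mpg))

# Both forms should produce the same data frame
identical(piped, direct)
```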

Let's visualize the changes brought by this transformation using a bar plot:

R




par(mfrow=c(1,2))
barplot(mtcars$mpg, main="Original mpg")
barplot(mtcars_log_mpg$log_mpg, main="log(mpg)")


Output:

Barplot of the original mpg values and of log(mpg)

This bar plot shows the original mpg column and its logarithm side by side, which helps us understand the changes brought by the logarithm function.

As we can see, the logarithm function reduces the range of values, which can be useful in some cases where the original values have a large range. In this case, the logarithm function brings the values of the mpg column closer to each other, which can make it easier to see patterns and relationships in the data.
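The narrowing of the range can also be checked numerically. This is a minimal sketch using base R only; range_original and range_log are names introduced for this example.

```r
# Compare the spread of mpg before and after the log transform
data("mtcars")
range_original <- range(mtcars$mpg)
range_log <- range(log(mtcars$mpg))

# Width of each range (maximum minus minimum)
diff(range_original)  # 23.5 on the original scale
diff(range_log)       # roughly 1.18 on the log scale
```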

Applying a Function to Multiple Columns

In the previous section, we learned how to apply a function to a single column. But what if we want to apply the same function to multiple columns in a data frame? For this, we can use the mutate_all function from the dplyr package, which takes a data frame as input and applies a function to every column. (Since dplyr 1.0.0, mutate_all is superseded by across(), but it still works.) Let's say we have a data frame df with three columns, and we want to apply the logarithm function to all of them.

The mutate_all function applies the logarithm function to all columns in the data frame and returns a new data frame with the same number of columns, but with the logarithm of each column. To visually represent the changes brought by applying the logarithm function to all columns, we can plot the original data and the transformed data side by side:

R




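# Three columns of random values between 0 and 1
df <- data.frame(col1 = runif(10),
                 col2 = runif(10),
                 col3 = runif(10))

# Apply log() to every column
df_log <- df %>% mutate_all(~ log(.))
  
# One row per column: original on the left, transformed on the right
par(mfrow=c(3,2))
for (i in seq_along(df)) {
  barplot(df[,i], main=colnames(df)[i])
  barplot(df_log[,i],
          main=paste0("log(", colnames(df)[i], ")"))
}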


Output:

Barplot for the data after applying log transformations

In this example, the original data is plotted in the first column of each row, and the transformed data is plotted in the second column of each row. The plots show how the logarithm function changes the distribution of each column.
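In current dplyr versions, the same all-columns transformation is usually written with across() and everything(). The sketch below compares the two forms on a separate data frame, df_demo, which is created only for this example; the seed is arbitrary and is there just to make the random values reproducible.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the random data is reproducible
df_demo <- data.frame(col1 = runif(10),
                      col2 = runif(10),
                      col3 = runif(10))

# across(everything(), log) applies log() to every column
df_log_across <- df_demo %>% mutate(across(everything(), log))

# The scoped verb used earlier, for comparison
df_log_all <- df_demo %>% mutate_all(~ log(.))

# Both approaches should give the same result
all.equal(df_log_across, df_log_all)
```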

Applying Different Functions to Different Columns

Sometimes, we may want to apply different functions to different columns. For this, we can use the mutate_at function from the dplyr package. After the data frame, mutate_at takes two arguments: the columns to transform (.vars), given as a character vector of names, a numeric vector of positions, or a vars() selection, and the function to apply (.funs), given as a function or a formula such as ~ log(.).

Let's say we want to apply the logarithm function to the first and third columns, and the square root function to the second column. Here, the mutate_at function is used twice, once for applying the logarithm function to columns 1 and 3, and once for applying the square root function to column 2. We can then plot each column of the original data frame next to its transformed version:

R




# log() on columns 1 and 3, sqrt() on column 2
df_log_sqrt <- df %>% 
  mutate_at(c(1, 3), ~ log(.)) %>% 
  mutate_at(2, ~ sqrt(.))
  
# One row per column: original on the left, transformed on the right
par(mfrow=c(3,2))
for (i in seq_along(df)) {
  fn <- if (i %in% c(1, 3)) "log(" else "sqrt("
  barplot(df[,i], main=colnames(df)[i])
  barplot(df_log_sqrt[,i],
          main=paste0(fn, colnames(df)[i], ")"))
}


Output:

Barplot for the data after applying log and square root transformations

As we can see, the first and third columns are transformed by the logarithm function, while the second column is transformed by the square root function.
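The same per-column choice of functions can be sketched with across(), selecting columns by name instead of by position. As before, df_demo is a data frame made up for this example, and the log_ prefix produced by the .names argument is an arbitrary naming choice.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the random data is reproducible
df_demo <- data.frame(col1 = runif(10),
                      col2 = runif(10),
                      col3 = runif(10))

# Overwrite columns in place: log() for col1 and col3, sqrt() for col2
df_mixed <- df_demo %>%
  mutate(across(c(col1, col3), log),
         across(col2, sqrt))

# Keep the originals and add new columns named log_col1 and log_col3
df_added <- df_demo %>%
  mutate(across(c(col1, col3), log, .names = "log_{.col}"))

names(df_added)
```

Selecting by name tends to be more robust than selecting by position, because the code keeps working if the column order changes.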
