Select variables (columns) in R using dplyr
In this article, we will select variables (columns) of a data frame in the R programming language using the dplyr package.
Dataset in use:
Select column with column name
Here we will use the select() function to select columns by name.
Syntax:
select(dataframe, column1, column2, ..., column_n)
Here, dataframe is the input data frame and column1, ..., column_n are the names of the columns to be displayed.
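As a minimal sketch of the syntax above: select() returns a data frame even when a single column is requested, and the columns come back in the order they are listed. The three-row data frame demo is a shortened, hypothetical stand-in for the article's dataset, not part of the original examples.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# A single selected column still comes back as a data frame
one_col <- select(demo, name)
print(class(one_col))

# Columns are returned in the order they are listed
reordered <- select(demo, address, id)
print(names(reordered))
```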
Example 1: R program to select columns
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select id column from the dataframe by column name
print(select(data1, id))

# select name column from the dataframe by column name
print(select(data1, name))
Output:
Example 2: R program to select multiple columns
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select multiple columns from the dataframe by column name
print(select(data1, id, name, address))
Output:
Select column(s) by position
We can also pass column positions to select() instead of names. Positions start at 1.
Syntax:
select(dataframe, column1_position, column2_position, ..., column_n_position)
where dataframe is the input data frame and each column position is a column number.
To select several adjacent columns, we can use the range operator ":" with their positions.
Syntax:
select(dataframe, start_position:end_position)
where dataframe is the input data frame, start_position is the number of the first column in the range and end_position is the number of the last column in the range.
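To make the link between positions and names concrete, the sketch below checks that selecting by position gives the same columns as selecting by name. The data frame demo is a hypothetical three-row stand-in for the article's dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Positions 1 and 3 refer to the first and third columns
by_position <- select(demo, 1, 3)
by_name <- select(demo, id, address)
print(identical(names(by_position), names(by_name)))

# The range 1:2 covers every column from position 1 to position 2
by_range <- select(demo, 1:2)
print(names(by_range))
```

Selecting by name is usually more robust than selecting by position, because positions change when columns are added or reordered.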
Example 1: R program to select particular column by column position
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select first column by column position
print(select(data1, 1))

# select third column by column position
print(select(data1, 3))
Output:
Example 2: R program to select multiple columns by positions
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select multiple columns by column position
print(select(data1, 1, 2))
Output:
Example 3: R program to select multiple columns by position with range operator
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select multiple columns by column position
# with the : operator
print(select(data1, 1:3))
Output:
Select columns whose names contain a substring or match a pattern
Here, we will select columns based on a substring or pattern present in the column name (not in the column's values).
Method 1: Using contains()
Selects the columns whose names contain the given substring.
Syntax:
select(dataframe, contains('sub_string'))
Here, dataframe is the input data frame and sub_string is the string to look for in the column names.
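The sketch below illustrates two details of contains() that are easy to miss: it looks at column names rather than cell values, and its ignore.case argument defaults to TRUE. The data frame demo is a hypothetical three-row stand-in for the article's dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# 'hyd' occurs as a value but in no column name, so nothing is selected
by_value <- select(demo, contains('hyd'))
print(ncol(by_value))

# Matching ignores case by default, so 'AD' finds address
default_case <- select(demo, contains('AD'))
print(names(default_case))

# With ignore.case = FALSE the upper-case substring no longer matches
strict_case <- select(demo, contains('AD', ignore.case = FALSE))
print(ncol(strict_case))
```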
Example: R program to select column based on substring
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select columns whose names contain am
print(select(data1, contains('am')))

# select columns whose names contain d
print(select(data1, contains('d')))

# select columns whose names contain dd
print(select(data1, contains('dd')))
Output:
Method 2: Using matches()
It selects the columns whose names match the given regular expression.
Syntax:
select(dataframe, matches('pattern'))
Here, dataframe is the input data frame and pattern is a regular expression matched against the column names. For a plain substring such as 'am', it gives the same result as contains().
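The difference from contains() only shows up once the pattern uses regular-expression syntax. The sketch below compares the two on patterns with anchors; the data frame demo is a hypothetical three-row stand-in for the article's dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# matches() treats '^a' as a regex: names beginning with a
regex_start <- select(demo, matches('^a'))
print(names(regex_start))

# contains() treats '^a' literally, and no name contains a caret
literal_start <- select(demo, contains('^a'))
print(ncol(literal_start))

# 'd$' as a regex: names ending in d
regex_end <- select(demo, matches('d$'))
print(names(regex_end))
```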
Example: R program to select column based on substring
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select columns whose names match am
print(select(data1, matches('am')))

# select columns whose names match d
print(select(data1, matches('d')))

# select columns whose names match dd
print(select(data1, matches('dd')))
Output:
Select columns whose names start with or end with certain characters
We can also select columns based on the starting and ending characters of their names.
- starts_with() returns the columns whose names start with the given character or string.
Syntax:
select(dataframe, starts_with('substring'))
where dataframe is the input data frame and substring is the character or string the column name must start with
- ends_with() returns the columns whose names end with the given character or string.
Syntax:
select(dataframe, ends_with('substring'))
where dataframe is the input data frame and substring is the character or string the column name must end with
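Both helpers can be combined in one select() call, and each accepts a character vector of several prefixes or suffixes. The sketch below shows both uses; the data frame demo is a hypothetical three-row stand-in for the article's dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# A vector of prefixes selects names starting with any of them
several_prefixes <- select(demo, starts_with(c('i', 'n')))
print(names(several_prefixes))

# Helpers can be mixed: names starting with a, plus names ending in d
mixed <- select(demo, starts_with('a'), ends_with('d'))
print(names(mixed))
```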
Example 1: R program to display columns that start with a character/substring
R
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select columns whose names start with n
print(select(data1, starts_with('n')))

# select columns whose names start with add
print(select(data1, starts_with('add')))
Output:
Example 2: R program to select column that ends with a given string or character
R
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select columns whose names end with ss
print(select(data1, ends_with('ss')))

# select columns whose names end with d
print(select(data1, ends_with('d')))
Output:
Select all columns
We can select all the columns in the data frame by using the everything() function.
Syntax:
select(dataframe,everything())
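On its own, everything() returns the data frame unchanged, so a common use is to combine it with a named column to move that column to the front. The sketch below assumes a hypothetical three-row stand-in, demo, for the article's dataset.

```r
library(dplyr)

# Hypothetical three-row stand-in for the article's dataset
demo <- data.frame(
  id = c(1, 2, 3),
  name = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# everything() alone keeps all columns in their original order
all_cols <- select(demo, everything())
print(names(all_cols))

# Naming address first moves it to the front; everything() adds the rest
address_first <- select(demo, address, everything())
print(names(address_first))
```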
Example: R program to select all columns
R
# load the library
library(dplyr)

# create dataframe with 3 columns:
# id, name and address
data1 <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
  name = c('sravan', 'ojaswi', 'bobby', 'gnanesh', 'rohith',
           'pinkey', 'dhanush', 'sravan', 'gnanesh', 'ojaswi'),
  address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
              'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd')
)

# select all columns using everything()
print(select(data1, everything()))
Output: