Replace specific values in column using regex in R
In this article, we will discuss how to replace specific values in a dataframe column using regular expressions in the R Programming Language.
Method 1: Using sub() method
The sub() method in R replaces the first match of a pattern in each string with another string. It operates on a character vector, such as a dataframe column, and is vectorized over all elements, which makes it convenient for large datasets. The pattern can match a single character or a string composed of one or more words.
Syntax:
sub(pattern, new_string, df$col_name)
Parameters:
- pattern: a regular expression or a character string to match. In a regular expression, * means zero or more repetitions of the preceding element, so .* matches any sequence of characters.
- new_string: the string to replace the match with
- df$col_name: the dataframe column to modify
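To make the meaning of the * quantifier concrete, here is a minimal sketch on a small made-up vector (the vector x and the replacement "X" are illustrative assumptions, not part of the examples in this article). It contrasts * (zero or more) with + (one or more).

```r
# Hypothetical input vector used only for illustration
x <- c("geek", "g", "gap")

# "ge*" matches "g" followed by zero or more "e" characters
star_result <- sub("ge*", "X", x)
print(star_result)   # "Xk" "X" "Xap"

# "ge+" matches "g" followed by one or more "e" characters
plus_result <- sub("ge+", "X", x)
print(plus_result)   # "Xk" "g" "gap"
```

With *, a bare "g" still counts as a match, which is why "g" and "gap" are changed in the first call but not in the second.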
Example 1:
R
# declaring dataframe
data_frame <- data.frame(col1 = c("Beginner", "for", "geek", "friends"))

print("Original DataFrame")
print(data_frame)

# replace values that start with "ge" with a new string
data_frame$col1 <- sub("^ge.*", "new_String", data_frame$col1)

print("Modified DataFrame")
print(data_frame)
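Output:
[1] "Original DataFrame"
      col1
1 Beginner
2      for
3     geek
4  friends
[1] "Modified DataFrame"
        col1
1   Beginner
2        for
3 new_String
4    friends
Only "geek" starts with "ge", so it is the only value replaced. "Beginner" is unchanged because the pattern is anchored to the start of the string with ^.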
The sub() method replaces only the first occurrence of the pattern in each string; any later occurrences in the same string are left unchanged.
Example 2:
R
# declaring dataframe
data_frame <- data.frame(col1 = c("Beginner for Beginner interviews",
                                  "suitable 4 placements",
                                  "interviews placements interviews"))

print("Original DataFrame")
print(data_frame)

# replace the first "interviews" in each value
data_frame$col1 <- sub("interviews", "programming", data_frame$col1)

print("Modified DataFrame")
print(data_frame)
Output:
[1] "Original DataFrame"
                              col1
1 Beginner for Beginner interviews
2            suitable 4 placements
3 interviews placements interviews
[1] "Modified DataFrame"
                               col1
1 Beginner for Beginner programming
2             suitable 4 placements
3 programming placements interviews
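The first-match-only behaviour is easiest to see next to gsub(), which is covered in Method 2. The following is a small sketch; the vector name text and the result names are illustrative assumptions.

```r
# Hypothetical vector reusing strings from the example above
text <- c("interviews placements interviews", "suitable 4 placements")

# sub() stops after the first match in each element
first_only <- sub("interviews", "programming", text)
print(first_only)
# "programming placements interviews" "suitable 4 placements"

# gsub() keeps going and replaces every match in each element
all_matches <- gsub("interviews", "programming", text)
print(all_matches)
# "programming placements programming" "suitable 4 placements"
```

Elements without a match, such as the second string, are returned unchanged by both functions.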
Method 2: Using gsub() method
The gsub() method is similar to the sub() method, and both accept regular expressions. The difference is that gsub() replaces all the occurrences of the pattern in each string, not just the first one.
Syntax:
gsub(pattern, new_string, df$col_name)
Parameters:
- pattern: a regular expression or a character string to match
- new_string: the string to replace the matches with
- df$col_name: the dataframe column to modify
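Beyond the three arguments listed above, base R's sub() and gsub() also accept optional arguments such as ignore.case and fixed. They are not used elsewhere in this article; the sketch below, on a made-up vector x, only illustrates how they change matching.

```r
# Hypothetical input vector used only for illustration
x <- c("Geek", "geek", "a.b.c")

# ignore.case = TRUE lets "^ge.*" match "Geek" as well as "geek"
case_insensitive <- gsub("^ge.*", "new_String", x, ignore.case = TRUE)
print(case_insensitive)   # "new_String" "new_String" "a.b.c"

# fixed = TRUE treats the pattern as a literal string, so "." is just a dot
literal_dot <- gsub(".", "-", x, fixed = TRUE)
print(literal_dot)        # "Geek" "geek" "a-b-c"

# Without fixed = TRUE, "." is a regex that matches any single character
any_char <- gsub(".", "-", x)
print(any_char)           # "----" "----" "-----"
```

Using fixed = TRUE is a reasonable choice when the text to replace contains characters such as . or * that would otherwise be interpreted as regex syntax.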
Example 1:
R
# declaring dataframe
data_frame <- data.frame(col1 = c("Beginner", "for", "friends", "gap", "geek"))

print("Original DataFrame")
print(data_frame)

# replace values that start with "ge" (stray backslash removed from the pattern)
data_frame$col1 <- gsub("^ge.*", "new_String", data_frame$col1)

print("Modified DataFrame")
print(data_frame)
Output:
[1] "Original DataFrame"
      col1
1 Beginner
2      for
3  friends
4      gap
5     geek
[1] "Modified DataFrame"
        col1
1   Beginner
2        for
3    friends
4        gap
5 new_String
Only "geek" starts with "ge", so it is the only value replaced; "gap" and "Beginner" do not match.
The gsub() method can also be used to modify every value in a column, for example by matching the start of each string and inserting a prefix there.
Example 2:
R
# declaring dataframe
data_frame <- data.frame(col1 = c("Beginner", "for", "friends", "gap", "geek"))

print("Original DataFrame")
print(data_frame)

# "^" matches the start of each string, so the replacement is inserted as a prefix
data_frame$col1 <- gsub("^", "GFG ", data_frame$col1)

print("Modified DataFrame")
print(data_frame)
Output:
[1] "Original DataFrame"
      col1
1 Beginner
2      for
3  friends
4      gap
5     geek
[1] "Modified DataFrame"
          col1
1 GFG Beginner
2      GFG for
3  GFG friends
4      GFG gap
5     GFG geek
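Since the start-of-string anchor can match only once per string, the same prefix could plausibly be added with sub() or without any regex at all. The sketch below compares these options on a made-up vector; the names x, with_sub, with_paste, and with_suffix are illustrative assumptions.

```r
# Hypothetical input vector used only for illustration
x <- c("for", "geek")

# "^" matches the empty position at the start of each string
with_sub <- sub("^", "GFG ", x)
print(with_sub)      # "GFG for" "GFG geek"

# paste0() gives the same result without a regular expression
with_paste <- paste0("GFG ", x)
print(identical(with_sub, with_paste))   # TRUE

# "$" matches the end of each string, so this appends a suffix instead
with_suffix <- sub("$", " GFG", x)
print(with_suffix)   # "for GFG" "geek GFG"
```

The regex form is mostly useful when the prefix should be added only to values that match some further condition; for an unconditional prefix, paste0() is the simpler choice.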
It can also be used to remove digits from the string values in a column.
Example 3:
R
# declaring dataframe
data_frame <- data.frame(col1 = c("Beginner12 is good",
                                  "suitable 4 placements",
                                  "love you 2 much"))

print("Original DataFrame")
print(data_frame)

# remove every run of one or more digits
data_frame$col1 <- gsub("[0-9]+", "", data_frame$col1)

print("Modified DataFrame")
print(data_frame)
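Output:
[1] "Original DataFrame"
                   col1
1    Beginner12 is good
2 suitable 4 placements
3       love you 2 much
[1] "Modified DataFrame"
                  col1
1     Beginner is good
2 suitable  placements
3       love you  much
The spaces on either side of a removed number are kept, so two consecutive spaces remain where a standalone number used to be.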
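Removing digits can leave doubled spaces behind, as in "suitable  placements". One possible follow-up step, sketched here on a plain vector rather than the dataframe, is a second gsub() call that collapses runs of spaces; the names x, no_digits, and tidy are illustrative assumptions.

```r
# Hypothetical vector reusing the strings from the example above
x <- c("Beginner12 is good", "suitable 4 placements", "love you 2 much")

# Step 1: remove every run of digits
no_digits <- gsub("[0-9]+", "", x)
print(no_digits)
# "Beginner is good" "suitable  placements" "love you  much"

# Step 2: collapse runs of spaces into one, then trim the ends
tidy <- trimws(gsub(" +", " ", no_digits))
print(tidy)
# "Beginner is good" "suitable placements" "love you much"
```

The trimws() call only matters when a number sits at the very start or end of a string, where removing it would leave a leading or trailing space.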