R – Data Frames
R is an open-source programming language that is widely used for statistical computing and data analysis. Data frames are R's general-purpose objects for storing tabular data.
A data frame can be thought of as a matrix in which each column may hold a different data type, as long as all values within one column share the same type. An R data frame is made up of three principal components: the data, the rows, and the columns.
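To make the point about columns of different types concrete, here is a minimal sketch. The data frame `people` and its column names are invented for illustration and do not come from the rest of this article.

```r
# Hypothetical example data: three columns, three different types
people <- data.frame(
  id     = 1:3,
  name   = c("Asha", "Ben", "Chen"),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Internally, a data frame is a list of equal-length column vectors
print(is.list(people))

# Each column keeps its own type
col_types <- sapply(people, class)
print(col_types)

# A true matrix has to coerce everything to one common type
m <- as.matrix(people)
print(class(m[1, 1]))
```

The last line shows the practical difference from a matrix: once the data is converted with as.matrix(), even the numeric id values are stored as text.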
R Data Frames Structure
A data frame arranges data in rows and named columns: each row holds one observation and each column holds one variable.
The data is presented in tabular form, which makes it easier to work with and understand.
Create Dataframe in R Programming Language
To create a data frame in R, call the data.frame() function and pass each vector you have created as a named argument. Each vector becomes one column, so the vectors should all have the same length.
R
# R program to create a data frame
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)
Output:
  friend_id friend_name
1         1      Sachin
2         2      Sourav
3         3      Dravid
4         4      Sehwag
5         5       Dhoni
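The stringsAsFactors = FALSE argument used above keeps text columns as plain character vectors. The sketch below compares both settings; the variable names `as_text` and `as_factor` are made up for this example. In R 4.0.0 and later, FALSE is already the default, so the argument mainly matters for older versions.

```r
names_vec <- c("Sachin", "Sourav", "Dravid")

# Text stays a character vector
as_text <- data.frame(friend_name = names_vec, stringsAsFactors = FALSE)

# Text is converted to a factor (categorical variable)
as_factor <- data.frame(friend_name = names_vec, stringsAsFactors = TRUE)

print(class(as_text$friend_name))
print(class(as_factor$friend_name))

# A factor stores its distinct values as levels
print(levels(as_factor$friend_name))
```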
Get the Structure of the R Data Frame
The structure of an R data frame can be inspected with the str() function.
str() can display the internal structure of any R object, including large nested lists. It prints a compact one-line summary for each component, so the user can see what the object is and what it contains.
R
# R program to get the
# structure of the data frame
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself, so it is not wrapped
# in print(), which would add an extra NULL line to the output
str(friend.data)
Output:
'data.frame':   5 obs. of  2 variables:
 $ friend_id  : int  1 2 3 4 5
 $ friend_name: chr  "Sachin" "Sourav" "Dravid" "Sehwag" ...
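The remark about nested lists can be tried out with a short sketch. The list `team` is hypothetical and exists only for this illustration; max.level is a standard argument of str() that limits how deep the display goes.

```r
# Hypothetical nested list, used only for illustration
team <- list(
  name    = "Friends",
  members = list(
    captain = "Dhoni",
    players = c("Sachin", "Sourav")
  ),
  scores  = c(10, 20, 30)
)

# Full structure: one compact line per component, nested parts indented
str(team)

# Only the top level of the list
str(team, max.level = 1)

# str() is called for its printed output; its return value is NULL
result <- str(team)
print(is.null(result))
```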
Summary of Data in the R data frame
The summary() function gives a statistical summary of each column of an R data frame and shows the nature of the data it holds.
summary() is a generic function: it invokes a particular method depending on the class of its first argument, which is why it can also summarize the results of various model-fitting functions.
R
# R program to get the
# summary of the data frame
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Output:
   friend_id friend_name
 Min.   :1   Length:5
 1st Qu.:2   Class :character
 Median :3   Mode  :character
 Mean   :3
 3rd Qu.:4
 Max.   :5
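Because summary() picks its method from the class of its argument, the same call produces different kinds of output for different column types. The following sketch uses a made-up data frame `scores` to show this for a numeric, a factor, and a character column.

```r
# Hypothetical data with three column types
scores <- data.frame(
  runs = c(10, 45, 72, 33),
  role = factor(c("bat", "bowl", "bat", "bat")),
  name = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Numeric column: minimum, quartiles, mean, maximum
runs_summary <- summary(scores$runs)
print(runs_summary)

# Factor column: number of rows per level
role_summary <- summary(scores$role)
print(role_summary)

# Character column: only length, class, and mode
print(summary(scores$name))
```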
Extract Data from Data Frame in R
Extracting data from an R data frame means accessing its rows or columns. A specific column can be extracted from an R data frame by its column name.
R
# R program to extract
# data from the data frame
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# Extracting friend_name column
result <- data.frame(friend.data$friend_name)
print(result)
Output:
  friend.data.friend_name
1                  Sachin
2                  Sourav
3                  Dravid
4                  Sehwag
5                   Dhoni
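Rows can be extracted in a similar way with the [row, column] form of indexing. The sketch below is one possible illustration using the same friend.data frame; the result names (`middle_rows`, `late_ids`, and so on) are invented for this example.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# Rows 2 to 3, all columns
middle_rows <- friend.data[2:3, ]
print(middle_rows)

# Rows that satisfy a condition
late_ids <- friend.data[friend.data$friend_id > 3, ]
print(late_ids)

# A single cell: row 1 of the friend_name column
first_name <- friend.data[1, "friend_name"]
print(first_name)

# drop = FALSE keeps a one-column result as a data frame
names_df <- friend.data[, "friend_name", drop = FALSE]
print(class(names_df))
```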
Expand Data Frame in R Language
A data frame in R can be expanded by adding new columns and rows to the already existing R data frame.
R
# R program to expand
# the data frame
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# Expanding data frame
friend.data$location <- c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
resultant <- friend.data
# print the modified data frame
print(resultant)
Output:
  friend_id friend_name  location
1         1      Sachin   Kolkata
2         2      Sourav     Delhi
3         3      Dravid Bangalore
4         4      Sehwag Hyderabad
5         5       Dhoni   Chennai
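When a column is added with $, the length of the new vector has to fit the number of rows. The sketch below shows two cases under that assumption: a single value, which R recycles into every row, and a two-element vector, which R rejects for a five-row data frame. The column name `country` and the variable `attempt` are made up for the example.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# A single value is recycled to fill every row
friend.data$country <- "India"

# A vector whose length does not fit the row count raises an error,
# which is caught here so the script keeps running
attempt <- tryCatch(
  {
    friend.data$location <- c("Kolkata", "Delhi")
    "column added"
  },
  error = function(e) "length mismatch"
)
print(attempt)
print(names(friend.data))
```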
In R, one can perform many kinds of operations on a data frame, such as accessing rows and columns, selecting a subset of the data frame, editing values, and deleting rows and columns.
Please refer to DataFrame Operations in R to know about all types of operations that can be performed on a data frame.
Access Items in R Data Frame
We can access the columns of a data frame by using the $ operator, single brackets [ ], or double brackets [[ ]].
R
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# Access Items using []
friend.data[1]
# Access Items using [[]]
friend.data[['friend_name']]
# Access Items using $
friend.data$friend_id
Output:
Access Items using []
  friend_id
1         1
2         2
3         3
4         4
5         5
Access Items using [[]]
[1] "Sachin" "Sourav" "Dravid" "Sehwag" "Dhoni"
Access Items using $
[1] 1 2 3 4 5
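The three access styles differ in what they return, which the output above hints at: single brackets give back a data frame, while double brackets and $ give back the column vector itself. A small sketch (the variable names `by_single`, `by_double`, `by_dollar`, and `col` are illustrative only):

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

by_single <- friend.data["friend_name"]    # one-column data frame
by_double <- friend.data[["friend_name"]]  # character vector
by_dollar <- friend.data$friend_name       # character vector

print(class(by_single))
print(class(by_double))
print(identical(by_double, by_dollar))

# [[ ]] also accepts a column name stored in a variable
col <- "friend_id"
print(friend.data[[col]])
```

Double brackets are the usual choice when the column name is only known at run time, since $ takes the name literally.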
Number of Rows and Columns
We can find out how many rows and columns are present in a data frame by using the dim() function, which returns the number of rows followed by the number of columns.
R
# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)
# find out the number of rows and columns
dim(friend.data)
Output:
[1] 5 2
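If only one of the two numbers is needed, base R also provides nrow() and ncol(). A brief sketch with the same data frame (the variable names `n_rows` and `n_cols` are illustrative):

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

n_rows <- nrow(friend.data)  # number of rows only
n_cols <- ncol(friend.data)  # number of columns only
print(n_rows)
print(n_cols)

# dim() returns the same two numbers as one vector
print(identical(dim(friend.data), c(n_rows, n_cols)))
```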
Add Rows and Columns in R Data Frame
You can easily add rows and columns to an R data frame. Insertion expands the existing data frame, so you do not have to rebuild it from scratch.
Let’s look at how to add rows and columns to a data frame with an example:
Add Rows in R Data Frame
To add rows to a data frame, you can use the built-in function rbind().
The following example demonstrates how rbind() works on an R data frame.
R
# Creating a data frame representing products in a store
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  Stock = c(50, 30, 25)
)
# Print the existing data frame
cat("Existing dataframe (Products):\n")
print(Products)
# Adding a new row for a new product. A one-row data frame keeps the
# numeric columns numeric; a mixed c() vector would turn them into text.
New_Product <- data.frame(
  Product_ID = 104,
  Product_Name = "Sunglasses",
  Price = 39.99,
  Stock = 40
)
Products <- rbind(Products, New_Product)
# Print the updated data frame after adding the new product
cat("\nUpdated dataframe after adding a new product:\n")
print(Products)
Output:
Existing dataframe (Products):
  Product_ID Product_Name Price Stock
1        101      T-Shirt 15.99    50
2        102        Jeans 29.99    30
3        103        Shoes 49.99    25
Updated dataframe after adding a new product:
  Product_ID Product_Name Price Stock
1        101      T-Shirt 15.99    50
2        102        Jeans 29.99    30
3        103        Shoes 49.99    25
4        104   Sunglasses 39.99    40
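One detail worth checking when adding rows is the column types afterwards. A plain vector built with c() that mixes numbers and text holds only character values, so binding it as a row converts the numeric columns to text as well. The sketch below compares that with binding a one-row data frame; the names `from_vector` and `from_df` are made up for this illustration.

```r
Products <- data.frame(
  Product_ID = c(101, 102),
  Product_Name = c("T-Shirt", "Jeans"),
  Price = c(15.99, 29.99),
  stringsAsFactors = FALSE
)

# New row as a mixed vector: every column ends up as character
from_vector <- rbind(Products, c(103, "Shoes", 49.99))

# New row as a one-row data frame: column types are preserved
from_df <- rbind(
  Products,
  data.frame(
    Product_ID = 103,
    Product_Name = "Shoes",
    Price = 49.99,
    stringsAsFactors = FALSE
  )
)

print(sapply(from_vector, class))
print(sapply(from_df, class))
```

Both results print the same table, so the difference only shows up later, for example when computing mean(Price).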
Add Columns in R Data Frame
To add columns to a data frame, you can use the built-in function cbind().
The following example demonstrates how cbind() works on an R data frame.
R
# Existing data frame representing products in a store
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  Stock = c(50, 30, 25)
)
# Print the existing data frame
cat("Existing dataframe (Products):\n")
print(Products)
# Adding a new column for 'Discount' to the data frame
Discount <- c(5, 10, 8) # New column values for discount
# Naming the argument sets the column name, so no renaming step is needed
Products <- cbind(Products, Discount = Discount)
# Print the updated data frame after adding the new column
cat("\nUpdated dataframe after adding a new column 'Discount':\n")
print(Products)
Output:
Existing dataframe (Products):
  Product_ID Product_Name Price Stock
1        101      T-Shirt 15.99    50
2        102        Jeans 29.99    30
3        103        Shoes 49.99    25
Updated dataframe after adding a new column 'Discount':
  Product_ID Product_Name Price Stock Discount
1        101      T-Shirt 15.99    50        5
2        102        Jeans 29.99    30       10
3        103        Shoes 49.99    25        8
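Adding a column with cbind() and assigning it with $ lead to the same table. The sketch below puts the two side by side; `with_cbind` and `with_dollar` are names chosen for this example only.

```r
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Price = c(15.99, 29.99, 49.99),
  stringsAsFactors = FALSE
)
Discount <- c(5, 10, 8)

# Variant 1: cbind() returns a new data frame with the extra column
with_cbind <- cbind(Products, Discount = Discount)

# Variant 2: $ assignment adds the column to a copy in place
with_dollar <- Products
with_dollar$Discount <- Discount

print(names(with_cbind))
print(names(with_dollar))
print(all.equal(with_cbind, with_dollar))
```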
Remove Rows and Columns
Rows and columns can also be removed from an existing R data frame.
Remove Row in R DataFrame
R
# subset() is part of base R, so no package needs to be loaded here
# Create a data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)
data
# Remove a row with friend_id = 3
data <- subset(data, friend_id != 3)
data
Output:
  friend_id friend_name  location
1         1      Sachin   Kolkata
2         2      Sourav     Delhi
3         3      Dravid Bangalore
4         4      Sehwag Hyderabad
5         5       Dhoni   Chennai
# Remove a row with friend_id = 3
  friend_id friend_name  location
1         1      Sachin   Kolkata
2         2      Sourav     Delhi
4         4      Sehwag Hyderabad
5         5       Dhoni   Chennai
In the above code, we first created a data frame called data with three columns: friend_id, friend_name, and location. To remove a row with friend_id equal to 3, we used the subset() function and specified the condition friend_id != 3. This removed the row with friend_id equal to 3.
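The same row can also be removed with plain bracket indexing instead of subset(). The following sketch shows two such variants, one by condition and one by position; the result names `by_condition` and `by_position` are illustrative. It also shows how the row labels, which keep their original numbers after a removal, can be reset.

```r
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# Keep every row whose friend_id is not 3
by_condition <- data[data$friend_id != 3, ]

# Drop the third row by position with a negative index
by_position <- data[-3, ]

print(by_condition)
print(rownames(by_position))

# Row labels still skip 3; assigning NULL renumbers them from 1
rownames(by_position) <- NULL
print(rownames(by_position))
```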
Remove Column in R DataFrame
R
library(dplyr)
# Create a data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)
data
# Remove the 'location' column
data <- select(data, -location)
data
Output:
  friend_id friend_name  location
1         1      Sachin   Kolkata
2         2      Sourav     Delhi
3         3      Dravid Bangalore
4         4      Sehwag Hyderabad
5         5       Dhoni   Chennai
# Remove the 'location' column
  friend_id friend_name
1         1      Sachin
2         2      Sourav
3         3      Dravid
4         4      Sehwag
5         5       Dhoni
To remove the location column, we used the select() function from the dplyr package and specified -location. The minus sign (-) indicates that we want to drop the location column. The resulting data frame data has only two columns: friend_id and friend_name.
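If dplyr is not available, a column can also be dropped with base R alone. The sketch below shows two common ways, offered as alternatives rather than as the method used above; `no_location_a` and `no_location_b` are names invented for the example.

```r
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai"),
  stringsAsFactors = FALSE
)

# Variant 1: keep every column whose name is not "location"
no_location_a <- data[, names(data) != "location", drop = FALSE]

# Variant 2: assigning NULL deletes the column from a copy
no_location_b <- data
no_location_b$location <- NULL

print(names(no_location_a))
print(names(no_location_b))
```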
Combining Data Frames in R
There are two ways to combine data frames in R: vertically or horizontally.
Let’s look at both cases with examples:
Combine R Data Frame Vertically
To combine two data frames vertically, you can use the rbind() function. It accepts two or more data frames, which must have the same column names.
R
# Creating two sample data frames
df1 <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Score = c(80, 75)
)
df2 <- data.frame(
  Name = c("Charlie", "David"),
  Age = c(28, 35),
  Score = c(90, 85)
)
# Print the existing data frames
cat("Dataframe 1:\n")
print(df1)
cat("\nDataframe 2:\n")
print(df2)
# Combining the data frames using rbind()
combined_df <- rbind(df1, df2)
# Print the combined data frame
cat("\nCombined Dataframe:\n")
print(combined_df)
Output:
Dataframe 1:
   Name Age Score
1 Alice  25    80
2   Bob  30    75
Dataframe 2:
     Name Age Score
1 Charlie  28    90
2   David  35    85
Combined Dataframe:
     Name Age Score
1   Alice  25    80
2     Bob  30    75
3 Charlie  28    90
4   David  35    85
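rbind() matches the columns of data frames by name. The sketch below illustrates what that means in two situations: columns given in a different order are lined up correctly, while a data frame with a different column name is rejected. The data frames `df_reordered` and `df_other` and the result names are hypothetical.

```r
df1 <- data.frame(Name = "Alice", Age = 25, stringsAsFactors = FALSE)

# Same column names, different order: matched by name
df_reordered <- data.frame(Age = 40, Name = "Frank", stringsAsFactors = FALSE)
reordered <- rbind(df1, df_reordered)
print(reordered)

# Different column name ("Years" instead of "Age"): rbind() stops with
# an error, which is caught here so the script keeps running
df_other <- data.frame(Name = "Eve", Years = 31, stringsAsFactors = FALSE)
outcome <- tryCatch(
  {
    rbind(df1, df_other)
    "combined"
  },
  error = function(e) "names do not match"
)
print(outcome)
```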
Combine R Data Frame Horizontally
To combine two data frames horizontally, you can use the cbind() function. It accepts two or more data frames, which should have the same number of rows.
R
# Creating two sample data frames
df1 <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Score = c(80, 75)
)
df2 <- data.frame(
  Height = c(160, 175),
  Weight = c(55, 70)
)
# Print the existing data frames
cat("Dataframe 1:\n")
print(df1)
cat("\nDataframe 2:\n")
print(df2)
# Combining the data frames using cbind()
combined_df <- cbind(df1, df2)
# Print the combined data frame
cat("\nCombined Dataframe:\n")
print(combined_df)
Output:
Dataframe 1:
   Name Age Score
1 Alice  25    80
2   Bob  30    75
Dataframe 2:
  Height Weight
1    160     55
2    175     70
Combined Dataframe:
   Name Age Score Height Weight
1 Alice  25    80    160     55
2   Bob  30    75    175     70
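cbind() places rows side by side purely by position, so the row counts of the inputs need to fit. As a minimal sketch, the code below checks the counts before combining and shows that a two-row and a three-row data frame cannot be combined. The helper data frame `df_long` and the result names are made up for this example.

```r
df1 <- data.frame(Name = c("Alice", "Bob"), stringsAsFactors = FALSE)
df2 <- data.frame(Height = c(160, 175))
df_long <- data.frame(Weight = c(55, 70, 62))

# Safe case: equal row counts, checked before combining
same_rows <- nrow(df1) == nrow(df2)
if (same_rows) {
  combined <- cbind(df1, df2)
  print(combined)
}

# Two rows against three rows: cbind() stops with an error,
# which is caught here so the script keeps running
outcome <- tryCatch(
  {
    cbind(df1, df_long)
    "combined"
  },
  error = function(e) "row counts differ"
)
print(outcome)
```

Because the match is by position only, cbind() does not check that row 1 of both inputs describes the same person; that is up to the caller.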
In this article we have covered R data frames and the basic operations on them: creating, accessing, summarizing, adding, removing, and combining. The article aims to make you familiar with data frames in R so that you can use them in your own projects.