Sum of rows based on column value in R dataframe
In this article, we will be discussing how we can sum up row values based on column value in a data frame in R Programming Language. Suppose you have a data frame like this:
|
fruits |
shop_1 |
shop_2 |
---|---|---|---|
1. |
Apple |
1 |
13 |
2. |
Mango |
9 |
5 |
3. |
Strawberry |
2 |
14 |
4. |
Apple |
10 |
6 |
5. |
Apple |
3 |
15 |
6. |
Strawberry |
11 |
7 |
7. |
Mango |
4 |
16 |
8. |
Strawberry |
12 |
8 |
This dataset consists of fruits name and shop_1, shop_2 as a column name. Here shop_1 and shop_2 show the number of fruits available in shops. Now you want to find the aggregate sum of all the rows in shope_1 that have the same fruit value. So our dataset looks like this :
|
fruits |
shop_1 |
shop_2 |
---|---|---|---|
1. |
Apple |
14 |
34 |
2. |
Mango |
13 |
21 |
3. |
Strawberry |
25 |
29 |
Method 1: Using aggregate function
The aggregate function creates a subset of the original data and computes the statistical function for each subset and returns the result.
Syntax:
aggregate(.~fruit,data=df,FUN=sum)
Example:
R
# Sum of rows based on column values # Creating dataset # creating fuits column x <- c ( "Apple" , "Mango" , "Strawberry" , "Apple" , "Apple" , "Strawberry" , "Mango" , "Strawberry" ) # creating shop_1 column y <- c (1,9,2,10,3,11,4,12) # creating shop_2 column z <- c (13,5,14,6,15,7,16,8) # creating dataframe df <- data.frame (fruits=x,shop_1=y,shop_2=z) # applying aggregate function aggregate (.~fruits,data=df,FUN=sum) |
Output:
Method 2: Using ddply and numcolwise function
ddply simply split the given data frame and perform any operation on it (probably apply a function) and return the data frame.
colwise is a function from the famous plyr package. colwise function is used to compute a function on each column in data frame, it computes column wise.
Example:
R
# Sum of rows based on column values # loading library library (plyr) # Creating dataset # creating fuits column x <- c ( "toy1" , "toy2" , "toy3" , "toy1" , "toy1" , "toy3" , "toy2" , "toy3" ) # creating stock_1 column y <- c (1,2,3,4,5,6,4,8) # creating stock_2 column z <- c (9,1,10,5,2,6,4,8) # creating dataframe df <- data.frame (toys=x,stock_1=y,stock_2=z) # using sum function colwise ddply (df, "toys" , numcolwise (sum)) |
Output: