Count Unique Values by Group in R
In this article, we discuss how to count the number of unique values by group in the R programming language. Consider the following example.
Suppose you have a dataset with multiple columns like this:
|    | class | age | age_group |
| 1  | A     | 20  | YOUNG     |
| 2  | B     | 15  | KID       |
| 3  | C     | 45  | OLD       |
| 4  | B     | 14  | KID       |
| 5  | A     | 21  | YOUNG     |
| 6  | A     | 22  | YOUNG     |
| 7  | C     | 47  | OLD       |
| 8  | A     | 18  | YOUNG     |
| 9  | B     | 16  | KID       |
| 10 | C     | 50  | OLD       |
| 11 | A     | 23  | YOUNG     |
In this dummy dataset, class, age, and age_group are the column names, and our task is to count the number of unique age values within each age_group.
The resulting dataset should look like this:
|   | age_group | unique_count |
| 1 | YOUNG     | 5            |
| 2 | KID       | 3            |
| 3 | OLD       | 3            |
Method 1: Using aggregate function
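The aggregate() function splits the data into groups, applies a function to the rows of each group, and returns a single summary value per group. Here the summary function is length(unique(x)), which counts the distinct values in each group.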
Example:
R
# Count unique values by group

# Creating the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG", "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# creating the data frame
df <- data.frame(class = x, age = y, age_group = z)
df

# applying aggregate(): count the distinct ages within each age_group
aggregate(age ~ age_group, data = df, FUN = function(x) length(unique(x)))
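Output:
The aggregate() call returns one row per age_group, in alphabetical order of the group name: KID 3, OLD 3, YOUNG 5.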
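The result of aggregate() keeps the name of the summarised column (age), even though it now holds counts. As a minimal sketch, assuming we want the column to be called unique_count as in the target table, the result can be stored and renamed. The variable name result is chosen here only for illustration.

```r
# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# Count the distinct ages within each age_group
result <- aggregate(age ~ age_group, data = df,
                    FUN = function(x) length(unique(x)))

# The summarised column is still named "age"; rename it so the
# name describes what the column now holds
names(result)[names(result) == "age"] <- "unique_count"

print(result)
```

Renaming after the call leaves the aggregate() step itself unchanged, so the same pattern should work for other summary functions.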
Method 2: Using dplyr package and group_by function
dplyr is one of the most widely used R packages for data wrangling. It provides a set of tools for data manipulation; here, group_by() splits the data by age_group and n_distinct() counts the unique values within each group.
Example:
R
# Count unique values by group

# loading dplyr
library(dplyr)

# Creating the dataset
# class column
x <- c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A")

# age column
y <- c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23)

# age_group column
z <- c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG", "OLD", "YOUNG", "KID", "OLD", "YOUNG")

# creating the data frame
df <- data.frame(class = x, age = y, age_group = z)

# grouping by the age_group column and counting
# the unique age values within each group
df %>%
  group_by(age_group) %>%
  summarise(unique_count = n_distinct(age))
Output:
The result is a tibble with one row per age_group: KID 3, OLD 3, YOUNG 5.
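The same group_by() and summarise() pattern can count unique values in more than one column at once. The following is a sketch under the assumption that we also want the number of distinct class values per group; the output column names unique_ages and unique_classes and the variable counts are illustrative, not part of the original example.

```r
library(dplyr)

# Rebuild the example data frame so this block runs on its own
df <- data.frame(
  class = c("A", "B", "C", "B", "A", "A", "C", "A", "B", "C", "A"),
  age = c(20, 15, 45, 14, 21, 22, 47, 18, 16, 50, 23),
  age_group = c("YOUNG", "KID", "OLD", "KID", "YOUNG", "YOUNG",
                "OLD", "YOUNG", "KID", "OLD", "YOUNG")
)

# One summarise() call can hold several named summaries;
# each n_distinct() is evaluated separately within every group
counts <- df %>%
  group_by(age_group) %>%
  summarise(
    unique_ages = n_distinct(age),
    unique_classes = n_distinct(class)
  )

print(counts)
```

In this dataset every age_group contains a single class, so unique_classes is 1 for each group, while unique_ages matches the counts from the single-column example.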