How to Calculate Percentiles in R?
In this article, we will discuss how to calculate percentiles in the R programming language.
A percentile is a measure of position: the p-th percentile is the value below which roughly p percent of the data fall. In R, we can use the quantile() function to calculate percentiles.
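Syntax: quantile(data, probs)
Parameters:
- data: numeric vector whose percentiles are to be calculated
- probs: percentile(s) to compute, given as probabilities between 0 and 1 (for example, 0.5 for the 50th percentile)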
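Because quantile() expects probabilities on a 0 to 1 scale, a percentile stated on a 0 to 100 scale has to be divided by 100 first. A minimal sketch of that conversion follows; the helper name percentile() is hypothetical and not part of base R.

```r
x <- c(2, 13, 5, 36, 12, 50)

# Hypothetical helper (not part of base R): takes percentiles on a 0-100
# scale and converts them to the 0-1 probabilities quantile() expects.
percentile <- function(data, pct) {
  stopifnot(all(pct >= 0 & pct <= 100))
  quantile(data, probs = pct / 100)
}

p90 <- percentile(x, 90)  # same as quantile(x, probs = 0.9)
p90
```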
Example 1: Calculate percentile
To calculate a single percentile, we pass the data and the required percentile, expressed as a probability between 0 and 1.
R
# Sample data
x <- c(2, 13, 5, 36, 12, 50)

# 50th percentile (the median)
res <- quantile(x, probs = 0.5)
res
Output:
 50% 
12.5 
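The result 12.5 is not one of the data values because quantile() interpolates between neighbouring sorted values by default (its type = 7 method). The sketch below reproduces that default by hand for illustration; the variable names are chosen here and are not from the original example.

```r
x <- c(2, 13, 5, 36, 12, 50)
p <- 0.5

s <- sort(x)                    # 2 5 12 13 36 50
h <- (length(s) - 1) * p + 1    # fractional position in the sorted data: 3.5
lo <- floor(h)                  # index of the value just below: 3
hi <- ceiling(h)                # index of the value just above: 4

# Linear interpolation between the two neighbouring sorted values
manual <- s[lo] + (h - lo) * (s[hi] - s[lo])
manual
```

Here the position 3.5 falls halfway between 12 and 13, which gives 12.5.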
Example 2: Calculate percentiles of vector
We can calculate multiple percentiles at once by passing a vector of probabilities, instead of a single value, to the probs parameter.
R
# Sample data
x <- c(2, 13, 5, 36, 12, 50)

# 50th and 75th percentiles in one call
res <- quantile(x, probs = c(0.5, 0.75))
res
Output:
  50%   75% 
12.50 30.25 
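When the required percentiles are evenly spaced, the probs vector does not have to be typed out by hand. As a small sketch, seq() can generate it; the quartile spacing of 0.25 used here is just an illustrative choice.

```r
x <- c(2, 13, 5, 36, 12, 50)

# Evenly spaced probabilities: 0, 0.25, 0.5, 0.75, 1 (the quartiles)
quartiles <- quantile(x, probs = seq(0, 1, by = 0.25))
quartiles
```

The 0% and 100% entries are simply the minimum and maximum of the data.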
Example 4: Calculate percentile in dataframe
Sometimes we need the percentiles of a dataframe column. The process stays the same: we pass the column (for example, df$x) in place of data, along with the percentiles to be calculated.
R
# Sample dataframe with one numeric and one character column
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'))

# 35th and 70th percentiles of column x
res <- quantile(df$x, probs = c(0.35, 0.7))
res
Output:
  35%   70% 
10.25 24.50 
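Real dataframe columns often contain missing values, and quantile() stops with an error if it meets an NA unless na.rm = TRUE is set. The sketch below uses a hypothetical variant of the column with one NA added to illustrate this.

```r
# Hypothetical data: the same kind of column, but with one missing value
df <- data.frame(x = c(2, 13, NA, 36, 12, 50))

# na.rm = TRUE drops the NA before the percentiles are computed
res <- quantile(df$x, probs = c(0.35, 0.7), na.rm = TRUE)
res
```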
Example 5: Percentiles of several or all columns
We can also find the percentiles of several dataframe columns at once, or of all numeric columns. For this we use the apply() function, passing it a dataframe that contains only numeric columns, a margin of 2 (which means "apply over columns"), and the quantile function to be applied to each column.
Syntax: apply(dataframe, MARGIN, function)
R
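# Sample dataframe with two numeric columns (x, z) and one character column (y)
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'),
                 z = c(2.1, 6, 3.8, 4.8, 2.2, 1.1))

# Keep only the numeric columns
sub_df <- df[, c('x', 'z')]

# MARGIN = 2 applies the function to each column
res <- apply(sub_df, 2, function(x) quantile(x, probs = 0.5))
res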
Output:
   x    z 
12.5  3.0 
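Listing the numeric columns by name works for a small dataframe, but they can also be picked out programmatically. The following is a sketch of one alternative, using sapply() with is.numeric instead of the apply() call above; the choice of three percentiles is illustrative.

```r
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'),
                 z = c(2.1, 6, 3.8, 4.8, 2.2, 1.1))

# Logical vector marking which columns are numeric (x and z here)
num_cols <- sapply(df, is.numeric)

# One column of results per numeric column, one row per percentile
res <- sapply(df[num_cols], quantile, probs = c(0.25, 0.5, 0.75))
res
```

The result is a matrix whose rows are named after the percentiles and whose columns are named after the dataframe columns.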
Example 6: Calculate percentiles by group
We can also group values together and find the percentile within each group. For this, we use the group_by() function from the dplyr package, and then apply the quantile function inside summarize().
R
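library(dplyr)

# Sample dataframe: x holds the values, y the group labels
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'))

# 50th percentile of x within each group of y
df %>%
  group_by(y) %>%
  summarize(res = quantile(x, probs = 0.5))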
Output:
# A tibble: 3 x 2
  y       res
  <chr> <dbl>
1 a       2  
2 b      31.5
3 c      12  
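If dplyr is not available, the same per-group medians can be obtained with base R. The sketch below swaps in tapply() as an alternative to the group_by() and summarize() pipeline; it is not the approach used in the example above.

```r
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'))

# Split x by the groups in y and compute the 50th percentile of each piece
res <- tapply(df$x, df$y, quantile, probs = 0.5)
res
```

The result is indexed by group label, so res[["b"]] returns the value for group b.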
Example 7: Visualizing percentiles
Plotting the sorted values against their percentile ranks can make percentiles easier to understand.
R
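# Sample dataframe
df <- data.frame(x = c(2, 13, 5, 36, 12, 50),
                 y = c('a', 'b', 'c', 'c', 'c', 'b'),
                 z = c(2.1, 6, 3.8, 4.8, 2.2, 1.1))

n <- length(df$x)

# Percentile rank of each sorted value, from 0 to 1, on the x-axis;
# type = 'h' draws one vertical line per data point
plot((0:(n - 1)) / (n - 1), sort(df$x), type = 'h',
     xlab = "Percentile", ylab = "Value")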
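Output: a plot with one vertical line per data point, showing the sorted values of x against their percentile rank from 0 to 1.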
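As a further sketch, the full percentile curve can be drawn by evaluating quantile() on a fine grid of probabilities and overlaying the observed data points. The grid step of 0.01 and the variable names are illustrative choices.

```r
x <- c(2, 13, 5, 36, 12, 50)
n <- length(x)

# Evaluate every percentile from 0 to 100 in steps of 1
probs <- seq(0, 1, by = 0.01)
vals <- quantile(x, probs = probs)

# Line: interpolated percentile curve; points: the sorted observations
plot(probs * 100, vals, type = "l", xlab = "Percentile", ylab = "Value")
points((0:(n - 1)) / (n - 1) * 100, sort(x))
```

With the default interpolation method, the curve passes through each sorted observation, which shows how quantile() fills in the values between data points.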