Pandas Groupby – Sort within groups
Pandas groupby is used when we want to split a dataset into groups and then operate on those groups, for example aggregating data, transforming it through group-wise computations, or filtering groups according to specific conditions.
In a similar way, we can control sorting when working with these groups.
Example 1: Let’s take an example of a dataframe:
import pandas as pd

df = pd.DataFrame({'X': ['B', 'B', 'A', 'A'],
                   'Y': [1, 2, 3, 4]})

# using groupby function (group keys are sorted by default)
df.groupby('X').sum()
Output: the groups appear in sorted key order, A (Y = 7) followed by B (Y = 3).
Let’s pass the sort parameter as False.
# using groupby function
# with sort=False
df.groupby('X', sort=False).sum()
Output: the groups appear in order of first appearance, B (Y = 3) followed by A (Y = 7).
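Here, the groups keep the order in which their keys first occur in the dataframe (B, then A) instead of being sorted by key. Note that the sort parameter only controls the order of the group keys; it does not sort the rows inside each group.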
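To make the effect of the sort parameter concrete, here is a minimal sketch that compares the group key order of both calls on the dataframe from Example 1. The variable names sorted_keys and unsorted_keys are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'X': ['B', 'B', 'A', 'A'],
                   'Y': [1, 2, 3, 4]})

# Default behaviour (sort=True): group keys are sorted
sorted_keys = df.groupby('X').sum()

# sort=False: group keys keep their order of first appearance
unsorted_keys = df.groupby('X', sort=False).sum()

print(list(sorted_keys.index))    # ['A', 'B']
print(list(unsorted_keys.index))  # ['B', 'A']
```

The sums themselves are identical in both cases; only the order of the rows in the result changes.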
Example 2:
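Now, let's take an example of a dataframe with the ages of different people. By default (sort=True), groupby sorts the group keys, so the result is arranged by the keys passed. Passing sort=False skips this sorting step, which can give a potential speedup, and keeps the groups in order of first appearance.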
data = {'Name': ['Elle', 'Chloe', 'Noah', 'Marco',
                 'Lee', 'Elle', 'Rachel', 'Noah'],
        'Age': [17, 19, 18, 17, 22, 18, 21, 20]}

df = pd.DataFrame(data)
df
Output:
Let's group the above dataframe according to the name.
# using groupby with the default sort=True
df.groupby(['Name']).sum()
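Output: the names appear in alphabetical order with their summed ages: Chloe 19, Elle 35, Lee 22, Marco 17, Noah 38, Rachel 21.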
Passing the sort parameter as False:
# using groupby function
# with sort=False
df.groupby(['Name'], sort=False).sum()
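Output: the names appear in order of first appearance with their summed ages: Elle 35, Chloe 19, Noah 38, Marco 17, Lee 22, Rachel 21.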
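As a quick check that sort changes only the order of the result and not the aggregated values, the following sketch runs both variants on the ages data and compares them after re-sorting the index. The names by_name_sorted, by_name_unsorted and same are illustrative and not part of the original example.

```python
import pandas as pd

data = {'Name': ['Elle', 'Chloe', 'Noah', 'Marco',
                 'Lee', 'Elle', 'Rachel', 'Noah'],
        'Age': [17, 19, 18, 17, 22, 18, 21, 20]}
df = pd.DataFrame(data)

# Default: names in alphabetical order
by_name_sorted = df.groupby(['Name']).sum()

# sort=False: names in order of first appearance
by_name_unsorted = df.groupby(['Name'], sort=False).sum()

print(list(by_name_sorted.index))
print(list(by_name_unsorted.index))

# Sorting the unsorted result by its index gives back the default result
same = by_name_unsorted.sort_index().equals(by_name_sorted)
print(same)
```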
Example 3:
Let's take another example of a dataframe that consists of the top speeds of various cars and bikes.
We’ll try to get the top speeds sorted within the groups of vehicle type.
import pandas as pd

df = pd.DataFrame([('Bike', 'Kawasaki', 186),
                   ('Bike', 'Ducati Panigale', 202),
                   ('Car', 'Bugatti Chiron', 304),
                   ('Car', 'Jaguar XJ220', 210),
                   ('Bike', 'Lightning LS-218', 218),
                   ('Car', 'Hennessey Venom GT', 270),
                   ('Bike', 'BMW S1000RR', 188)],
                  columns=('Type', 'Name', 'top_speed(mph)'))
df
Output:
After using the groupby function:
# Group by vehicle type and take the largest top_speed(mph)
# values within each group. nlargest() returns at most n=5
# values per group by default, in descending order, so with
# 4 bikes and 3 cars every row is returned.
grouped = df.groupby(['Type'])['top_speed(mph)'].nlargest()
print(grouped)
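Output: a series indexed by Type and the original row label, with speeds in descending order within each group (Bike: 218, 202, 188, 186; Car: 304, 270, 210).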
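The nlargest() result contains only the speed column. If the whole rows should stay sorted within each group, one alternative (not the approach used above) is to sort the dataframe by the group key and the value column with sort_values. The sketch below assumes the same vehicle data; the names ranked and top2 are illustrative.

```python
import pandas as pd

df = pd.DataFrame([('Bike', 'Kawasaki', 186),
                   ('Bike', 'Ducati Panigale', 202),
                   ('Car', 'Bugatti Chiron', 304),
                   ('Car', 'Jaguar XJ220', 210),
                   ('Bike', 'Lightning LS-218', 218),
                   ('Car', 'Hennessey Venom GT', 270),
                   ('Bike', 'BMW S1000RR', 188)],
                  columns=('Type', 'Name', 'top_speed(mph)'))

# Sort by Type (ascending), then by top speed (descending) within each Type
ranked = df.sort_values(['Type', 'top_speed(mph)'],
                        ascending=[True, False])
print(ranked)

# Keep only the two fastest vehicles of each Type;
# head(2) takes the first two rows of every group in the sorted frame
top2 = ranked.groupby('Type').head(2)
print(top2)
```

Because the frame is sorted before grouping, head(2) picks the two highest speeds per type while keeping the Name column alongside them.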