Flattening hierarchical index in pandas dataframe using groupby

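When we group a DataFrame with groupby() and apply several aggregation functions to the same column, the result has hierarchical (MultiIndex) columns. The first level holds the original column name and the second level holds the name of each aggregation function. To get plain, single-level column names, the two levels have to be flattened into one.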
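Here is a minimal sketch of that behaviour. The team/score data is made up purely for illustration. Applying two aggregations to one column produces two-level column labels:

```python
import pandas as pd

# Assumed sample data, used only for illustration.
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [10, 20, 30, 40],
})

# Two aggregation functions applied to a single column.
result = df.groupby("team").agg({"score": ["sum", "max"]})

# The columns form a two-level MultiIndex:
# level 0 is the source column, level 1 is the aggregation name.
print(type(result.columns).__name__)  # MultiIndex
print(result.columns.nlevels)         # 2
print(list(result.columns))           # [('score', 'sum'), ('score', 'max')]
```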

Syntax: pandas.DataFrame.groupby(by=None, axis=0, level=None)

Parameter:

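  • by – a column label or list of labels, a mapping, or a function that determines the groups.
  • axis – 0 (the default) groups the rows and 1 groups the columns. The axis keyword is deprecated as of pandas 2.1.
  • level – if the axis has a MultiIndex, groups by a particular level or levels (an int, a level name, or a sequence of these).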
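The sketch below shows by and level on a small, made-up DataFrame whose row index has two levels (region, city). All names and values are assumptions for illustration. It uses the default axis=0.

```python
import pandas as pd

# Assumed sample data: a row index with two levels (region, city).
index = pd.MultiIndex.from_tuples(
    [("north", "x"), ("north", "y"), ("south", "x"), ("south", "y")],
    names=["region", "city"],
)
df = pd.DataFrame({"product": ["p1", "p2", "p1", "p2"],
                   "sales": [5, 7, 3, 9]}, index=index)

# `by` as a column label: one group per distinct product.
by_product = df.groupby(by="product")["sales"].sum()

# `level` on a multi-indexed axis: group on the "region" level
# (an integer position such as level=0 would work as well).
by_region = df.groupby(level="region")["sales"].sum()

print(by_product)  # p1 -> 8, p2 -> 16
print(by_region)   # north -> 12, south -> 12
```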

Syntax: pandas.DataFrame.agg(func=None, axis=0)

Parameter:

  • func – the aggregation to apply. It can be a function, a function name such as "sum", "min" or "max", a list of these, or a dict that maps column labels to them.
  • axis – 0 applies the function to each column and 1 applies it to each row.
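Here is a hedged illustration of how the form of func affects the result columns, using assumed sample data. A single function keeps one column level. A list of functions, or a dict of lists, adds a second level for the aggregation names. That second level is what later has to be flattened.

```python
import pandas as pd

# Assumed sample data for illustration only.
df = pd.DataFrame({"cars": ["bmw", "bmw", "benz"],
                   "q1": [20, 22, 24],
                   "q2": [11, 13, 15]})
grouped = df.groupby("cars")

# A single function name keeps single-level columns.
single = grouped.agg("sum")

# A list of functions adds a second column level (one entry per function).
listed = grouped.agg(["sum", "max"])

# A dict chooses different functions for each column.
mapped = grouped.agg({"q1": ["sum", "max"], "q2": ["sum", "min"]})

print(single.columns.nlevels)  # 1
print(listed.columns.nlevels)  # 2
print(list(mapped.columns))
# [('q1', 'sum'), ('q1', 'max'), ('q2', 'sum'), ('q2', 'min')]
```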

Example

Import the pandas package and create a sample DataFrame of car sales for two quarters, q1 and q2. Then:

  • Group the rows by car with groupby() and aggregate the sum and max of the quarter-1 sales and the sum and min of the quarter-2 sales.
  • The grouped DataFrame now has MultiIndex columns whose labels are (column, aggregation) tuples.
  • Use a for loop to iterate through these tuples, join each one into a single string, and append it to the flat_cols list.
  • Assign the flat_cols list to the columns of the grouped DataFrame.

Python3




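import pandas as pd

# Illustrative sample data (assumed values): car sales in
# crores for two quarters, with repeated car names to group on.
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 15, 17],
})

# group by cars and aggregate the sum and max of quarter-1
# sales and the sum and min of quarter-2 sales
grouped_data = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"],
     "sale_q2 in Cr": ["sum", "min"]})

# create an empty list to hold the
# names of the flattened columns
flat_cols = []

# each column label is a (column, aggregation) tuple;
# join each pair into a single string such as "sale_q1 in Cr_sum"
for col, agg_name in grouped_data.columns:
    flat_cols.append(col + "_" + agg_name)

# assign the flattened names back to the grouped columns
grouped_data.columns = flat_cols

# print the grouped data
print(grouped_data)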


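Output: the grouped DataFrame is indexed by car, and its columns are now the single-level names sale_q1 in Cr_sum, sale_q1 in Cr_max, sale_q2 in Cr_sum and sale_q2 in Cr_min.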
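The explicit loop is not the only way to flatten the columns. The following is a hedged sketch on assumed sample data that uses the same column names as the car-sales example. Index.map with "_".join produces the same joined names in one line. MultiIndex.to_flat_index() also collapses the levels, but it keeps each label as a tuple instead of joining it.

```python
import pandas as pd

# Assumed sample data with the same column names as the car-sales example.
data = pd.DataFrame({"cars": ["bmw", "bmw", "benz", "benz"],
                     "sale_q1 in Cr": [20, 22, 24, 26],
                     "sale_q2 in Cr": [11, 13, 15, 17]})

grouped_data = data.groupby("cars").agg(
    {"sale_q1 in Cr": ["sum", "max"], "sale_q2 in Cr": ["sum", "min"]})

# Option 1: map "_".join over the MultiIndex. Each label is a
# tuple of strings, so join turns it into one string.
joined = grouped_data.copy()
joined.columns = joined.columns.map("_".join)

# Option 2: to_flat_index() gives a single-level index whose
# labels are the original (column, aggregation) tuples.
tupled = grouped_data.copy()
tupled.columns = tupled.columns.to_flat_index()

print(list(joined.columns))
print(list(tupled.columns))
```

Joined strings are usually more convenient for later column access. Tuples preserve both parts unambiguously when a column name itself contains the separator.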
