How to Get Rid of Multilevel Index After Using Pivot Table in Pandas

Pandas is a powerful and versatile library in Python for data manipulation and analysis. One of its most useful features is the pivot table, which allows you to reshape and summarize data. However, using pivot tables often results in a multilevel (hierarchical) index, which can be cumbersome to work with. In this article, we will explore how to get rid of the multilevel index after using a pivot table in Pandas, making your data easier to handle and analyze.

Table of Contents

  • Understanding Pivot Tables in Pandas
  • Understanding Multilevel Index
  • Removing Multilevel Index Using Pivot Table
    • 1. Using reset_index()
    • 2. Using droplevel()
    • 3. Using rename_axis()
  • Removing Multilevel Indexes in Pandas DataFrames: Practical Examples and Techniques

Understanding Pivot Tables in Pandas

Pivot tables are a powerful tool for data analysis, allowing you to transform and summarize data in a way that makes it easier to understand and analyze. In Pandas, the pivot_table function is used to create pivot tables. It provides a flexible way to group, aggregate, and reshape data.

Example:

Python
import pandas as pd

data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40]
}

df = pd.DataFrame(data)
pivot_df = df.pivot_table(values='Value', index='Date', columns='Category', aggfunc='sum')
print(pivot_df)

Output:

Category     A   B
Date              
2023-01-01  10  20
2023-01-02  30  40

In this example, ‘Date’ forms a single-level row index and ‘Category’ is the name of the columns axis, so there is no multilevel index yet. One appears when more than one key is passed to index (or to columns).

Understanding Multilevel Index

A multilevel index (or hierarchical index) in Pandas allows you to have multiple levels of indexing on your DataFrame. While this can be useful for certain types of data analysis, it can also make the DataFrame more complex and harder to work with. Therefore, it is often desirable to flatten the DataFrame by removing the multilevel index.
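As a quick illustration, the sketch below checks whether a pivot table ended up with a multilevel index. The small dataset and the variable names (single, multi) are assumptions made for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# One index key: the result has an ordinary single-level index
single = df.pivot_table(values='Value', index='Date', columns='Category', aggfunc='sum')

# Two index keys: the result has a MultiIndex on the rows
multi = df.pivot_table(values='Value', index=['Date', 'Category'], aggfunc='sum')

single_levels = single.index.nlevels
multi_levels = multi.index.nlevels
multi_names = list(multi.index.names)
is_multi = isinstance(multi.index, pd.MultiIndex)

print(single_levels, multi_levels, multi_names, is_multi)
```

Checking index.nlevels (or using isinstance with pd.MultiIndex) before flattening tells you whether there is anything to remove.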

Creating a Pivot Table

Let’s start by creating a pivot table from a sample DataFrame. We’ll use the same example as above but with a slightly more complex dataset.

Python
data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Subcategory': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Value': [10, 20, 30, 40, 50, 60]
}

df = pd.DataFrame(data)
pivot_df = df.pivot_table(values='Value', index=['Date', 'Category'], columns='Subcategory', aggfunc='sum')
print(pivot_df)

Output:

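Subcategory             X     Y
Date       Category            
2023-01-01 A         10.0   NaN
           B          NaN  20.0
2023-01-02 A         30.0   NaN
           B          NaN  40.0
2023-01-03 A         50.0   NaN
           B          NaN  60.0

Here, the rows carry a multilevel index with ‘Date’ and ‘Category’ as its two levels, while ‘Subcategory’ labels the columns. The NaN entries appear because each category occurs with only one subcategory in this sample, and they also cause the values to be displayed as floats.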

Removing Multilevel Index Using Pivot Table

There are several methods to remove the multilevel index from a DataFrame in Pandas. Let’s explore each method in detail:

  • Using reset_index()
  • Using droplevel()
  • Using rename_axis()

1. Using reset_index()

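The reset_index() method is the most straightforward way to remove the multilevel index. It converts every index level into a regular column and replaces the index with a default integer index. The columns axis keeps its name, which is why ‘Subcategory’ still appears in the top-left corner of the output.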

Python
flat_df = pivot_df.reset_index()
print(flat_df)

Output:

Subcategory        Date Category     X     Y
0            2023-01-01        A  10.0   NaN
1            2023-01-01        B   NaN  20.0
2            2023-01-02        A  30.0   NaN
3            2023-01-02        B   NaN  40.0
4            2023-01-03        A  50.0   NaN
5            2023-01-03        B   NaN  60.0
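If only some levels should become columns, reset_index() also accepts a level argument. The following is a minimal sketch that rebuilds the same sample data; the name partial_df is chosen just for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Subcategory': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Value': [10, 20, 30, 40, 50, 60],
})
pivot_df = df.pivot_table(values='Value', index=['Date', 'Category'],
                          columns='Subcategory', aggfunc='sum')

# Move only the 'Category' level into a column; 'Date' stays as the index
partial_df = pivot_df.reset_index(level='Category')

print(partial_df.index.nlevels)
print(list(partial_df.columns))
```

This keeps ‘Date’ as a single-level index while ‘Category’ becomes an ordinary column next to X and Y.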

2. Using droplevel()

The droplevel() method can be used to remove specific levels from the index. This method is useful if you want to drop only certain levels of the multilevel index.

Python
flat_df = pivot_df.droplevel(level=1)
print(flat_df)

Output:

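Subcategory     X     Y
Date                   
2023-01-01   10.0   NaN
2023-01-01    NaN  20.0
2023-01-02   30.0   NaN
2023-01-02    NaN  40.0
2023-01-03   50.0   NaN
2023-01-03    NaN  60.0

In this example, we dropped the ‘Category’ level from the index. Unlike reset_index(), droplevel() discards the level’s values, so the remaining ‘Date’ index now contains duplicate labels.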
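A level can also be dropped by name rather than by position. The sketch below does that and then checks the resulting index for duplicates. The final groupby step is only one possible way to collapse the duplicated dates, and it assumes that summing the rows is what you want.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Subcategory': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Value': [10, 20, 30, 40, 50, 60],
})
pivot_df = df.pivot_table(values='Value', index=['Date', 'Category'],
                          columns='Subcategory', aggfunc='sum')

# Drop the level by name instead of by position
dropped = pivot_df.droplevel('Category')
has_unique_index = dropped.index.is_unique  # False: each date appears twice

# Optional: collapse the duplicate dates by summing (NaN values are skipped)
collapsed = dropped.groupby(level='Date').sum()

print(has_unique_index)
print(collapsed)
```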

3. Using rename_axis()

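The rename_axis() method renames the index or column axis labels; it does not remove index levels by itself. Combined with reset_index(), it is useful for cleaning up the result: calling rename_axis(columns=None) after reset_index() clears the leftover ‘Subcategory’ name from the columns axis. The order matters, because clearing the index names before reset_index() would lose the ‘Date’ and ‘Category’ column names.

Python
flat_df = pivot_df.reset_index().rename_axis(columns=None)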
print(flat_df)

Output:

         Date Category     X     Y
0  2023-01-01        A  10.0   NaN
1  2023-01-01        B   NaN  20.0
2  2023-01-02        A  30.0   NaN
3  2023-01-02        B   NaN  40.0
4  2023-01-03        A  50.0   NaN
5  2023-01-03        B   NaN  60.0
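If this cleanup is needed repeatedly, the two steps can be wrapped in a small function. The sketch below is one way to do it; flatten_pivot is a hypothetical helper name, not part of Pandas.

```python
import pandas as pd


def flatten_pivot(pivot):
    """Hypothetical helper: turn index levels into columns and clear the columns axis name."""
    return pivot.reset_index().rename_axis(columns=None)


df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03', '2023-01-03'],
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Subcategory': ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Value': [10, 20, 30, 40, 50, 60],
})
pivot_df = df.pivot_table(values='Value', index=['Date', 'Category'],
                          columns='Subcategory', aggfunc='sum')

flat = flatten_pivot(pivot_df)
print(flat)
```

The result has a default integer index, plain ‘Date’ and ‘Category’ columns, and no axis name left on the columns.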

Removing Multilevel Indexes in Pandas DataFrames: Practical Examples and Techniques

Let’s look at some practical examples to illustrate how to remove the multilevel index in different scenarios.

Example 1: Sales Data

Consider a sales dataset that is pivoted on two index keys, ‘Date’ and ‘Store’.

Python
sales_data = {
    'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
    'Store': ['A', 'B', 'A', 'B'],
    'Product': ['X', 'Y', 'X', 'Y'],
    'Sales': [100, 200, 150, 250]
}

df = pd.DataFrame(sales_data)
pivot_df = df.pivot_table(values='Sales', index=['Date', 'Store'], columns='Product', aggfunc='sum')
print(pivot_df)

Output:

Product               X      Y
Date       Store              
2023-01-01 A      100.0    NaN
           B        NaN  200.0
2023-01-02 A      150.0    NaN
           B        NaN  250.0

To remove the multilevel index:

Python
flat_df = pivot_df.reset_index()
print(flat_df)

Output:

Product        Date Store      X      Y
0        2023-01-01     A  100.0    NaN
1        2023-01-01     B    NaN  200.0
2        2023-01-02     A  150.0    NaN
3        2023-01-02     B    NaN  250.0

Example 2: Financial Data

Consider a financial dataset that is pivoted on ‘Year’ and ‘Quarter’, with two value columns and no columns key.

Python
financial_data = {
    'Year': [2021, 2021, 2022, 2022],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [1000, 1500, 2000, 2500],
    'Profit': [200, 300, 400, 500]
}

df = pd.DataFrame(financial_data)
pivot_df = df.pivot_table(values=['Revenue', 'Profit'], index=['Year', 'Quarter'], aggfunc='sum')
print(pivot_df)

Output:

              Profit  Revenue
Year Quarter                 
2021 Q1          200     1000
     Q2          300     1500
2022 Q1          400     2000
     Q2          500     2500

To remove the multilevel index:

Python
flat_df = pivot_df.reset_index()
print(flat_df)

Output:

   Year Quarter  Profit  Revenue
0  2021      Q1     200     1000
1  2021      Q2     300     1500
2  2022      Q1     400     2000
3  2022      Q2     500     2500
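A multilevel index can also show up on the columns, for instance when several value columns are combined with a columns key. The sketch below is a variation on the financial example, not something from the original data: it pivots ‘Quarter’ into the columns and then flattens the resulting column levels by joining them with an underscore. The variable names wide and flat_wide, and the underscore separator, are choices made for this example.

```python
import pandas as pd

financial_data = {
    'Year': [2021, 2021, 2022, 2022],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Revenue': [1000, 1500, 2000, 2500],
    'Profit': [200, 300, 400, 500],
}
df = pd.DataFrame(financial_data)

# Two value columns plus a columns key produce a MultiIndex on the columns
wide = df.pivot_table(values=['Revenue', 'Profit'], index='Year',
                      columns='Quarter', aggfunc='sum')
column_levels = wide.columns.nlevels

# Join each (value, quarter) pair into a single flat column label
flat_wide = wide.copy()
flat_wide.columns = ['_'.join(col) for col in flat_wide.columns]
flat_wide = flat_wide.reset_index()

print(flat_wide)
```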

Conclusion

Removing the multilevel index from a pivot table in Pandas can simplify your DataFrame and make it easier to work with. In this article, we explored several methods to achieve this: reset_index() turns index levels into columns, droplevel() discards selected levels, and rename_axis() clears leftover axis names. Each method has its own use cases and advantages, allowing you to choose the best approach for your specific needs.

By mastering these techniques, you can efficiently manage and analyze your data, making your data manipulation tasks in Pandas more streamlined and effective.