Adding a New Column to an Existing DataFrame in Pandas
Adding new columns to an existing DataFrame is a fundamental task in data analysis with Pandas. It lets you enrich your data with additional information for further analysis and manipulation. This article covers several ways to add new columns: the insert() method, the assign() method, a dictionary, direct assignment of a list, and the loc indexer.
What is Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It’s a fundamental data structure in the Python data science ecosystem and provides a powerful way to work with tabular data.
Here are some key features of a Pandas DataFrame:
- Data representation: Stores data in a table format with rows and columns.
- Heterogeneous data types: Can hold different data types in different columns (e.g., integers, floats, strings, booleans).
- Labeling: Each row and column has a label (index and column names).
- Mutable: Allows data manipulation and modification.
- Powerful operations: Provides various functions and methods for data analysis, manipulation, and exploration.
- Extensible: Can be customized and extended with additional functionalities through libraries and user-defined functions.
Adding a new Column to Existing DataFrame in Pandas in Python
After creating a sample DataFrame, this article walks through the following ways to add a new column to an existing DataFrame:
- Using the DataFrame.insert() method
- Using the DataFrame.assign() method
- Using a dictionary
- Using a list
- Using the .loc indexer
- Adding more than one column at a time
Creating a Sample Dataframe
Here we are creating a Sample Dataframe:
Python3
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)
print(df)
Output:
     Name  Height  Qualification
0     Jai     5.1            Msc
1  Princi     6.2             MA
2  Gaurav     5.1            Msc
3    Anuj     5.2            Msc
Note that when you add a column from a list, the length of the list must match the number of rows in the DataFrame (the length of its index); otherwise Pandas raises a ValueError.
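The length requirement in the note above is easy to check. The following is a minimal sketch that assumes the sample data from this section; the column name `Age` and the three-item list are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# Hypothetical values: three items for a four-row DataFrame
too_short = [21, 23, 24]

try:
    df['Age'] = too_short
    length_error = None
except ValueError as exc:
    # pandas rejects the assignment because the lengths differ
    length_error = exc
    print("Assignment failed:", exc)

# Show which columns exist after the failed assignment
print(df.columns.tolist())
```

Catching the exception is only for demonstration; in practice the usual fix is to make sure the list has one value per row before assigning it.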
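Add a New Column to an Existing DataFrame using DataFrame.insert()
DataFrame.insert(loc, column, value) places the new column at any position we like, not just at the end, and modifies the DataFrame in place. The value can be a list, a Series, or a single scalar that is repeated for every row. The optional fourth argument, allow_duplicates, controls whether a column label that already exists may be inserted again.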
Python3
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Using DataFrame.insert() to add a column at position 2
# (the fourth argument is allow_duplicates)
df.insert(2, "Age", [21, 23, 24, 21], allow_duplicates=True)

# Observe the result
print(df)
Output:
     Name  Height  Age  Qualification
0     Jai     5.1   21            Msc
1  Princi     6.2   23             MA
2  Gaurav     5.1   24            Msc
3    Anuj     5.2   21            Msc
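Two details of insert() are worth seeing in isolation: a scalar value is repeated for every row, and inserting a label that already exists fails unless duplicates are explicitly allowed. The sketch below assumes the same sample data; the `Country` column and its value are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# A scalar is broadcast to all rows; position 0 makes it the first column.
# 'Country' is a hypothetical column used only for this sketch.
df.insert(0, 'Country', 'India')

# 'Name' already exists, so insert() raises ValueError by default
try:
    df.insert(1, 'Name', ['A', 'B', 'C', 'D'])
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
    print("Insert failed:", exc)

print(df.columns.tolist())
```

Because insert() works in place and returns None, there is no need to assign its result back to `df`.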
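Adding Columns to Pandas DataFrame using DataFrame.assign()
This method returns a new DataFrame that contains the original columns plus the new one; the original DataFrame is left unchanged. The keyword name passed to assign() becomes the column label.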
Python3
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Using 'address' as the column name and equating it to the list
df2 = df.assign(address=['Delhi', 'Bangalore', 'Chennai', 'Patna'])
print(df2)
Output:
     Name  Height  Qualification    address
0     Jai     5.1            Msc      Delhi
1  Princi     6.2             MA  Bangalore
2  Gaurav     5.1            Msc    Chennai
3    Anuj     5.2            Msc      Patna
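assign() also accepts a callable, which receives the DataFrame and returns the new column's values. The sketch below assumes the same sample data; the `Tall` column and the 6.0 threshold are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# The lambda receives the DataFrame and returns a boolean Series.
# 'Tall' and the 6.0 cutoff are hypothetical choices for this sketch.
df2 = df.assign(Tall=lambda d: d['Height'] > 6.0)

print(df2)

# The original DataFrame does not gain the new column
print('Tall' in df.columns)
```

Since assign() returns a new object, it fits naturally into method chains where each step produces a fresh DataFrame.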
Add a Column to a Pandas DataFrame using a Dictionary
We can use a Python dictionary to add a new column to a Pandas DataFrame. Use the values of an existing column (here Name) as the dictionary keys and the entries for the new column as the dictionary values, then look up each row with Series.map().
Python3
# Import pandas package
import pandas as pd

# Define a dictionary containing Students data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}

# Convert the dictionary into DataFrame
df = pd.DataFrame(data)

# Define a dictionary whose keys are values of an existing
# column ('Name') and whose values are the entries
# for our new column.
address = {'Jai': 'Delhi', 'Princi': 'Bangalore',
           'Gaurav': 'Patna', 'Anuj': 'Chennai'}

# Provide 'Address' as the column name and look up each name
df['Address'] = df['Name'].map(address)

# Observe the output
print(df)
Output:
     Name  Height  Qualification    Address
0     Jai     5.1            Msc      Delhi
1  Princi     6.2             MA  Bangalore
2  Gaurav     5.1            Msc      Patna
3    Anuj     5.2            Msc    Chennai
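When a dictionary is keyed by the values of an existing column, Series.map() looks up each row's value, and any value missing from the dictionary becomes NaN. The sketch below assumes the same sample data with a deliberately incomplete, hypothetical mapping, and fills the gaps with an illustrative placeholder.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# Hypothetical mapping that covers only two of the four names
partial_address = {'Jai': 'Delhi', 'Princi': 'Bangalore'}

# Names without an entry in the dictionary map to NaN
df['Address'] = df['Name'].map(partial_address)

# Replace the missing entries with a placeholder ('Unknown' is illustrative)
df['AddressFilled'] = df['Address'].fillna('Unknown')

print(df)
```

Checking the mapped column for missing values is a quick way to spot keys that were misspelled or left out of the dictionary.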
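Adding a New Column to a Pandas DataFrame using a List
In this example, a list named address is assigned to the new column label “Address” with bracket notation, which appends it as the last column of the DataFrame. The values are matched to rows by position, so the list must contain one value per row.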
Python3
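import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']

# Using 'Address' as the column name and equating it to the list
df['Address'] = address
print(df)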
Output:
     Name  Height  Qualification    Address
0     Jai     5.1            Msc      Delhi
1  Princi     6.2             MA  Bangalore
2  Gaurav     5.1            Msc    Chennai
3    Anuj     5.2            Msc      Patna
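Bracket assignment is not limited to lists. As a small sketch, assuming the same sample data, the right-hand side can also be a Series computed from an existing column; `NameLength` is an illustrative column name, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# Compute a new column from an existing one; the result is a Series
# with the same index as df, so it lines up row by row
df['NameLength'] = df['Name'].str.len()

print(df)
```

A column derived this way always has the right length, because it is built from the DataFrame's own rows.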
Add a New Column to an Existing Pandas DataFrame using DataFrame.loc
This example creates a Pandas DataFrame named df with the columns “Name”, “Height”, and “Qualification”, then adds a new column “Address” through the loc indexer. The slice : selects every row, and the label “Address” names the column to create.
Python3
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(data)

# Create the list of new column values
address = ["Delhi", "Bangalore", "Chennai", "Patna"]

# Add the new column using loc
df.loc[:, "Address"] = address
print(df)
Output:
     Name  Height  Qualification    Address
0     Jai     5.1            Msc      Delhi
1  Princi     6.2             MA  Bangalore
2  Gaurav     5.1            Msc    Chennai
3    Anuj     5.2            Msc      Patna
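Because loc takes a row selector as well as a column label, it can also create a column that is filled only for selected rows. The sketch below assumes the same sample data; the `Category` column, the 'Tall' value, and the 6.0 threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# Boolean mask selecting the rows that should receive a value
is_tall = df['Height'] > 6.0

# Creates 'Category' and fills it only where the mask is True;
# the remaining rows hold missing values
df.loc[is_tall, 'Category'] = 'Tall'

print(df)
```

Rows that the mask does not select are left as missing values, which can be filled afterwards if a default is needed.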
Adding More Than One Column to an Existing DataFrame
This example expands an existing Pandas DataFrame df with two new columns, “Age” and “State”, by collecting their value lists in a dictionary and unpacking it into assign().
Python3
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Height': [5.1, 6.2, 5.1, 5.2],
        'Qualification': ['Msc', 'MA', 'Msc', 'Msc'],
        'Address': ['Delhi', 'Bangalore', 'Chennai', 'Patna']}
df = pd.DataFrame(data)

# Define new data for additional columns
age = [22, 25, 23, 24]
state = ['NCT', 'Karnataka', 'Tamil Nadu', 'Bihar']

# Add multiple columns by unpacking a dictionary into assign()
new_data = {'Age': age, 'State': state}
df = df.assign(**new_data)
print(df)
Output:
     Name  Height  Qualification    Address  Age       State
0     Jai     5.1            Msc      Delhi   22         NCT
1  Princi     6.2             MA  Bangalore   25   Karnataka
2  Gaurav     5.1            Msc    Chennai   23  Tamil Nadu
3    Anuj     5.2            Msc      Patna   24       Bihar
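As an alternative to assign(), and not the approach used in the example above, several columns can be gathered in a second DataFrame and joined side by side with pd.concat(). This is only a sketch under the assumption that both DataFrames share the same default row index; the variable names `extra` and `combined` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Height': [5.1, 6.2, 5.1, 5.2],
                   'Qualification': ['Msc', 'MA', 'Msc', 'Msc']})

# Second DataFrame holding the new columns, with the same 0..3 index
extra = pd.DataFrame({'Age': [22, 25, 23, 24],
                      'State': ['NCT', 'Karnataka', 'Tamil Nadu', 'Bihar']})

# axis=1 places the columns of 'extra' to the right of 'df',
# matching rows by index label
combined = pd.concat([df, extra], axis=1)

print(combined)
```

Rows are matched by index label rather than by position, so this variant is only safe when the two indexes agree.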
Conclusion
Understanding how to add new columns to DataFrames is essential for data exploration and manipulation in Pandas. The appropriate method depends on the context: direct assignment or loc for a quick in-place addition, insert() when the position of the column matters, and assign() when you want a new DataFrame or need to add several columns at once. With these techniques you can shape your data for the analysis you need.