Python | Merge, Join and Concatenate DataFrames using Pandas
A dataframe is a two-dimensional data structure having multiple rows and columns. In a Pandas DataFrame, the data is aligned in the form of rows and columns only. A dataframe can perform arithmetic as well as conditional operations. It has a mutable size. This article will show how to join, concatenate, and merge in Pandas.
Python Merge, Join, and Concatenate DataFrames Using Pandas
Below are the different ways and approaches by which we can merge, join, and concatenate in Pandas in Python:
DataFrames Concatenation in Python Pandas
concat()
function does all of the heavy lifting of performing concatenation operations along an axis while performing optional set logic (union or intersection) of the indexes (if any) on the other axes. In this example, three DataFrames (df1
, df2
, and df3
) with identical columns but different indices are created. Using pd.concat()
, these Pandas DataFrames are vertically stacked, resulting in a combined DataFrame where rows from each original DataFrame follow one another.
Python3
# Creating first dataframe df1 = pd.DataFrame({ 'A' : [ 'A0' , 'A1' , 'A2' , 'A3' ], 'B' : [ 'B0' , 'B1' , 'B2' , 'B3' ], 'C' : [ 'C0' , 'C1' , 'C2' , 'C3' ], 'D' : [ 'D0' , 'D1' , 'D2' , 'D3' ]}, index = [ 0 , 1 , 2 , 3 ]) # Creating second dataframe df2 = pd.DataFrame({ 'A' : [ 'A4' , 'A5' , 'A6' , 'A7' ], 'B' : [ 'B4' , 'B5' , 'B6' , 'B7' ], 'C' : [ 'C4' , 'C5' , 'C6' , 'C7' ], 'D' : [ 'D4' , 'D5' , 'D6' , 'D7' ]}, index = [ 4 , 5 , 6 , 7 ]) # Creating third dataframe df3 = pd.DataFrame({ 'A' : [ 'A8' , 'A9' , 'A10' , 'A11' ], 'B' : [ 'B8' , 'B9' , 'B10' , 'B11' ], 'C' : [ 'C8' , 'C9' , 'C10' , 'C11' ], 'D' : [ 'D8' , 'D9' , 'D10' , 'D11' ]}, index = [ 8 , 9 , 10 , 11 ]) # Concatenating the dataframes pd.concat([df1, df2, df3]) |
Output:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A4 B4 C4 D4
5 A5 B5 C5 D5
6 A6 B6 C6 D6
7 A7 B7 C7 D7
8 A8 B8 C8 D8
9 A9 B9 C9 D9
10 A10 B10 C10 D10
11 A11 B11 C11 D11
Python Merge in Pandas in Python
DataFrames Merge Pandas provides a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. In this example, two DataFrames (left
and right
) are created with a common key column ‘Key’. The pd.merge()
function merges these DataFrames based on their common ‘Key’ column using an inner join, resulting in a combined DataFrame containing only the rows where the key values match in both DataFrames.
Python3
# Dataframe created left = pd.DataFrame({ 'Key' : [ 'K0' , 'K1' , 'K2' , 'K3' ], 'A' : [ 'A0' , 'A1' , 'A2' , 'A3' ], 'B' : [ 'B0' , 'B1' , 'B2' , 'B3' ]}) right = pd.DataFrame({ 'Key' : [ 'K0' , 'K1' , 'K2' , 'K3' ], 'C' : [ 'C0' , 'C1' , 'C2' , 'C3' ], 'D' : [ 'D0' , 'D1' , 'D2' , 'D3' ]}) # Merging the dataframes pd.merge(left, right, how = 'inner' , on = 'Key' ) |
Output:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
Left Merge
The pd.merge()
function with how='left'
performs a left join, merging the left
DataFrame with the right
DataFrame on the ‘Key’ column. All rows from the left
DataFrame are retained, and matching rows from the right
DataFrame are appended with their respective values.
Python3
left_merged = pd.merge(left, right, how = 'left' , on = 'Key' ) print (left_merged) |
Output:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
Right Merge
The pd.merge()
function with how='right'
performs a right join, merging the left
DataFrame with the right
DataFrame on the ‘Key’ column. All rows from the right
DataFrame are retained, and matching rows from the left
DataFrame are appended with their respective values.
Python3
right_merged = pd.merge(left, right, how = 'right' , on = 'Key' ) print (right_merged) |
Output:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
Outer Merge
The pd.merge()
function with how='outer'
performs a full outer join, merging the left
DataFrame with the right
DataFrame on the ‘Key’ column. This results in a DataFrame that contains all rows from both the left
and right
DataFrames, filling in missing values with NaN where there isn’t a match.
Python3
outer_merged = pd.merge(left, right, how = 'outer' , on = 'Key' ) print (outer_merged) |
Output:
Key A B C D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3
DataFrames Join in Python Pandas
In this example, both left
and right
DataFrames are indexed with keys ‘K0’, ‘K1’, ‘K2’, and ‘K3’. The left.join(right)
operation merges the DataFrames on their indices, combining columns ‘A’ and ‘B’ from left
with columns ‘C’ and ‘D’ from right
based on the shared index.
Python3
left = pd.DataFrame({ 'A' : [ 'A0' , 'A1' , 'A2' , 'A3' ], 'B' : [ 'B0' , 'B1' , 'B2' , 'B3' ]}, index = [ 'K0' , 'K1' , 'K2' , 'K3' ]) right = pd.DataFrame({ 'C' : [ 'C0' , 'C1' , 'C2' , 'C3' ], 'D' : [ 'D0' , 'D1' , 'D2' , 'D3' ]}, index = [ 'K0' , 'K1' , 'K2' , 'K3' ]) # Joining the dataframes left.join(right) |
Output
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
K2 A2 B2 C2 D2
K3 A3 B3 C3 D3