Compare Two CSV Files Using Python
We are given two CSV files, and our task is to compare them and find the differences between them in Python. In this article, we will look at some commonly used methods for comparing two CSV files and printing their differences.
Compare Two CSV Files for Differences in Python
Below are some of the ways by which we can compare two CSV files for differences in Python:
- Using Pandas library
- Using CSV module
Both methods are demonstrated on the following two sample files:
file1.csv
Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
file2.csv
Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
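To run the examples locally, the two sample files can be created with a short script. This is a convenience sketch: it writes file1.csv and file2.csv to the current working directory and overwrites any existing files with those names.

```python
from pathlib import Path

# Contents copied from the two sample files shown above
FILE1 = """Name,Age,City
John,25,New York
Emily,30,Los Angeles
Michael,40,Chicago
"""

FILE2 = """Name,Age,City
John,25,New York
Michael,45,Chicago
Emma,35,San Francisco
"""

# Write both files to the current working directory (overwrites existing files)
Path("file1.csv").write_text(FILE1)
Path("file2.csv").write_text(FILE2)
```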
Compare Two CSV Files Using Pandas library
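In this approach, the program loads both CSV files (file1.csv and file2.csv) into two DataFrames with pd.read_csv(). The DataFrame.compare() method then compares the two DataFrames cell by cell, matching rows by their index position, and returns only the rows and columns that contain differences. Each differing column is split into a "self" value (from df1) and an "other" value (from df2). Note that compare() only accepts identically-labeled DataFrames (same shape, row labels and column labels); it works here because both files have three rows and the same columns.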
import pandas as pd
# Read CSV files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
# Compare dataframes
diff = df1.compare(df2)
# Print the differences
print("Differences between file1 and file2:")
print(diff)
Output
Differences between file1 and file2:
   Name              Age          City
      self    other  self  other         self          other
1    Emily  Michael  30.0   45.0  Los Angeles        Chicago
2  Michael     Emma  40.0   35.0      Chicago  San Francisco
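Because compare() matches rows by position, the output above pairs Emily with Michael simply because both sit at row 1, and Michael's age change from 40 to 45 is never reported directly. A key-based comparison is often more useful. The sketch below assumes that the Name column uniquely identifies a row in both files (an assumption about this sample data, not something the files guarantee). It sets Name as the index, runs compare() only on the names both files share, and uses Index.difference() to list names that appear in just one file. The data is inlined with io.StringIO so the sketch runs on its own; with the real files, pass the file paths to pd.read_csv() instead.

```python
import io

import pandas as pd

# Same data as file1.csv and file2.csv, inlined so the sketch is self-contained
csv1 = io.StringIO("Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n")
csv2 = io.StringIO("Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n")

# Assumption: 'Name' is a unique key in both files
df1 = pd.read_csv(csv1).set_index("Name")
df2 = pd.read_csv(csv2).set_index("Name")

# Restrict both frames to the shared names, in the same order, so that
# compare() receives identically-labeled DataFrames
common = df1.index.intersection(df2.index)
changed = df1.loc[common].compare(df2.loc[common])

# Names present in only one of the two files
only_in_file1 = df1.index.difference(df2.index).tolist()
only_in_file2 = df2.index.difference(df1.index).tolist()

print("Changed rows:")
print(changed)
print("Only in file1:", only_in_file1)
print("Only in file2:", only_in_file2)
```

With the sample data, only Michael's Age column is reported as changed, Emily is listed as present only in file1, and Emma as present only in file2.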
Compare Two CSV Files Using CSV Module
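In this approach, the program opens both CSV files (file1.csv and file2.csv) in read mode and wraps each one in a csv.reader object. It then iterates over the rows of both files in parallel with zip() and records every pair of rows that are not identical. Like compare(), this matches rows by position. Because zip() stops at the end of the shorter file, any extra rows at the end of the longer file are not reported.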
import csv

# Function to compare two CSV files row by row
def compare(file1, file2):
    differences = []
    # Open both CSV files in read mode (newline='' is recommended when using the csv module)
    with open(file1, 'r', newline='') as csv_file1, open(file2, 'r', newline='') as csv_file2:
        reader1 = csv.reader(csv_file1)
        reader2 = csv.reader(csv_file2)
        # Iterate over rows in both files simultaneously; zip() stops at the shorter file
        for row1, row2 in zip(reader1, reader2):
            if row1 != row2:
                differences.append((row1, row2))
    return differences

# Define file paths
file1 = 'file1.csv'
file2 = 'file2.csv'

# Call the compare function and store the differences
differences = compare(file1, file2)
for diff in differences:
    print(f"Difference found: {diff}")
Output
Difference found: (['Emily', '30', 'Los Angeles'], ['Michael', '45', 'Chicago'])
Difference found: (['Michael', '40', 'Chicago'], ['Emma', '35', 'San Francisco'])
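The same key-based idea can be expressed with the csv module alone. The sketch below reads each file with csv.DictReader into a dictionary keyed by one column, then reports keys found in only one file and, for shared keys, only the fields whose values changed. The helper names load_rows and diff_csv are hypothetical, and the key column "Name" is assumed to be unique. io.StringIO stands in for the real files; with files on disk, open them with open(path, newline='') and pass the file objects instead.

```python
import csv
import io


def load_rows(f, key):
    """Map each row's key value to the full row dict (assumes keys are unique)."""
    return {row[key]: row for row in csv.DictReader(f)}


def diff_csv(f1, f2, key="Name"):
    rows1 = load_rows(f1, key)
    rows2 = load_rows(f2, key)
    # Keys present in only one of the two files, in file order
    removed = [k for k in rows1 if k not in rows2]
    added = [k for k in rows2 if k not in rows1]
    # For keys present in both files, keep only the fields whose values differ
    changed = {}
    for k, old in rows1.items():
        new = rows2.get(k)
        if new is None:
            continue
        fields = {f: (old[f], new.get(f)) for f in old if old[f] != new.get(f)}
        if fields:
            changed[k] = fields
    return removed, added, changed


# Same data as file1.csv and file2.csv, inlined so the sketch is self-contained
file1 = io.StringIO("Name,Age,City\nJohn,25,New York\nEmily,30,Los Angeles\nMichael,40,Chicago\n")
file2 = io.StringIO("Name,Age,City\nJohn,25,New York\nMichael,45,Chicago\nEmma,35,San Francisco\n")

removed, added, changed = diff_csv(file1, file2)
print("Only in file1:", removed)
print("Only in file2:", added)
for name, fields in changed.items():
    print(f"Changed {name}: {fields}")
```

Since csv.DictReader returns every value as a string, the changed Age is reported as ('40', '45'); convert the values first if numeric comparison matters.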