How To Calculate the Average For Every Column In A CSV File
Given a CSV file, the task is to compute the average of each column in Python. This article covers three approaches: the built-in csv module, the Pandas library, and the NumPy library.
Example:
Input: data.csv
Age,Salary
30,50000
25,60000
28,55000
Output:
Average Age: 27.67
Average Salary: 55000.00
Calculate the Average For Every Column in a CSV File Using Python
Below are some of the ways by which we can calculate the average for every column in a CSV file using Python:
- Using the csv Module and Manual Calculation
- Using Pandas Library
- Using NumPy Library
data.csv
Age,Salary
30,50000
25,60000
28,55000
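To follow along, the sample file can be created from Python. The snippet below is a minimal sketch that writes the rows shown above to 'data.csv' in the current working directory; the location is an assumption, so adjust the path if your file lives elsewhere.

```python
import csv
from pathlib import Path

# Sample rows taken from the data.csv listing above
rows = [
    ["Age", "Salary"],
    [30, 50000],
    [25, 60000],
    [28, 55000],
]

# newline="" lets the csv module control line endings itself
with open("data.csv", "w", newline="") as csvfile:
    csv.writer(csvfile).writerows(rows)

# Read the file back to confirm its contents
print(Path("data.csv").read_text())
```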
Using the csv Module and Manual Calculation
In this example, the program reads 'data.csv' with the csv module, keeps a running sum and count of the numeric values in each column, and prints each column's average to two decimal places. Values that cannot be converted to a float are skipped, and a column with no numeric values is reported as 0.
Python3
# Python program for the above approach
import csv

# Open the CSV file for reading
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    headers = next(reader)  # Read the header row

    # Initialize variables to store column sums and counts
    sums = [0] * len(headers)
    counts = [0] * len(headers)

    # Iterate through each row in the CSV file
    for row in reader:
        for i, value in enumerate(row):
            try:
                num = float(value)
                sums[i] += num
                counts[i] += 1
            except ValueError:
                pass  # Ignore non-numeric values

# Calculate and print the average for each column
for i, header in enumerate(headers):
    average = sums[i] / counts[i] if counts[i] != 0 else 0
    print(f"Average {header}: {average:.2f}")
Output:
Average Age: 27.67
Average Salary: 55000.00
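The same idea can be packaged as a reusable function. The following is a sketch of a variation, not the program above: it uses csv.DictReader and statistics.fmean (Python 3.8+), and the function name column_averages and the in-memory SAMPLE string are illustrative assumptions so the snippet runs without a file on disk.

```python
import csv
import io
from statistics import fmean

# In-memory copy of the sample data (assumption: same content as data.csv)
SAMPLE = "Age,Salary\n30,50000\n25,60000\n28,55000\n"


def column_averages(csvfile):
    """Return {header: mean} for every column with at least one numeric value."""
    reader = csv.DictReader(csvfile)
    values = {name: [] for name in (reader.fieldnames or [])}
    for row in reader:
        for name in values:
            try:
                values[name].append(float(row.get(name)))
            except (TypeError, ValueError):
                continue  # Skip missing or non-numeric cells
    # Columns without any numeric value are left out of the result
    return {name: fmean(nums) for name, nums in values.items() if nums}


averages = column_averages(io.StringIO(SAMPLE))
for name, avg in averages.items():
    print(f"Average {name}: {avg:.2f}")
```

To run it against a real file, pass an open file object instead, for example open('data.csv', newline=''). Unlike the program above, this version omits columns with no numeric values rather than reporting 0 for them.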
Using Pandas Library
In this example, the script uses the Pandas library to read 'data.csv' into a DataFrame, calculates the average of each column with the mean() function, and prints the resulting Series. This is the shortest of the three approaches, since Pandas parses the header row and the numeric values for us.
Python3
import pandas as pd

# Read the CSV file (replace 'data.csv' with your file path)
df = pd.read_csv('data.csv')

# Calculate column averages
column_averages = df.mean()

# Display the results
print("Average for each column:")
print(column_averages)
Output:
Average for each column:
Age          27.666667
Salary    55000.000000
dtype: float64
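Real CSV files often mix text and numeric columns, in which case calling mean() with no arguments may raise an error or behave differently depending on the Pandas version. The sketch below assumes a hypothetical extra Name column (not part of the original data.csv) and passes numeric_only=True so only numeric columns are averaged; the data is read from an in-memory string to keep the snippet self-contained.

```python
import io

import pandas as pd

# Hypothetical sample with a text column added for illustration
SAMPLE = "Name,Age,Salary\nAsha,30,50000\nRavi,25,60000\nMeera,28,55000\n"

df = pd.read_csv(io.StringIO(SAMPLE))

# numeric_only=True skips the text column; round(2) keeps two decimal places
column_averages = df.mean(numeric_only=True).round(2)

print(column_averages)
```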
Using NumPy Library
In this example, the script reads 'data.csv' with the csv module, skips the header row, and converts the remaining rows into a NumPy array of integers. It then calculates the average of the Age and Salary columns with np.mean() and prints the results to two decimal places. Because the array is created with dtype=int, every cell must contain a whole number; any other value raises a ValueError, so this approach suits files that hold purely numerical data.
Python3
import numpy as np
import csv

# Read the CSV file (replace 'data.csv' with your file path)
with open('data.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # Skip the header row
    data = np.array(list(reader), dtype=int)

# Calculate column averages
age_avg = np.mean(data[:, 0])     # Column 0 (Age)
salary_avg = np.mean(data[:, 1])  # Column 1 (Salary)

# Display the results
print(f"Average Age: {age_avg:.2f}")
print(f"Average Salary: ${salary_avg:.2f}")
Output:
Average Age: 27.67
Average Salary: $55000.00
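The script above hard-codes the two column indices. As a sketch of a more general variant, np.loadtxt can parse the numeric rows directly and mean(axis=0) averages every column at once. The in-memory SAMPLE string is an assumption that stands in for 'data.csv', and the approach still requires every data cell to be numeric.

```python
import io

import numpy as np

# In-memory copy of the sample data (assumption: same content as data.csv)
SAMPLE = "Age,Salary\n30,50000\n25,60000\n28,55000\n"

# Take the column names from the first line
headers = SAMPLE.splitlines()[0].split(",")

# Parse the numeric rows, skipping the header line
data = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1)

# axis=0 averages down the rows, giving one value per column
averages = data.mean(axis=0)

for name, avg in zip(headers, averages):
    print(f"Average {name}: {avg:.2f}")
```

With a file on disk, the path can be passed to np.loadtxt in place of the StringIO object.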