Read CSV File into DataFrame


Here we read a single CSV file into a PySpark DataFrame using spark.read.csv, and then convert that data to a pandas DataFrame using .toPandas().

Python3

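from pyspark.sql import SparkSession

# Start a Spark session, or reuse one that is already running
spark = SparkSession.builder.appName(
    'Read CSV File into DataFrame').getOrCreate()

# Read the CSV: the first row is the header, and column types are inferred
authors = spark.read.csv('/content/authors.csv', sep=',',
                         inferSchema=True, header=True)

# Convert the PySpark DataFrame to a pandas DataFrame and preview it
df = authors.toPandas()
df.head()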


Here, we passed the path to our CSV file, authors.csv, as the first argument. Second, we passed the delimiter used in the file, which here is a comma (','). Next, we set inferSchema to True, which makes Spark scan the data and infer each column's type instead of reading every column as a string; header=True tells Spark to take the column names from the first row. Finally, we converted the PySpark DataFrame to a pandas DataFrame, df, using the toPandas() method.
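To make the effect of inferSchema easier to see, here is a minimal sketch that reads one file with and without the option and reports the column types Spark assigns. The sample file, its columns (author_id, name), and the helper names write_sample_csv, column_types and demo are made up for illustration; they are not the contents of the article's authors.csv. It assumes PySpark and a Java runtime are installed.

```python
import csv
import os
import tempfile


def write_sample_csv(path):
    """Write a tiny, hypothetical authors file (not the article's data)."""
    rows = [("author_id", "name"), (1, "Ada"), (2, "Grace")]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def column_types(path, infer_schema):
    """Return {column name: Spark type name} for the CSV at `path`."""
    # Imported here so write_sample_csv works even without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("inferSchema demo").getOrCreate()
    df = spark.read.csv(path, sep=",", header=True, inferSchema=infer_schema)
    # df.dtypes is a list of (column name, type name) pairs
    return dict(df.dtypes)


def demo():
    """Print the column types with and without schema inference."""
    sample = write_sample_csv(os.path.join(tempfile.mkdtemp(), "authors.csv"))
    print("inferSchema=False:", column_types(sample, infer_schema=False))
    print("inferSchema=True: ", column_types(sample, infer_schema=True))
```

Calling demo() should show every column as a string when inferSchema is False, and a numeric type for author_id when it is True. Inference costs an extra pass over the data, so for large files it may be worth passing an explicit schema instead.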

PySpark – Read CSV file into DataFrame

In this article, we look at how to read CSV files into a DataFrame using PySpark and Python.

Files Used:

authors
book_author
books
