Convert PySpark Row List to Pandas DataFrame

In this article, we will convert a PySpark Row list into a Pandas DataFrame. A Row object represents a single row of a PySpark DataFrame, so a DataFrame can be represented as a Python list of Row objects.

To perform the conversion, use the createDataFrame() method followed by the toPandas() method.
Here is the syntax of the createDataFrame() method:

Syntax: current_session.createDataFrame(data, schema=None, samplingRatio=None, verifySchema=True)
Parameters:
- data: a resilient distributed dataset (RDD), a list, or a pandas.DataFrame holding SQL data representations (e.g. Row, tuple, int, boolean)
- schema: a DataType, a datatype string, or a list of column names for the DataFrame
- samplingRatio -> float: the sample ratio of rows used when inferring the schema
- verifySchema -> bool: whether to check that the data types of the rows match the schema

Returns: a PySpark DataFrame object.
Example:
In this example, we will pass the Row list as data and create a PySpark DataFrame. We will then use the toPandas() method to get a Pandas DataFrame.
Python
# Importing PySpark and, importantly, Row from pyspark.sql
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# PySpark session
row_pandas_session = SparkSession.builder.appName(
    'row_pandas_session'
).getOrCreate()

# List of sample Row objects
row_object_list = [Row(Topic='Dynamic Programming', Difficulty=10),
                   Row(Topic='Arrays', Difficulty=5),
                   Row(Topic='Sorting', Difficulty=6),
                   Row(Topic='Binary Search', Difficulty=7)]

# Creating a PySpark DataFrame using createDataFrame()
df = row_pandas_session.createDataFrame(row_object_list)

# Printing the Spark DataFrame
df.show()

# Conversion to a Pandas DataFrame
pandas_df = df.toPandas()

# Final result
print(pandas_df)
Output: