How to use collect() in Python

Method: Using collect()

The collect() method returns every row of the dataframe to the driver as a Python list of Row objects, which can then be traversed with an ordinary for loop. Because the whole dataframe is loaded into driver memory, this approach is best suited to small dataframes.

Syntax:

for iterator in dataframe.collect():
    print(iterator["column_name"], ...)

where,

dataframe is the input dataframe
iterator is the loop variable that holds each collected row
column_name is the column whose value is read from each row

Example: Here we iterate over all the rows of the dataframe with the collect() method. Inside the for loop, iterator['column_name'] returns the value of that column for the current row.

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# using collect
for i in dataframe.collect():
    # display
    print(i["ID"], i["NAME"], i["Company"])

Output:

1 sravan company 1
2 ojaswi company 1
3 rohith company 2
4 sridevi company 1
5 bobby company 1

How to Iterate over rows and columns in PySpark dataframe

In this article, we will discuss how to iterate over the rows and columns of a PySpark dataframe.

Create the dataframe for demonstration:

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()

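Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+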
