How to use collect() In Python
This method will collect all the rows and columns of the dataframe and then loop through it using for loop. Here an iterator is used to iterate over a loop from the collected elements using the collect() method.
Syntax:
for itertator in dataframe.collect(): print(itertator["column_name"],...............)
where,
- dataframe is the input dataframe
- iterator is used to collect rows
- column_name is the column to iterate rows
Example: Here we are going to iterate all the columns in the dataframe with collect() method and inside the for loop, we are specifying iterator[‘column_name’] to get column values.
Python3
# importing module import pyspark # importing sparksession from pyspark.sql module from pyspark.sql import SparkSession # creating sparksession and giving an app name spark = SparkSession.builder.appName( 'sparkdf' ).getOrCreate() # list of employee data data = [[ "1" , "sravan" , "company 1" ], [ "2" , "ojaswi" , "company 1" ], [ "3" , "rohith" , "company 2" ], [ "4" , "sridevi" , "company 1" ], [ "5" , "bobby" , "company 1" ]] # specify column names columns = [ 'ID' , 'NAME' , 'Company' ] # creating a dataframe from the lists of data dataframe = spark.createDataFrame(data, columns) # using collect for i in dataframe.collect(): # display print (i[ "ID" ], i[ "NAME" ], i[ "Company" ]) |
Output:
How to Iterate over rows and columns in PySpark dataframe
In this article, we will discuss how to iterate rows and columns in PySpark dataframe.
Create the dataframe for demonstration:
Python3
# importing module import pyspark # importing sparksession from pyspark.sql module from pyspark.sql import SparkSession # creating sparksession and giving an app name spark = SparkSession.builder.appName( 'sparkdf' ).getOrCreate() # list of employee data data = [[ "1" , "sravan" , "company 1" ], [ "2" , "ojaswi" , "company 1" ], [ "3" , "rohith" , "company 2" ], [ "4" , "sridevi" , "company 1" ], [ "5" , "bobby" , "company 1" ]] # specify column names columns = [ 'ID' , 'NAME' , 'Company' ] # creating a dataframe from the lists of data dataframe = spark.createDataFrame(data, columns) dataframe.show() |
Output: