How to use list comprehension in Python

A list comprehension acts as a loop over each row, and a for loop over its result then yields the values of a particular column. Here we iterate over the data in a given column using the collect() method through rdd.

Syntax: dataframe.rdd.collect()

Example: Here we iterate over the rows of the NAME column.

Python3




# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# collect the rows, keep only the NAME value of each row, and print it
for name in [row["NAME"] for row in dataframe.rdd.collect()]:
    print(name)


Output:

sravan
ojaswi
rohith
sridevi
bobby
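Because collect() returns an ordinary Python list of Row objects, the same comprehension can also filter rows or pick several columns at once. The sketch below is one possible way to wrap that pattern; the helper names `column_values` and `id_name_pairs_where` are hypothetical, `df` is assumed to be a PySpark dataframe small enough to fit on the driver, and it calls `df.collect()` directly, which returns the same list of Row objects as `df.rdd.collect()`.

```python
def column_values(df, column):
    """Return every value of `column` as a plain Python list.

    Assumes `df` is a PySpark dataframe that fits in driver memory.
    """
    # collect() brings all rows to the driver; each Row supports row["NAME"] lookup
    return [row[column] for row in df.collect()]


def id_name_pairs_where(df, column, value):
    """Return (ID, NAME) tuples for the rows whose `column` equals `value`.

    Assumes the dataframe has ID and NAME columns, as in the example above.
    """
    # the trailing `if` filters rows inside the comprehension itself
    return [(row["ID"], row["NAME"])
            for row in df.collect()
            if row[column] == value]
```

With the dataframe built above, `column_values(dataframe, "NAME")` should give the same names as the printed output, and `id_name_pairs_where(dataframe, "Company", "company 2")` should keep only the row for rohith. For large dataframes, filtering with Spark before collecting is usually preferable, since collect() moves every row to the driver.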

How to Iterate over rows and columns in PySpark dataframe

In this article, we will discuss how to iterate over rows and columns in a PySpark dataframe.

Create the dataframe for demonstration:

Python3




# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()


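Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+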


Method 1: Using collect()

This method collects all the rows and columns of the dataframe and then loops through them with a for loop. The loop iterates over the elements returned by the collect() method....

Method 2: Using toLocalIterator()

It returns an iterator that contains all rows and columns in the RDD. It is similar to the collect() method, but it works on the RDD, so it is called through the dataframe's rdd attribute. We can use toLocalIterator() with rdd like:...

Method 3: Using iterrows()

This iterates over rows. First, we have to convert the PySpark dataframe into a Pandas dataframe using the toPandas() method. iterrows() then iterates over the dataframe row by row....

Method 4: Using select()

...

Method 5: Using list comprehension

...

Method 6: Using map()

...
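
The collect(), toLocalIterator() and iterrows() approaches described above can be sketched side by side. This is only an illustration under a few assumptions: the function names are hypothetical, `df` is assumed to be a PySpark dataframe with a NAME column such as the one created earlier, and toLocalIterator() is called on the dataframe itself (it is also available through rdd, as the text notes).

```python
def names_via_collect(df):
    # collect() materialises every row on the driver as a list of Row objects
    return [row["NAME"] for row in df.collect()]


def names_via_local_iterator(df):
    # toLocalIterator() yields rows lazily, fetching one partition at a time,
    # so the driver never has to hold the whole dataframe at once
    return [row["NAME"] for row in df.toLocalIterator()]


def names_via_iterrows(df):
    # toPandas() converts to a pandas dataframe (pandas must be installed);
    # iterrows() then yields (index, row) pairs, so the index is discarded
    return [row["NAME"] for _, row in df.toPandas().iterrows()]
```

All three functions should return the same names for the same dataframe. The practical difference is memory use on the driver: collect() and toPandas() load everything at once, while toLocalIterator() needs only as much memory as the largest partition.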