How to use map() in PySpark

The RDD map() method can be used to convert a given DataFrame column into a Python list.

Syntax: dataframe.select('Column_Name').rdd.map(lambda x: x[0]).collect()

where,

  • dataframe is the PySpark DataFrame
  • Column_Name is the column to be converted into a list
  • map() is an RDD method that takes a function (here a lambda expression) and applies it to every row; x[0] extracts the value of the single selected column from each row
  • collect() returns all of the mapped values to the driver as a Python list
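The row-wise behaviour of the lambda can be illustrated without a Spark session. The sketch below is only an analogy in plain Python: the `Row` namedtuple is a hypothetical stand-in for the single-column rows that a select() call yields, the built-in map() plays the role of rdd.map(), and list() plays the role of collect().

```python
from collections import namedtuple

# Hypothetical stand-in for the single-column rows produced by
# dataframe.select('college'); this is NOT the PySpark Row class.
Row = namedtuple("Row", ["college"])

rows = [Row("vignan"), Row("vvit"), Row("vvit"),
        Row("vignan"), Row("vignan"), Row("iit")]

# Analogue of .rdd.map(lambda x: x[0]): take the first field of each row.
mapped = map(lambda x: x[0], rows)

# Analogue of .collect(): materialize the mapped values as a Python list.
colleges = list(mapped)
print(colleges)
```

Because each row holds exactly one selected column, index 0 is always the value of that column.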

Example: Python code to convert PySpark DataFrame columns to lists using map().

Python3




# convert the 'student NAME' column to a list using map
print(dataframe.select('student NAME')
      .rdd.map(lambda x: x[0]).collect())

# convert the 'student ID' column to a list using map
print(dataframe.select('student ID')
      .rdd.map(lambda x: x[0]).collect())

# convert the 'college' column to a list using map
print(dataframe.select('college')
      .rdd.map(lambda x: x[0]).collect())


Output:

['sravan', 'ojaswi', 'rohith', 'sridevi', 'sravan', 'gnanesh']

['1', '2', '3', '4', '1', '5']

['vignan', 'vvit', 'vvit', 'vignan', 'vignan', 'iit']
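Since collect() returns ordinary Python lists, the results can be post-processed with standard Python tools. As a minimal sketch, assuming the student ID list shown in the output above, the string IDs can be converted to integers with Python's built-in map(), and the repeated ID can be dropped while keeping the original order. The variable names here are illustrative only.

```python
# Values copied from the student ID output shown above.
student_ids = ['1', '2', '3', '4', '1', '5']

# The built-in map() applies int to every element; list() materializes it.
numeric_ids = list(map(int, student_ids))
print(numeric_ids)  # [1, 2, 3, 4, 1, 5]

# dict.fromkeys keeps the first occurrence of each key in order,
# so the repeated ID 1 is dropped.
unique_ids = list(dict.fromkeys(numeric_ids))
print(unique_ids)  # [1, 2, 3, 4, 5]
```

The IDs come back as strings because they were defined as strings in the demonstration data.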

Converting a PySpark DataFrame Column to a Python List

In this article, we will discuss how to convert a PySpark DataFrame column to a Python list.

Creating a DataFrame for demonstration:

Python3




# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list of student data
data = [["1", "sravan", "vignan", 67, 89],
        ["2", "ojaswi", "vvit", 78, 89],
        ["3", "rohith", "vvit", 100, 80],
        ["4", "sridevi", "vignan", 78, 80],
        ["1", "sravan", "vignan", 89, 98],
        ["5", "gnanesh", "iit", 94, 98]]
 
# specify column names
columns = ['student ID', 'student NAME',
           'college', 'subject1', 'subject2']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# display dataframe
dataframe.show()


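Output:

+----------+------------+-------+--------+--------+
|student ID|student NAME|college|subject1|subject2|
+----------+------------+-------+--------+--------+
|         1|      sravan| vignan|      67|      89|
|         2|      ojaswi|   vvit|      78|      89|
|         3|      rohith|   vvit|     100|      80|
|         4|     sridevi| vignan|      78|      80|
|         1|      sravan| vignan|      89|      98|
|         5|     gnanesh|    iit|      94|      98|
+----------+------------+-------+--------+--------+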

Method 1: Using flatMap()

...

Method 2: Using map()

This method selects the column, accesses its underlying RDD, and converts it into a list....

Method 3: Using collect()

...

Method 4: Using toLocalIterator()

...

Method 5: Using toPandas()

...