How to use select() In Python

Q: How to use select() In Python

It will return the iterator that contains all rows and columns in RDD. It is similar to the collect() method, But it is in rdd format, so it is available inside the rdd method. We can use the toLocalIterator() with rdd like:

Method 3: Using iterrows()

Method 5: Using list comprehension

The select() function is used to select the number of columns. we are then using the collect() function to get the rows through for loop.

The select method will select the columns which are mentioned and get the row data using collect() method. This method will collect rows from the given columns.

Syntax: dataframe.select(“column1″,…………,”column n”).collect()

Example: Here we are going to select ID and Name columns from the given dataframe using the select() method

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# select only id and company
for rows in dataframe.select("ID", "Name").collect():
       # display
    print(rows[0], rows[1])

Output:

How to Iterate over rows and columns in PySpark dataframe

In this article, we will discuss how to iterate rows and columns in PySpark dataframe.

Create the dataframe for demonstration:

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()