How to use spark.sql() in Python

Here we will use a SQL query to change the column types. The DataFrame must first be registered as a temporary view with createOrReplaceTempView() so that the query can reference it.

Syntax: spark.sql("sql query")

Example: Using spark.sql()

Python

# course_df5 has all the column datatypes as string
course_df5.createOrReplaceTempView("course_view")
  
course_df7 = spark.sql('''
SELECT 
  Name,
  Course_Name,
  INT(Duration_Months),
  FLOAT(Course_Fees),
  DATE(Start_Date),
  BOOLEAN(Payment_Done)
FROM course_view
''')
  
course_df7.printSchema()


Output:

root
 |-- Name: string (nullable = true)
 |-- Course_Name: string (nullable = true)
 |-- Duration_Months: integer (nullable = true)
 |-- Course_Fees: float (nullable = true)
 |-- Start_Date: date (nullable = true)
 |-- Payment_Done: boolean (nullable = true)
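
The same conversions can be written with the standard SQL CAST syntax, which is equivalent to the INT(...), FLOAT(...), DATE(...), and BOOLEAN(...) shorthand used above. A minimal sketch, reusing the course_view registered earlier (the name course_df8 is just an illustrative choice):

Python

# Equivalent query using standard CAST syntax; the aliases
# keep the original column names (illustrative sketch)
course_df8 = spark.sql('''
SELECT
  Name,
  Course_Name,
  CAST(Duration_Months AS INT) AS Duration_Months,
  CAST(Course_Fees AS FLOAT) AS Course_Fees,
  CAST(Start_Date AS DATE) AS Start_Date,
  CAST(Payment_Done AS BOOLEAN) AS Payment_Done
FROM course_view
''')

course_df8.printSchema()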


How to Change Column Type in PySpark DataFrame?

In this article, we are going to see how to change the column types of a PySpark DataFrame.

Creating a DataFrame for demonstration:

Python

# Create a spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkExamples').getOrCreate()
  
# Create a spark dataframe
columns = ["Name", "Course_Name",
           "Duration_Months",
           "Course_Fees", "Start_Date",
           "Payment_Done"]
data = [
    ("Amit Pathak", "Python", 3,
     10000, "02-07-2021", True),
    ("Shikhar Mishra", "Soft skills",
     2, 8000, "07-10-2021", False),
    ("Shivani Suvarna", "Accounting",
     6, 15000, "20-08-2021", True),
    ("Pooja Jain", "Data Science", 12,
     60000, "02-12-2021", False),
]
course_df = spark.createDataFrame(data).toDF(*columns)
  
# View the dataframe
course_df.show()


Output:

+---------------+------------+---------------+-----------+----------+------------+
|           Name| Course_Name|Duration_Months|Course_Fees|Start_Date|Payment_Done|
+---------------+------------+---------------+-----------+----------+------------+
|    Amit Pathak|      Python|              3|      10000|02-07-2021|        true|
| Shikhar Mishra| Soft skills|              2|       8000|07-10-2021|       false|
|Shivani Suvarna|  Accounting|              6|      15000|20-08-2021|        true|
|     Pooja Jain|Data Science|             12|      60000|02-12-2021|       false|
+---------------+------------+---------------+-----------+----------+------------+

Let's see the schema of the DataFrame:

Python

# View the column datatypes
course_df.printSchema()


Output:

root
 |-- Name: string (nullable = true)
 |-- Course_Name: string (nullable = true)
 |-- Duration_Months: long (nullable = true)
 |-- Course_Fees: long (nullable = true)
 |-- Start_Date: string (nullable = true)
 |-- Payment_Done: boolean (nullable = true)

Method 1: Using DataFrame.withColumn()

The DataFrame.withColumn(colName, col) method returns a new DataFrame by adding a column or replacing the existing column that has the same name....
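
A minimal sketch of this approach, assuming the course_df created above (course_df2 is an illustrative name): each withColumn() call replaces the named column with a copy cast to the target type via Column.cast().

Python

# Illustrative sketch: replace each column with a cast copy of itself
from pyspark.sql.functions import col

course_df2 = course_df \
    .withColumn("Duration_Months", col("Duration_Months").cast("int")) \
    .withColumn("Course_Fees", col("Course_Fees").cast("float")) \
    .withColumn("Start_Date", col("Start_Date").cast("date")) \
    .withColumn("Payment_Done", col("Payment_Done").cast("boolean"))

course_df2.printSchema()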

Method 2: Using DataFrame.select()

...
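
A minimal sketch of the select() approach, again assuming course_df from above (course_df3 is an illustrative name): the string columns are selected unchanged and the rest are cast in place. Since cast() keeps the source column's name, no aliases are needed.

Python

# Illustrative sketch: select every column, casting the non-string ones
from pyspark.sql.functions import col

course_df3 = course_df.select(
    col("Name"),
    col("Course_Name"),
    col("Duration_Months").cast("int"),
    col("Course_Fees").cast("float"),
    col("Start_Date").cast("date"),
    col("Payment_Done").cast("boolean"),
)

course_df3.printSchema()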

Method 3: Using spark.sql()

The spark.sql() function runs a SQL query against a registered temporary view and returns the result as a new DataFrame, as shown in the spark.sql() section above....