How to Use spark.sql() in Python
Here we will use a SQL query to change the column types.
Syntax: spark.sql("sql query")
Example: Using spark.sql()
Python
# course_df5 has all the column datatypes as string
course_df5.createOrReplaceTempView("course_view")

course_df7 = spark.sql(
    '''
    SELECT Name,
           Course_Name,
           INT(Duration_Months),
           FLOAT(Course_Fees),
           DATE(Start_Date),
           BOOLEAN(Payment_Done)
    FROM course_view
    '''
)

course_df7.printSchema()
Output:
root
 |-- Name: string (nullable = true)
 |-- Course_Name: string (nullable = true)
 |-- Duration_Months: integer (nullable = true)
 |-- Course_Fees: float (nullable = true)
 |-- Start_Date: date (nullable = true)
 |-- Payment_Done: boolean (nullable = true)
How to Change Column Type in PySpark Dataframe?
In this article, we are going to see how to change the column type of a PySpark dataframe.
Creating a dataframe for demonstration:
Python
# Create a spark session
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('SparkExamples').getOrCreate()

# Create a spark dataframe
columns = ["Name", "Course_Name", "Duration_Months",
           "Course_Fees", "Start_Date", "Payment_Done"]
data = [
    ("Amit Pathak", "Python", 3, 10000, "02-07-2021", True),
    ("Shikhar Mishra", "Soft skills", 2, 8000, "07-10-2021", False),
    ("Shivani Suvarna", "Accounting", 6, 15000, "20-08-2021", True),
    ("Pooja Jain", "Data Science", 12, 60000, "02-12-2021", False),
]
course_df = spark.createDataFrame(data).toDF(*columns)

# View the dataframe
course_df.show()
Output:
Let's see the schema of the dataframe:
Python
# View the column datatypes
course_df.printSchema()
Output: