How to use spark.sql() in Python

Here we will use a SQL query to change the column types. The DataFrame must first be registered as a temporary view with createOrReplaceTempView() so that the query can reference it.

Syntax: spark.sql("sql query")

Example: Using spark.sql()

Python

# course_df5 has all the column datatypes as string
course_df5.createOrReplaceTempView("course_view")
  
course_df7 = spark.sql('''
SELECT 
  Name,
  Course_Name,
  INT(Duration_Months),
  FLOAT(Course_Fees),
  DATE(Start_Date),
  BOOLEAN(Payment_Done)
FROM course_view
''')
  
course_df7.printSchema()


Output:

root
 |-- Name: string (nullable = true)
 |-- Course_Name: string (nullable = true)
 |-- Duration_Months: integer (nullable = true)
 |-- Course_Fees: float (nullable = true)
 |-- Start_Date: date (nullable = true)
 |-- Payment_Done: boolean (nullable = true)
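
The same conversions can be written with the standard SQL CAST syntax, which is equivalent to the INT(...), FLOAT(...), DATE(...), and BOOLEAN(...) shorthand used above. A minimal sketch, reusing the course_view registered earlier (the name course_df8 is just an illustrative choice):

Python

# Equivalent query using standard CAST syntax; the aliases
# keep the original column names (illustrative sketch)
course_df8 = spark.sql('''
SELECT
  Name,
  Course_Name,
  CAST(Duration_Months AS INT) AS Duration_Months,
  CAST(Course_Fees AS FLOAT) AS Course_Fees,
  CAST(Start_Date AS DATE) AS Start_Date,
  CAST(Payment_Done AS BOOLEAN) AS Payment_Done
FROM course_view
''')

course_df8.printSchema()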


How to Change Column Type in PySpark DataFrame?

In this article, we are going to see how to change the column types of a PySpark DataFrame.

Creating a DataFrame for demonstration:

Python

# Create a spark session
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkExamples').getOrCreate()
  
# Create a spark dataframe
columns = ["Name", "Course_Name",
           "Duration_Months",
           "Course_Fees", "Start_Date",
           "Payment_Done"]
data = [
    ("Amit Pathak", "Python", 3,
     10000, "02-07-2021", True),
    ("Shikhar Mishra", "Soft skills",
     2, 8000, "07-10-2021", False),
    ("Shivani Suvarna", "Accounting",
     6, 15000, "20-08-2021", True),
    ("Pooja Jain", "Data Science", 12,
     60000, "02-12-2021", False),
]
course_df = spark.createDataFrame(data).toDF(*columns)
  
# View the dataframe
course_df.show()


Output:

+---------------+------------+---------------+-----------+----------+------------+
|           Name| Course_Name|Duration_Months|Course_Fees|Start_Date|Payment_Done|
+---------------+------------+---------------+-----------+----------+------------+
|    Amit Pathak|      Python|              3|      10000|02-07-2021|        true|
| Shikhar Mishra| Soft skills|              2|       8000|07-10-2021|       false|
|Shivani Suvarna|  Accounting|              6|      15000|20-08-2021|        true|
|     Pooja Jain|Data Science|             12|      60000|02-12-2021|       false|
+---------------+------------+---------------+-----------+----------+------------+

Let's see the schema of the DataFrame:

Python

# View the column datatypes
course_df.printSchema()


Output:

root
 |-- Name: string (nullable = true)
 |-- Course_Name: string (nullable = true)
 |-- Duration_Months: long (nullable = true)
 |-- Course_Fees: long (nullable = true)
 |-- Start_Date: string (nullable = true)
 |-- Payment_Done: boolean (nullable = true)

Method 1: Using DataFrame.withColumn()

The DataFrame.withColumn(colName, col) method returns a new DataFrame by adding a column or replacing the existing column that has the same name....
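
A minimal sketch of this approach, assuming the course_df created above (course_df2 is an illustrative name): each withColumn() call replaces the named column with a copy cast to the target type via Column.cast().

Python

# Illustrative sketch: replace each column with a cast copy of itself
from pyspark.sql.functions import col

course_df2 = course_df \
    .withColumn("Duration_Months", col("Duration_Months").cast("int")) \
    .withColumn("Course_Fees", col("Course_Fees").cast("float")) \
    .withColumn("Start_Date", col("Start_Date").cast("date")) \
    .withColumn("Payment_Done", col("Payment_Done").cast("boolean"))

course_df2.printSchema()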

Method 2: Using DataFrame.select()

...
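
A minimal sketch of the select() approach, again assuming course_df from above (course_df3 is an illustrative name): the string columns are selected unchanged and the rest are cast in place. Since cast() keeps the source column's name, no aliases are needed.

Python

# Illustrative sketch: select every column, casting the non-string ones
from pyspark.sql.functions import col

course_df3 = course_df.select(
    col("Name"),
    col("Course_Name"),
    col("Duration_Months").cast("int"),
    col("Course_Fees").cast("float"),
    col("Start_Date").cast("date"),
    col("Payment_Done").cast("boolean"),
)

course_df3.printSchema()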

Method 3: Using spark.sql()

The spark.sql() function runs a SQL query against a registered temporary view and returns the result as a new DataFrame, as shown in the spark.sql() section above....