How to Rename Multiple PySpark DataFrame Columns
In this article, we will discuss how to rename multiple columns in a PySpark DataFrame. For this, we will use the withColumnRenamed() and toDF() functions.
Creating a DataFrame for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data with null values
# we can define null values with None
data = [[None, "sravan", "vignan"],
        ["2", None, "vvit"],
        ["3", "rohith", None],
        ["4", "sridevi", "vignan"],
        ["1", None, None],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['ID', 'NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# show columns
print(dataframe.columns)

# display dataframe
dataframe.show()
Output:
Method 1: Using withColumnRenamed()
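This method renames a single column and returns a new dataframe; the original dataframe is not modified. If the dataframe has no column with the given old name, the call does nothing.
Syntax: dataframe.withColumnRenamed("old_column_name", "new_column_name")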
where
- dataframe is the pyspark dataframe
- old_column_name is the existing column name
- new_column_name is the new column name
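Because withColumnRenamed() does nothing when old_column_name is not present in the dataframe, a misspelled name can go unnoticed. A minimal sketch of a guard follows. Note that rename_existing is a hypothetical helper written for illustration, not part of PySpark, and it only assumes the object has a columns attribute and a withColumnRenamed() method.

```python
def rename_existing(df, old_name, new_name):
    """Rename old_name to new_name, failing loudly if old_name is absent.

    rename_existing is a hypothetical helper, not a PySpark API.
    """
    # Exact string comparison; this may be stricter than Spark's own
    # column-name matching, which can ignore case depending on configuration.
    if old_name not in df.columns:
        raise ValueError(
            f"column {old_name!r} not found; available columns: {df.columns}"
        )
    return df.withColumnRenamed(old_name, new_name)
```

For the demonstration dataframe, rename_existing(dataframe, "college", "university") behaves like the plain call, while rename_existing(dataframe, "colege", "university") raises a ValueError instead of returning the dataframe unchanged.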
To rename multiple columns, chain one withColumnRenamed() call per column using the "." operator. Each call returns a new dataframe, so the next call operates on the result of the previous one.
Syntax: dataframe.withColumnRenamed("old_column_name_1", "new_column_name_1").withColumnRenamed("old_column_name_2", "new_column_name_2")
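When the list of renames is long, writing out the chain by hand gets repetitive. The same chain can be built in a loop from a dictionary of old to new names. This is a minimal sketch: rename_columns and its mapping parameter are hypothetical names chosen for illustration, not part of PySpark.

```python
def rename_columns(df, mapping):
    """Apply withColumnRenamed() once per (old, new) pair in mapping.

    rename_columns is a hypothetical helper, not a PySpark API. It works
    on any object with a withColumnRenamed(old, new) method that returns
    a new object, such as a PySpark DataFrame.
    """
    for old_name, new_name in mapping.items():
        # Each call returns a new dataframe, so rebind df and continue.
        df = df.withColumnRenamed(old_name, new_name)
    return df
```

With the demonstration dataframe, rename_columns(dataframe, {"college": "university", "ID": "student_id"}) is equivalent to chaining the two withColumnRenamed() calls by hand.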
Example 1: Python program to change the column name for two columns
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)

# change the college column name to university
# and ID to student_id
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed("ID", "student_id")

# display modified columns
print("modified columns: ", dataframe.columns)

# final dataframe
dataframe.show()
Output:
Example 2: Rename all columns
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)

# change the college column name to university,
# ID to student_id and NAME to student_name
# (if Example 1 has already run, the first two calls are no-ops)
dataframe = dataframe.withColumnRenamed(
    "college", "university").withColumnRenamed(
    "ID", "student_id").withColumnRenamed("NAME", "student_name")

# display modified columns
print("modified columns: ", dataframe.columns)

# final dataframe
dataframe.show()
Output:
Method 2: Using toDF()
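This method returns a new dataframe in which all columns are renamed at once. The new names are assigned by position, so the number of names must match the number of columns in the dataframe.
Syntax: dataframe.toDF(*("column 1", "column 2", "column n"))
where "column 1" to "column n" are the new column names, listed in the same order as the existing columns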
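Since toDF() takes one new name per existing column, the names can also be computed from dataframe.columns rather than typed out. A minimal sketch, where rename_all and make_name are hypothetical names chosen for illustration and not part of PySpark:

```python
def rename_all(df, make_name):
    """Return df with every column renamed to make_name(old_name).

    rename_all is a hypothetical helper, not a PySpark API. It assumes df
    has a columns attribute and a toDF(*names) method, as a PySpark
    DataFrame does.
    """
    # Build the new names in the same order as the existing columns,
    # because toDF() assigns names by position.
    new_names = [make_name(name) for name in df.columns]
    return df.toDF(*new_names)
```

For example, rename_all(dataframe, str.lower) on a dataframe with columns ID, NAME and college would produce the columns id, name and college.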
Example: Python program to change the column names
Python3
# display actual columns
print("Actual columns: ", dataframe.columns)

# change column names to A, B, C
dataframe = dataframe.toDF(*("A", "B", "C"))

# display new columns
print("New columns: ", dataframe.columns)

# display dataframe
dataframe.show()
Output: