How to use withColumn() method. In Python

Here we are using withColumn() method to select the columns.

Syntax: dataframe.withColumn(“string_column”, when(col(“column”)==’value’, 1)).otherwise(value))

Where

  • dataframe is the pyspark dataframe
  • string_column is the column to be mapped to numeric
  • value is the numeric value

Example: Here we are going to create a college spark dataframe using Row method and  map college name with college number using with column method along with when().

Python3




# import col and when modules
from pyspark.sql.functions import col, when
 
# map college name with college number
# using with column method along with when module
dataframe.withColumn("college_number",
                     when(col("college")=='iit', 1)
                     .when(col("college")=='vignan', 2)
                     .when(col("college")=='rvrjc', 3)
                     .otherwise(4)).show()


Output:



Pyspark Dataframe – Map Strings to Numeric

In this article, we are going to see how to convert map strings to numeric.

Creating dataframe for demonstration:

Here we are creating a row of data for college names and then pass the createdataframe() method and then we are displaying the dataframe.

Python3




# importing module
import pyspark
 
# importing sparksession from pyspark.sql module and Row module
from pyspark.sql import SparkSession,Row
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of college data
dataframe = spark.createDataFrame([Row("vignan"),
                                   Row("rvrjc"),
                                   Row("klu"),
                                   Row("rvrjc"),
                                   Row("klu"),
                                   Row("vignan"),
                                   Row("iit")],
                                  ["college"])
 
# display dataframe
dataframe.show()


Output:

Similar Reads

Method 1: Using map() function

...

Method 2: Using withColumn() method.

Here we created a function to convert string to numeric through a lambda expression...