PySpark DataFrame – Select all except one or a set of columns

In this article, we will select all columns except one column, or a set of columns, from a PySpark DataFrame. For this, we will use the drop() and select() functions.

But first, let's create a DataFrame for demonstration.

Python3

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving it an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of student data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a DataFrame from the list of data
dataframe = spark.createDataFrame(data, columns)

print('Actual data in dataframe')
dataframe.show()

Output:

Method 1: Using drop() function

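drop() returns a new DataFrame without the specified columns. The original DataFrame is not modified.

Syntax: dataframe.drop('column_name_1', 'column_name_2')

Where dataframe is the input DataFrame and the arguments are the names of the columns to be dropped. One or more names can be passed.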
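
When the columns to exclude are already held in a Python list, the list can be unpacked into drop() with the * operator. The following is a minimal sketch: the helper name drop_columns is hypothetical and not part of PySpark, and df is assumed to be a pyspark.sql.DataFrame.

```python
def drop_columns(df, columns_to_drop):
    """Return a new DataFrame without the columns named in columns_to_drop.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame.
    drop() takes the column names as separate arguments, so the list
    is unpacked with *.
    """
    return df.drop(*columns_to_drop)
```

With the DataFrame created above, drop_columns(dataframe, ['student ID', 'college']).show() would display only the student NAME column.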

Example 1: Python program to select data by dropping one column

Python3

# drop student id 
dataframe.drop('student ID').show() 

Output:

Example 2: Python program to drop more than one column (a set of columns)

Python3

# drop student id and college 
dataframe.drop('student ID','college').show() 

Output:

Method 2: Using select() function

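select() returns a new DataFrame containing only the columns you name. To select all columns except one or a set of columns, pass the names of the columns you want to keep.

Syntax: dataframe.select('column_name_1', 'column_name_2')

Where dataframe is the input DataFrame and the arguments are the names of the columns to keep.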
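
Typing out every column to keep becomes tedious on wide DataFrames. One option is to build the list from dataframe.columns and filter out the unwanted names. The following is a minimal sketch: the helper name select_except is hypothetical and not part of PySpark, and df is assumed to be a pyspark.sql.DataFrame.

```python
def select_except(df, *excluded):
    """Select every column of df except the ones named in excluded.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame.
    df.columns is the list of column names in their original order.
    """
    excluded_names = set(excluded)
    # keep the original column order, skipping the excluded names
    keep = [name for name in df.columns if name not in excluded_names]
    return df.select(*keep)
```

With the DataFrame created above, select_except(dataframe, 'college').show() would display the student ID and student NAME columns.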

Example 1: Python program to select only the student ID column, leaving out student NAME and college

Python3

# select student id  
dataframe.select('student ID').show() 

Output:

Example 2: Python program to select the student ID and student NAME columns, that is, all columns except college

Python3

# select student id and student name 
dataframe.select('student ID','student NAME').show() 

Output:
