Ways to convert Pandas Columns to List
Pandas is a Python package that offers data structures and operations for manipulating numerical data and time series. It is used mostly for data manipulation, and one common task is converting a DataFrame column into a list.
We can convert Pandas data frame columns to lists using various methods.
Importance of Converting Columns to List
Converting Pandas columns to lists is particularly useful in various scenarios, adding flexibility and compatibility to data manipulation tasks in Python. Some key reasons why this conversion is important include:
- Iterative Data Processing: Lists in Python are mutable, allowing for easy modification of data. When performing iterative data processing or implementing custom algorithms, working with lists can be more convenient than Pandas Series or DataFrames.
- Compatibility with Legacy Code: In scenarios where legacy code or external scripts are designed to handle standard Python lists, converting Pandas columns ensures compatibility.
- Ease of Data Sharing: Sharing data with collaborators or stakeholders who are not familiar with Pandas is simplified when data is presented in the familiar format of lists.
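As a rough illustration of the mutability point above, the sketch below converts a column to a list and modifies it in place. The two-row DataFrame is a hypothetical stand-in used only for this example. The list returned by tolist() is an independent Python object, so changing it leaves the DataFrame untouched.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
df = pd.DataFrame({"Age": [28, 34]})

ages = df["Age"].tolist()   # an independent Python list
ages.append(40)             # lists can grow in place
ages[0] = 29                # and individual items can be reassigned

print(ages)                 # [29, 34, 40]
print(df["Age"].tolist())   # [28, 34] (the DataFrame is unchanged)
```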
Python Implementation
Before converting a Pandas column to a list, let’s create a DataFrame for better understanding.
Here we are creating a dataset consisting of the Name, Age, Salary, and Gender of 5 members and converting the dataset into a data frame using the pandas DataFrame() function.
Python3
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [28, 34, 22, 45, 31],
    'Salary': [60000, 75000, 50000, 90000, 65000],
    'Gender': ['Female', 'Male', 'Male', 'Male', 'Female'],
}

df = pd.DataFrame(data)
print(df)
Output:
      Name  Age  Salary  Gender
0    Alice   28   60000  Female
1      Bob   34   75000    Male
2  Charlie   22   50000    Male
3    David   45   90000    Male
4      Eva   31   65000  Female
Let’s implement ways for conversion:
Using Series.values.tolist()
We can convert a pandas column to a list with the tolist() method. Here, we convert the Name column into a list using Series.values.tolist(). The Series represents the column of the DataFrame, values returns the underlying NumPy array of the Series, and tolist() converts that NumPy array to a Python list.
Python3
# Using Series.values.tolist()
name_list = df['Name'].values.tolist()
print("Name list:", name_list)
Output:
Name list: ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
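Alternatively, we can access the column as an attribute (dot notation) instead of using square brackets. This works only when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method.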
Python3
# Using Series.values.tolist() with attribute access
name_list1 = df.Name.values.tolist()
print("Name list1:", name_list1)
Output:
Name list1: ['Alice', 'Bob', 'Charlie', 'David', 'Eva']
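As a side note, a Series also exposes tolist() directly, so the intermediate .values step is optional. The sketch below rebuilds a shortened, hypothetical version of the example DataFrame so that it runs on its own, and checks that both spellings give the same list.

```python
import pandas as pd

# Shortened stand-in for the example DataFrame (illustration only)
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"]})

via_values = df["Name"].values.tolist()  # Series -> NumPy array -> list
direct = df["Name"].tolist()             # Series -> list

print(direct)                # ['Alice', 'Bob', 'Charlie']
print(via_values == direct)  # True
```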
Using list() Function
We can use Python's built-in list() function to convert a pandas column to a list. We just need to pass the column of the DataFrame to list(). Here, we convert the Age column into a list by passing it to list().
Python3
# Using list() Function
age_list = list(df['Age'])
print("Age list:", age_list)
Output:
Age list: [28, 34, 22, 45, 31]
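One thing to watch with list() is what gets passed to it. Iterating over a whole DataFrame yields its column labels, not its rows, so list(df) returns the column names. The sketch below uses a shortened, hypothetical version of the example data to show the difference.

```python
import pandas as pd

# Shortened stand-in for the example DataFrame (illustration only)
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [28, 34]})

age_list = list(df["Age"])   # values of a single column
column_names = list(df)      # iterating a DataFrame yields column labels

print(age_list)      # [28, 34]
print(column_names)  # ['Name', 'Age']
```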
Get List by Column Index
We can also select a column by its position. Here, df.columns[3] returns the label of the column at index 3 (the fourth column, Gender, since indexing starts at 0). We then select that column and convert it to a list with tolist().
Python3
# Get List by Column Index
gender_list = df[df.columns[3]].values.tolist()
print("Gender list:", gender_list)
Output:
Gender list: ['Female', 'Male', 'Male', 'Male', 'Female']
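Positional selection can also be written with iloc, which avoids looking up the column label first. This is an alternative to the approach above rather than a replacement for it. The sketch uses a shortened, hypothetical version of the example data in which Gender is the second column, at position 1.

```python
import pandas as pd

# Shortened stand-in for the example DataFrame (illustration only)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Eva"],
    "Gender": ["Female", "Male", "Female"],
})

# All rows (:) of the column at position 1
gender_by_iloc = df.iloc[:, 1].tolist()

# Same result via the label lookup used above
gender_by_label = df[df.columns[1]].tolist()

print(gender_by_iloc)  # ['Female', 'Male', 'Female']
```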
Convert the Index Column to a List
We can convert the index of a DataFrame to a list using DataFrame.index.tolist(). Here, we convert the default integer index to a list.
Python3
# Convert the Index Column to a List
index_list = df.index.tolist()
print("Index list:", index_list)
Output:
Index list: [0, 1, 2, 3, 4]
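The same call works when the index is not the default 0..n-1 range. As a small sketch, assuming we promote the Name column to the index with set_index() (a step the walkthrough above does not perform), the index list then holds the names.

```python
import pandas as pd

# Shortened stand-in for the example DataFrame (illustration only)
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [28, 34]})

# Assumption for illustration: use Name as the index
indexed = df.set_index("Name")
name_index_list = indexed.index.tolist()

print(name_index_list)  # ['Alice', 'Bob']
```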
Convert Columns to NumPy Array
Sometimes we need a NumPy array rather than a list. We can get one with the to_numpy() method. Here, we convert the Salary column to a NumPy array and print it.
Python3
# Convert Columns to NumPy Array
salary_array = df['Salary'].to_numpy()
print("Salary array:", salary_array)
Output:
Salary array: [60000 75000 50000 90000 65000]
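If a plain list is needed later, the array can be converted back with the array's own tolist() method. The sketch below, using a shortened and hypothetical version of the Salary data, shows the round trip alongside a vectorized operation that is a typical reason for keeping the array form.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the example DataFrame (illustration only)
df = pd.DataFrame({"Salary": [60000, 75000, 50000]})

salary_array = df["Salary"].to_numpy()   # pandas column -> NumPy array
raised = salary_array + 1000             # vectorized arithmetic on the array
salary_list = raised.tolist()            # NumPy array -> Python list

print(type(salary_array).__name__)  # ndarray
print(salary_list)                  # [61000, 76000, 51000]
```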