Dataframe Slicing and Observation
A. Observation
We can view the first 5 rows with the head() method.
# Print first 5 rows
print(df.head())
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
1 Apple 20 100
2 Banana 25 50
3 Orange 10 70
We can view the last 5 rows with the tail() method.
# Print last 5 rows
print(df.tail())
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
1 Apple 20 100
2 Banana 25 50
3 Orange 10 70
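Both head() and tail() take an optional row count n, which defaults to 5. The example DataFrame has only four rows, so both calls above return the whole frame. Here is a minimal sketch that asks for two rows instead. It assumes df is rebuilt from the table shown in this article.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

first_two = df.head(2)   # rows 0 and 1
last_two = df.tail(2)    # rows 2 and 3
all_rows = df.head()     # n defaults to 5, but df only has 4 rows
print(first_two)
print(last_two)
```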
The sample() method returns n randomly selected rows, so the rows and their order can change from run to run.
# Randomly select n rows
print(df.sample(3))
Output:
FRUITS QUANTITY PRICE
2 Banana 25 50
0 Mango 40 80
1 Apple 20 100
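To get the same random rows on every run, pass a seed through random_state. You can also use frac to sample a fraction of the rows instead of a fixed count. This is a minimal sketch, assuming df is rebuilt from the example table; the seed value 1 is arbitrary.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Fixing random_state makes the "random" selection reproducible
s1 = df.sample(3, random_state=1)
s2 = df.sample(3, random_state=1)

# frac=0.5 samples half of the rows (2 of 4 here)
half = df.sample(frac=0.5, random_state=1)
print(s1)
print(half)
```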
# Select the 2 rows with the highest QUANTITY
print(df.nlargest(2, 'QUANTITY'))
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
2 Banana 25 50
# Select the 2 rows with the lowest QUANTITY
print(df.nsmallest(2, 'QUANTITY'))
Output:
FRUITS QUANTITY PRICE
3 Orange 10 70
1 Apple 20 100
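nlargest() and nsmallest() return the same rows you would get by sorting on the column and taking the first n rows. The sketch below compares the two approaches on the example data, assuming df is rebuilt from the table in this article. These QUANTITY values have no ties. When there are ties, the keep parameter of nlargest()/nsmallest() decides which rows are returned.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

top2 = df.nlargest(2, 'QUANTITY')
# Same rows via a full descending sort followed by head()
top2_sorted = df.sort_values('QUANTITY', ascending=False).head(2)

bottom2 = df.nsmallest(2, 'QUANTITY')
# Same rows via an ascending sort followed by head()
bottom2_sorted = df.sort_values('QUANTITY').head(2)
```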
# Select the rows where PRICE > 50
print(df[df.PRICE > 50])
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
1 Apple 20 100
3 Orange 10 70
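You can combine several conditions in one boolean mask. Python's and/or/not do not work element-wise on a Series, so pandas uses & (and), | (or) and ~ (not) instead. Each comparison needs its own parentheses because & and | bind more tightly than < and >. This is a minimal sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# OR: cheap fruit or fruit in short supply
cheap_or_scarce = df[(df['PRICE'] < 60) | (df['QUANTITY'] < 15)]
# AND: price above 50 and quantity below 30
pricey_and_scarce = df[(df['PRICE'] > 50) & (df['QUANTITY'] < 30)]
# NOT: every row except Mango
not_mango = df[~(df['FRUITS'] == 'Mango')]
```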
B. Select Column data
# Select the FRUITS name
print(df['FRUITS'])
Output:
0 Mango
1 Apple
2 Banana
3 Orange
Name: FRUITS, dtype: object
# Select the FRUITS name and
# their corresponding PRICE
print(df[['FRUITS', 'PRICE']])
Output:
FRUITS PRICE
0 Mango 80
1 Apple 100
2 Banana 50
3 Orange 70
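The type of the result depends on the brackets. A single label returns a Series, as in the FRUITS example above. A list of labels, even a list with only one label, returns a DataFrame, with the columns in the order you listed them. This is a small sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

one_col_series = df['FRUITS']       # single label -> Series
one_col_frame = df[['FRUITS']]      # list with one label -> DataFrame
two_cols = df[['PRICE', 'FRUITS']]  # columns come back in the order given
```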
# Select the columns whose names match
# the regular expression
print(df.filter(regex='F|Q'))
Output:
FRUITS QUANTITY
0 Mango 40
1 Apple 20
2 Banana 25
3 Orange 10
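The regex 'F|Q' matches column names that contain an F or a Q anywhere, which is why FRUITS and QUANTITY are kept and PRICE is dropped. Two related options of filter() are sketched below, assuming df is rebuilt from the example table. like= keeps names that contain a plain substring. An anchored regex such as '^P' keeps only names that start with P.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Substring match on column names
with_u = df.filter(like='U')
# Regex anchored at the start of the name
starts_p = df.filter(regex='^P')
```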
C. Subsets of rows or columns
# Select all the columns from FRUITS to PRICE (both ends included)
print(df.loc[:, 'FRUITS':'PRICE'])
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
1 Apple 20 100
2 Banana 25 50
3 Orange 10 70
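Label slices with loc include both end points, while position slices with iloc exclude the stop position, just like Python lists. The sketch below shows the difference for both columns and rows, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Columns: the label slice includes QUANTITY, the position slice stops before 1
cols_by_label = df.loc[:, 'FRUITS':'QUANTITY']
cols_by_position = df.iloc[:, 0:1]

# Rows: labels 0, 1, 2 versus positions 0, 1
rows_by_label = df.loc[0:2]
rows_by_position = df.iloc[0:2]
```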
# Select FRUITS and PRICE for the rows with PRICE < 70
print(df.loc[df['PRICE'] < 70,
['FRUITS', 'PRICE']])
Output:
FRUITS PRICE
2 Banana 50
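The row part of loc can be a boolean mask, a single label, or a list of labels. If you pass one column label instead of a list, the result is a Series rather than a DataFrame. This is a minimal sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Boolean mask for rows, one column label -> Series
cheap_names = df.loc[df['PRICE'] < 70, 'FRUITS']
# List of row labels and list of column labels -> DataFrame
picked = df.loc[[0, 3], ['FRUITS', 'QUANTITY']]
```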
# Select the rows at positions 2 up to (but not including) 5
print(df.iloc[2:5])
Output:
FRUITS QUANTITY PRICE
2 Banana 25 50
3 Orange 10 70
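iloc slices follow Python list rules. The stop position is excluded, negative positions count from the end, and a stop position past the end is clipped rather than raising an error. That clipping is why 2:5 above returns only rows 2 and 3 of a four-row frame. This is a sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

last_two = df.iloc[-2:]    # negative start counts from the end
past_end = df.iloc[2:10]   # stop beyond the last row is clipped
second_row = df.iloc[1]    # a single integer returns one row as a Series
```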
# Select the columns at positions 0 and 2
print(df.iloc[:, [0, 2]])
Output:
FRUITS PRICE
0 Mango 80
1 Apple 100
2 Banana 50
3 Orange 70
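iloc also accepts row and column positions together, either as lists or as slices. This is a minimal sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# Rows at positions 0 and 3, columns at positions 0 and 2
subset = df.iloc[[0, 3], [0, 2]]
# Rows 1-2 and every column from position 1 onward
block = df.iloc[1:3, 1:]
```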
For more details, refer to the article Indexing and Selecting data.
DataFrame
| | FRUITS | QUANTITY | PRICE |
|---|---|---|---|
| 0 | Mango | 40 | 80 |
| 1 | Apple | 20 | 100 |
| 2 | Banana | 25 | 50 |
| 3 | Orange | 10 | 70 |
# Select the single PRICE value in the row labelled 1
print(df.at[1, 'PRICE'])
Output:
100
# Select a single value by its position (row 1, column 2)
print(df.iat[1, 2])
Output:
100
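at and iat access exactly one cell, by label and by integer position respectively. They return the same value as loc and iloc with a single row and column, and they can also assign a value. This is a sketch, assuming df is rebuilt from the example table. The new price 95 is an arbitrary illustrative value.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

price_label = df.at[1, 'PRICE']      # by row label and column label
price_pos = df.iat[1, 2]             # by row position and column position
same_via_loc = df.loc[1, 'PRICE']    # equivalent loc lookup
same_via_iloc = df.iloc[1, 2]        # equivalent iloc lookup

# at can also set one value in place (95 is just an example value)
df.at[1, 'PRICE'] = 95
```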
Filter
Filter by column name
print(df.filter(items=['FRUITS', 'PRICE']))
Output:
FRUITS PRICE
0 Mango 80
1 Apple 100
2 Banana 50
3 Orange 70
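filter() selects by labels only and never looks at the values inside the frame. Labels in items that do not exist are dropped without an error, whereas df[[...]] raises a KeyError for a missing column. This is a sketch, assuming df is rebuilt from the example table. COLOR is a hypothetical column name that is deliberately missing.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

# 'COLOR' (hypothetical) is not a column, so filter() simply skips it
kept = df.filter(items=['PRICE', 'COLOR', 'FRUITS'])

# Plain bracket selection with the same missing label raises KeyError
raised = False
try:
    df[['PRICE', 'COLOR']]
except KeyError:
    raised = True

print(kept.columns.tolist())
print(raised)
```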
Filter by row index
# Filter by row index
print(df.filter(items=[3], axis=0))
Output:
FRUITS QUANTITY PRICE
3 Orange 10 70
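With axis=0, filter() works on the row index instead of the column names. When all the labels exist, it returns the same rows as loc with a list of labels. This is a sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

rows_by_filter = df.filter(items=[0, 3], axis=0)
rows_by_loc = df.loc[[0, 3]]
```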
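Where
The where() method keeps the values for which the condition is True and replaces the others with NaN. That is why the integer PRICE column is shown as float64 in the output.
print(df['PRICE'].where(df['PRICE'] > 50))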
Output:
0 80.0
1 100.0
2 NaN
3 70.0
Name: PRICE, dtype: float64
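where() also takes a replacement value through its other argument. Its counterpart mask() does the opposite: it replaces the values where the condition is True. This is a minimal sketch, assuming df is rebuilt from the example table. The replacement value 0 is illustrative.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

prices = df['PRICE']
# Replace values that fail the condition with 0 instead of NaN
floored = prices.where(prices > 50, 0)
# mask() replaces values where the condition holds (here with NaN)
masked = prices.mask(prices > 50)
```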
Query
The query() method filters rows with a boolean expression written as a string and returns the matching rows as a new DataFrame.
# Rows where PRICE > 70
print(df.query('PRICE>70'))
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
1 Apple 20 100
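Inside a query() string, you can reference a local Python variable by prefixing its name with @. This avoids building the expression with string formatting. This is a sketch, assuming df is rebuilt from the example table. min_price is an illustrative variable name.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

min_price = 70  # illustrative local variable
above = df.query('PRICE > @min_price')
print(above)
```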
# Price >50 & QUANTITY <30
print(df.query('PRICE>50 and QUANTITY<30'))
Output:
FRUITS QUANTITY PRICE
1 Apple 20 100
3 Orange 10 70
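A query() string with and returns the same rows as the equivalent & boolean mask. query() also understands in for membership tests against a list. This is a sketch, assuming df is rebuilt from the example table.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

by_query = df.query('PRICE > 50 and QUANTITY < 30')
by_mask = df[(df['PRICE'] > 50) & (df['QUANTITY'] < 30)]

# Membership test inside the query string
picked = df.query("FRUITS in ['Mango', 'Banana']")
```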
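# FRUITS names that start with 'M'
# engine='python' is used because string methods may not be supported by the numexpr engine
print(df.query("FRUITS.str.startswith('M')", engine='python'))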
Output:
FRUITS QUANTITY PRICE
0 Mango 40 80
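You can get the same rows without query() by using a plain boolean mask built from the .str accessor. Lower-casing the column first makes the test case-insensitive. This is a sketch, assuming df is rebuilt from the example table. The query() call uses engine='python' as a cautious choice for string methods.

```python
import pandas as pd

# Assumption: df is rebuilt from the example table in this article
df = pd.DataFrame({
    'FRUITS': ['Mango', 'Apple', 'Banana', 'Orange'],
    'QUANTITY': [40, 20, 25, 10],
    'PRICE': [80, 100, 50, 70],
})

starts_m = df[df['FRUITS'].str.startswith('M')]
# Case-insensitive variant: compare against lower-cased names
starts_m_ci = df[df['FRUITS'].str.lower().str.startswith('m')]
# The query() form from above, for comparison
via_query = df.query("FRUITS.str.startswith('M')", engine='python')
```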
Pandas Cheat Sheet for Data Science in Python
Pandas is a powerful and versatile library that allows you to work with data in Python. It offers a range of features and functions that make data analysis fast, easy, and efficient. Whether you are a data scientist, analyst, or engineer, Pandas can help you handle large datasets, perform complex operations, and visualize your results.
This Pandas Cheat Sheet is designed to help you master the basics of Pandas and boost your data skills. It covers the most common and useful commands and methods that you need to know when working with data in Python. You will learn how to create, manipulate, and explore data frames, how to apply various functions and calculations, how to deal with missing values and duplicates, how to merge and reshape data, and much more.
If you are new to Data Science using Python and Pandas, or if you want to refresh your memory, this cheat sheet is a handy reference that you can use anytime. It will save you time and effort by providing you with clear and concise examples of how to use Pandas effectively.