Missing Data Handling

Find missing values in the dataset

The isnull() method detects missing values and returns a boolean object of the same shape as the DataFrame. Missing values such as None or NaN map to True, and all other values map to False.

Python3

data_frame.isnull()

Output:

     CustomerID  Genre    Age  Annual Income (k$)  Spending Score (1-100)
0         False  False  False               False                   False
1         False  False  False               False                   False
2         False  False  False               False                   False
3         False  False  False               False                   False
4         False  False  False               False                   False
..          ...    ...    ...                 ...                     ...
195       False  False  False               False                   False
196       False  False  False               False                   False
197       False  False  False               False                   False
198       False  False  False               False                   False
199       False  False  False               False                   False
[200 rows x 5 columns]
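
Because this dataset has no missing values, every cell in the output above is False. The following is a minimal sketch on a small, made-up DataFrame (the `sample` frame and its values are hypothetical and not part of the customer dataset) that shows how None and NaN are flagged:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the customer dataset
sample = pd.DataFrame({
    "Genre": ["Male", None, "Female"],
    "Age": [19, 21, np.nan],
})

# True marks a missing cell (None or NaN), False marks a present value
mask = sample.isnull()
print(mask)
```

Note that an empty string ("") is an ordinary value to pandas and is not reported as missing.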

Find the number of missing values in the dataset

To count the missing values in each column, use data_frame.isnull().sum(). In the example below, the dataset doesn't contain any null values, so the count for every column is 0.

Python3

data_frame.isnull().sum()

Output:

CustomerID                0
Genre                     0
Age                       0
Annual Income (k$)        0
Spending Score (1-100)    0
dtype: int64
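
To see non-zero counts, here is a small sketch on made-up data (the `sample` frame is hypothetical). It shows the per-column count and, as one possible extension, a total for the whole frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with three missing cells
sample = pd.DataFrame({
    "Genre": ["Male", None, "Female"],
    "Age": [19, np.nan, np.nan],
})

per_column = sample.isnull().sum()   # missing count for each column
total = int(per_column.sum())        # missing count for the whole frame
print(per_column)
print(total)
```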

Removing missing values

The data_frame.dropna() function removes rows or columns that contain at least one missing value.

data_frame = data_frame.dropna()

By default, data_frame.dropna() drops the rows where at least one element is missing. data_frame.dropna(axis=1) drops the columns where at least one element is missing.
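
The difference between the two modes is easier to see on a small example. This is a sketch on made-up data (the `sample` frame is hypothetical); it also uses the `subset` parameter of dropna, which restricts the check to the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "Genre" and "Income" each have one missing cell
sample = pd.DataFrame({
    "Genre": ["Male", None, "Female"],
    "Age": [19, 21, 20],
    "Income": [15.0, 16.0, np.nan],
})

rows_kept = sample.dropna()                    # keeps only rows with no missing cell
cols_kept = sample.dropna(axis=1)              # keeps only columns with no missing cell
subset_kept = sample.dropna(subset=["Genre"])  # only checks the "Genre" column

print(rows_kept)
print(cols_kept)
print(subset_kept)
```

Dropping columns can discard a lot of data for the sake of a few missing cells, so it is worth checking the missing counts before choosing an axis.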

Fill in missing values

We can fill null values using the data_frame.fillna() function.

data_frame = data_frame.fillna(value)

With this form, every null value in the DataFrame is filled with the same value. To fill different columns with different values, fill each column separately:

data_frame[col] = data_frame[col].fillna(value)
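
As an alternative to filling one column at a time, fillna also accepts a dictionary that maps column names to fill values. The sketch below uses made-up data (the `sample` frame and the choice of "Unknown" and the column mean as fill values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing cell per column
sample = pd.DataFrame({
    "Genre": ["Male", None, "Female"],
    "Age": [19.0, np.nan, 23.0],
})

# Keys are column names, values are the fill value used for that column
filled = sample.fillna({"Genre": "Unknown", "Age": sample["Age"].mean()})
print(filled)
```

Here the missing age is replaced by the mean of the ages that are present, and the missing genre by a placeholder label.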

Row and column manipulations

Removing rows

The drop(index) function drops the row with the given index label. It returns a new DataFrame and leaves the original unchanged; to modify data_frame itself, pass inplace=True to drop.

Python3

# Remove the row with index label 4, then show the first five remaining rows
data_frame.drop(4).head()

Output:

   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
5           6  Female   22                  17                      76

The same function can also remove columns from a DataFrame: pass the parameter axis=1 together with the list of columns to remove.
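
A short sketch of dropping columns, using a made-up frame (the `sample` data is hypothetical). The `columns=` keyword is an equivalent spelling of `axis=1`:

```python
import pandas as pd

# Hypothetical toy data
sample = pd.DataFrame({
    "CustomerID": [1, 2],
    "Genre": ["Male", "Female"],
    "Age": [19, 20],
})

without_cols = sample.drop(["CustomerID", "Age"], axis=1)
same_result = sample.drop(columns=["CustomerID", "Age"])  # equivalent form

print(without_cols)
```

As with rows, the original frame is untouched unless inplace=True is passed.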

Renaming rows

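The rename function can be used to rename the rows or columns of the DataFrame. It takes a dictionary that maps each old label to its new label; passed positionally, as below, the dictionary applies to the row index.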

Python3

data_frame.rename({0:"First",1:"Second"})

Output:

        CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
First            1    Male   19                  15                      39
Second           2    Male   21                  15                      81
2                3  Female   20                  16                       6
3                4  Female   23                  16                      77
4                5  Female   31                  17                      40
...            ...     ...  ...                 ...                     ...
195            196  Female   35                 120                      79
196            197  Female   45                 126                      28
197            198    Male   32                 126                      74
198            199    Male   32                 137                      18
199            200    Male   30                 137                      83
[200 rows x 5 columns]
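
To rename columns instead of rows, pass the mapping through the `columns` keyword. The sketch below uses a made-up frame (the `sample` data and the new label "Gender" are assumptions for illustration) and renames a column and a row in one call:

```python
import pandas as pd

# Hypothetical toy data
sample = pd.DataFrame({"Genre": ["Male", "Female"], "Age": [19, 20]})

# columns= renames column labels, index= renames row labels
renamed = sample.rename(columns={"Genre": "Gender"}, index={0: "First"})
print(renamed)
```

rename returns a new DataFrame, so `sample` keeps its original labels.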

Adding new columns

Python3

# Create a new column with every value equal to 1
data_frame['NewColumn'] = 1
data_frame.head()

Output:

   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)  NewColumn
0           1    Male   19                  15                      39          1
1           2    Male   21                  15                      81          1
2           3  Female   20                  16                       6          1
3           4  Female   23                  16                      77          1
4           5  Female   31                  17                      40          1
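
A new column does not have to be a constant; it can also be computed from existing columns. The following sketch uses a made-up frame that borrows two column names from the dataset (the values, the `HighSpender` column, and the threshold of 40 are hypothetical):

```python
import pandas as pd

# Hypothetical toy data using two column names from the dataset
sample = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 17],
    "Spending Score (1-100)": [39, 6, 40],
})

# Boolean column: True where the spending score is at least 40
sample["HighSpender"] = sample["Spending Score (1-100)"] >= 40

# Arithmetic on an existing column: income in dollars instead of thousands
sample["Annual Income ($)"] = sample["Annual Income (k$)"] * 1000

print(sample)
```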

Data Processing with Pandas

Data processing is an important part of any data-driven task because it helps us extract meaningful insights from the data. Python is a widely used programming language, and it offers various libraries and tools for data processing.

This article covers data processing in Python: loading data, printing rows and columns, summarizing a DataFrame, handling missing values, sorting and merging DataFrames, applying functions, and visualizing DataFrames.

Table of Contents

What is Data Processing in Python?
What is Pandas?
Loading Data in Pandas DataFrame
Printing rows of the Data
Printing the column names of the DataFrame
Summary of Data Frame
Descriptive Statistical Measures of a DataFrame
Missing Data Handling
Sorting DataFrame values
Merge Data Frames
Apply Function
By using the lambda operator
Visualizing DataFrame
Conclusion
