What is Data Cleaning?

In this article, we will learn about data cleaning in Python.

How to use Data Cleaning in Python?

First, we group the features by data type (int, float, object) and count how many features fall into each group.
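A minimal sketch of this counting step. The small `dataset` below is a hypothetical stand-in with made-up values, and the dtypes are set explicitly so the result does not depend on the pandas version:

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_float_dtype, is_object_dtype

# Hypothetical stand-in for the real dataset (values are made up)
dataset = pd.DataFrame({
    "Id": pd.Series([1, 2, 3], dtype="int64"),
    "LotArea": pd.Series([8450, 9600, 11250], dtype="int64"),
    "MSZoning": pd.Series(["RL", "RL", "RM"], dtype="object"),
    "Exterior1st": pd.Series(["VinylSd", "MetalSd", "VinylSd"], dtype="object"),
    "SalePrice": pd.Series([208500.0, 181500.0, None], dtype="float64"),
})

# Collect the column names that belong to each data type
int_cols = [c for c in dataset.columns if is_integer_dtype(dataset[c])]
float_cols = [c for c in dataset.columns if is_float_dtype(dataset[c])]
object_cols = [c for c in dataset.columns if is_object_dtype(dataset[c])]

print("Integer variables:", len(int_cols))
print("Float variables:", len(float_cols))
print("Categorical (object) variables:", len(object_cols))
```

Object columns usually hold text categories, which is why they are counted separately from the numeric ones.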

Data Cleaning


Data cleaning is the process of improving the data by removing incorrect, corrupted, or irrelevant records and values.

Some columns in our dataset are irrelevant to model training, so we can drop them before training. There are two approaches to dealing with empty/null values:

Delete the column or row (if the feature or record is not very important).
Fill the empty slots with the mean, mode, 0, NA, etc. (depending on what the dataset requires).
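Both approaches can be sketched on a small hypothetical DataFrame. The column names and values below are made up for illustration and are not taken from the real dataset:

```python
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "LotArea": [8450.0, None, 11250.0, 9550.0],
    "MSZoning": ["RL", "RL", None, "RM"],
})

# Approach 1: delete every row that contains at least one null value
dropped = df.dropna()

# Approach 2: fill the gaps instead of deleting anything
filled = df.copy()
# Numeric column: use the mean of the observed values
filled["LotArea"] = filled["LotArea"].fillna(filled["LotArea"].mean())
# Categorical column: use the most frequent value (the mode)
filled["MSZoning"] = filled["MSZoning"].fillna(filled["MSZoning"].mode()[0])

print(dropped)
print(filled)
```

Deleting loses two of the four rows here, while filling keeps all four at the cost of introducing estimated values. Which trade-off is acceptable depends on how many values are missing and how important the feature is.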

The Id column will not contribute to any prediction, so we can drop it.

Python3

dataset.drop(['Id'],
             axis=1,
             inplace=True)

Replace the empty SalePrice values with the column mean. This keeps every record and leaves the mean of the column unchanged.

Python3

dataset['SalePrice'] = dataset['SalePrice'].fillna(
  dataset['SalePrice'].mean())
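A small check, on made-up numbers, of what mean imputation does to a column. The `prices` Series is hypothetical and only stands in for `dataset['SalePrice']`:

```python
import pandas as pd

# Hypothetical prices with one missing entry
prices = pd.Series([100.0, 200.0, None, 300.0])

# mean() skips missing values by default, so this is the mean of 100, 200, 300
mean_before = prices.mean()

# Fill the gap with that mean
filled_prices = prices.fillna(mean_before)
mean_after = filled_prices.mean()

print(mean_before, mean_after)
print(filled_prices.tolist())
```

The mean stays the same after filling, but the spread of the column shrinks slightly because the filled value sits exactly at the center.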

Drop the records that still contain null values (there are very few of them).

Python3

new_dataset = dataset.dropna()

Check whether any features in the new dataframe still have null values.

Python3

new_dataset.isnull().sum()

Output: because dropna() removed every row that contained a null value, each column should report 0.
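The same check can be sketched on a hypothetical frame, counting nulls per column before and after dropping rows. The data and the name `sample` are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data: one null in each of two columns
sample = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, None],
    "YearBuilt": [2003.0, None, 2001.0],
    "SalePrice": [208500.0, 181500.0, 223500.0],
})

# Null count per column before cleaning
nulls_before = sample.isnull().sum()

# Drop rows with any null, then count again
cleaned = sample.dropna()
nulls_after = cleaned.isnull().sum()

# Keep only the columns that still have nulls (expected to be empty)
remaining = nulls_after[nulls_after > 0]

print(nulls_before)
print(nulls_after)
print("Columns with nulls left:", list(remaining.index))
```

Filtering the counts with `nulls_after > 0` is convenient on wide datasets, where scanning a long list of zeros by eye is error-prone.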

House Price Prediction using Machine Learning in Python

Most of us have at some point had to look for a new house to buy. The search usually involves watching out for fraud, negotiating deals, researching the local area, and so on.