Data Cleaning

Data cleaning is the process of improving the quality of the data by removing or correcting incorrect, corrupted, or irrelevant data.

Our dataset contains some columns that are irrelevant to model training, so we can drop them before training. There are two approaches to dealing with empty/null values:

  • Delete the column/row (if the feature or record is not very important).
  • Fill the empty slots with the mean, mode, 0, NA, etc. (depending on the dataset's requirements).
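Both approaches can be illustrated on a small hypothetical frame (the column names are chosen for illustration and the values are made up). Deleting uses `drop` and `dropna`; filling uses `fillna` with the mean for a numeric column, the mode for a categorical one, and the literal string "NA" for a mostly-empty column. This is a sketch, not the article's exact cleaning pipeline:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the article's dataset
df = pd.DataFrame({
    "LotArea": [8000, 9000, np.nan, 9500],
    "MSZoning": ["RL", None, "RL", "RM"],
    "Alley": [np.nan, np.nan, np.nan, "Grvl"],
})

# Approach 1: delete
no_alley = df.drop(columns=["Alley"])   # drop a mostly-empty column
complete_rows = no_alley.dropna()       # then drop rows that still contain a null

# Approach 2: fill, choosing the fill value per column type
filled = df.copy()
filled["LotArea"] = filled["LotArea"].fillna(filled["LotArea"].mean())      # numeric -> mean
filled["MSZoning"] = filled["MSZoning"].fillna(filled["MSZoning"].mode()[0])  # categorical -> mode
filled["Alley"] = filled["Alley"].fillna("NA")                                # absent feature -> "NA"

print(complete_rows)
print(filled)
```

Deleting keeps only fully observed records, while filling keeps every record at the cost of inventing values; which is preferable depends on how many values are missing and how important the feature is.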

The Id column will not take part in any prediction, so we can drop it.

dataset.drop(['Id'],
             axis=1,
             inplace=True)
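The same drop can also be written with the `columns=` keyword, which is equivalent to `axis=1`, and by reassigning the result instead of using `inplace=True` (with `inplace=True`, `drop` modifies the frame and returns `None`). A minimal sketch on a hypothetical toy frame with made-up values:

```python
import pandas as pd

# Hypothetical frame with an identifier column
dataset = pd.DataFrame({"Id": [1, 2, 3], "LotArea": [8000, 9000, 11000]})

# columns=["Id"] is equivalent to drop(["Id"], axis=1); the result is reassigned
dataset = dataset.drop(columns=["Id"])
print(dataset.columns.tolist())  # ['LotArea']
```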


Replace the empty SalePrice values with the column mean. This keeps those rows in the dataset without changing the column's mean, although it does not by itself make the distribution symmetric.

dataset['SalePrice'] = dataset['SalePrice'].fillna(
  dataset['SalePrice'].mean())
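To see what mean imputation does and does not do, here is a small check on hypothetical prices: the mean is the same after filling, while the spread (standard deviation) shrinks, because every filled value sits exactly at the mean.

```python
import numpy as np
import pandas as pd

# Hypothetical prices with two missing values
prices = pd.Series([100.0, 200.0, np.nan, 600.0, np.nan])

# Same operation as above: fill gaps with the mean of the observed values
filled = prices.fillna(prices.mean())

print(prices.mean(), filled.mean())  # 300.0 300.0
print(prices.std(), filled.std())    # the filled series has a smaller std
```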


Drop the remaining records that contain null values (there are very few of them).

new_dataset = dataset.dropna()
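Before relying on `dropna()`, it can help to confirm that only a few records are actually discarded. A sketch on a hypothetical frame with made-up values; the `subset=` parameter of `dropna` restricts the null check to the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing categorical value
dataset = pd.DataFrame({
    "LotArea": [8000, 9000, 11000, 9500],
    "MSZoning": ["RL", np.nan, "RL", "RM"],
    "SalePrice": [200000.0, 180000.0, 220000.0, 140000.0],
})

new_dataset = dataset.dropna()

# How many records were discarded?
dropped = len(dataset) - len(new_dataset)
print(f"dropped {dropped} of {len(dataset)} rows")  # dropped 1 of 4 rows

# Only drop rows whose SalePrice is missing (none here, so all rows are kept)
only_priced = dataset.dropna(subset=["SalePrice"])
```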


Check which features in the new DataFrame still have null values, if any.

new_dataset.isnull().sum()
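With many columns, the full `isnull().sum()` listing can be long. One option is to keep only the non-zero counts. It is shown here on a hypothetical frame that still has a gap; after `dropna()`, the filtered result for the real data should be empty:

```python
import numpy as np
import pandas as pd

# Hypothetical frame that still contains one null
frame = pd.DataFrame({
    "LotArea": [8000, 9000, np.nan],
    "MSZoning": ["RL", "RM", "RL"],
})

null_counts = frame.isnull().sum()
# Keep only the features that still have at least one null
remaining = null_counts[null_counts > 0]
print(remaining)

# A single yes/no answer for the whole frame
has_nulls = frame.isnull().values.any()
```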


