Data Cleaning
Data cleaning is the process of improving the data by removing incorrect, corrupted, or irrelevant entries.
Our dataset contains some columns that are irrelevant to model training, so we can drop them before training. There are two approaches to dealing with empty/null values:
- Delete the column/row (if the feature or record is not very important).
- Fill the empty slots with the mean, mode, 0, NA, etc. (depending on the dataset's requirements).
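The two approaches can be compared side by side on a small example. The sketch below is only an illustration: the `toy` dataframe and its values are made up for this purpose and are not the article's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the values are illustrative only.
toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 11250.0],
    "SalePrice": [208500.0, np.nan, 223500.0, 140000.0],
})

# Approach 1: delete every row that contains at least one null value.
dropped = toy.dropna()

# Approach 2: fill nulls in each numeric column with that column's mean.
filled = toy.fillna(toy.mean(numeric_only=True))

print(len(dropped))                 # number of rows left after dropping
print(filled.isnull().sum().sum())  # number of nulls left after filling
```

Dropping loses whole records, while filling keeps every record at the cost of inserting estimated values, so the choice depends on how many values are missing and how important the affected feature is.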
The Id column will not contribute to any prediction, so we can drop it.
Python3
dataset.drop(['Id'], axis=1, inplace=True)
Replacing empty SalePrice values with the column mean, which keeps those records and leaves the column's mean unchanged.
Python3
dataset['SalePrice'] = dataset['SalePrice'].fillna(dataset['SalePrice'].mean())
Drop records with null values (as there are very few such records).
Python3
new_dataset = dataset.dropna()
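Before relying on this step, it can help to confirm that only a small share of records is actually removed. The following is a minimal sketch of such a check; the `toy` dataframe is a hypothetical stand-in for `dataset`, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the article's `dataset`.
toy = pd.DataFrame({
    "MSZoning": ["RL", "RM", None, "RL", "RL"],
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "SalePrice": [208500.0, 181500.0, 223500.0, 140000.0, 250000.0],
})

# Drop rows with any null value, then measure how much data was lost.
cleaned = toy.dropna()
rows_removed = len(toy) - len(cleaned)
fraction_removed = rows_removed / len(toy)

print(f"Removed {rows_removed} of {len(toy)} rows ({fraction_removed:.0%})")
```

If the removed fraction turns out to be large, filling the missing values may be a better option than dropping the rows.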
Checking whether any features still have null values in the new dataframe.
Python3
new_dataset.isnull().sum()
Output:
House Price Prediction using Machine Learning in Python
We have all experienced a time when we had to look for a new house to buy. The journey then involves avoiding fraud, negotiating deals, researching local areas, and so on.