Model Development

Now that the data has been preprocessed and split into training and validation sets, we can import CatBoostRegressor from the catboost module and train it on our dataset.

Python3

# CatBoost Regression Model 
from catboost import CatBoostRegressor 
  
# Initialize the CatBoostRegressor with RMSE as the loss function 
model = CatBoostRegressor(loss_function='RMSE') 
  
# Fit the model on the training data with verbose logging every 100 iterations 
model.fit(X_train, Y_train, verbose=100) 

Output:

Learning rate set to 0.051037
0:    learn: 0.8976462    total: 47.1ms    remaining: 47s
100:    learn: 0.3741647    total: 123ms    remaining: 1.09s
200:    learn: 0.3571139    total: 199ms    remaining: 792ms
300:    learn: 0.3455686    total: 275ms    remaining: 638ms
400:    learn: 0.3369937    total: 352ms    remaining: 526ms
500:    learn: 0.3305270    total: 430ms    remaining: 428ms
600:    learn: 0.3252100    total: 513ms    remaining: 340ms
700:    learn: 0.3200064    total: 623ms    remaining: 266ms
800:    learn: 0.3153692    total: 698ms    remaining: 173ms
900:    learn: 0.3116973    total: 773ms    remaining: 84.9ms
999:    learn: 0.3082544    total: 847ms    remaining: 0us
<catboost.core.CatBoostRegressor at 0x7fad65983730>

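As we can see, training ran for 1000 iterations (boosting rounds, which is CatBoost's default when no iteration count is given), and the training RMSE fell from about 0.90 to about 0.31. We can now use the training and validation data to analyze the performance of the model.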

Let’s understand this code in detail:

‘CatBoostRegressor’ is a Python class provided by the catboost library for creating regression models. It is designed for regression tasks, where the goal is to predict a continuous numeric target variable from input features.

Here, ‘CatBoostRegressor(loss_function='RMSE')’ initializes a CatBoost regression model with Root Mean Squared Error (RMSE) as the loss function. The model aims to minimize this error during training.
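To make the loss function concrete, here is a minimal sketch of how RMSE is computed: the square root of the mean of the squared differences between actual and predicted values. The helper name rmse and the three sample values are hypothetical and used only for illustration; they are not part of the article's dataset or of the catboost API.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical values: the errors are 1, 0 and -2, so the squared errors are 1, 0 and 4
actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

# Mean squared error is 5/3, so RMSE is sqrt(5/3)
print("RMSE:", rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large mistakes raise RMSE more than many small ones.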

The ‘model.fit()’ method trains a model on the given dataset. Here it is called on the CatBoostRegressor model with the following arguments:

X_train: Feature matrix containing independent variables used for training.
Y_train: The target variable, which is the actual values the model aims to predict.
verbose=100: Controls how much output is displayed during training. Here, ‘verbose=100’ prints progress information every 100 iterations.

Together, these steps build and train a CatBoost regression model with RMSE as the loss function. The model is trained on the supplied training data (X_train and Y_train), with verbose logging turned on to track progress. Once training is finished, the model can be used to make predictions on new data.

Prediction

Python3

# Import NumPy and scikit-learn's mean squared error (MSE) function, aliased as 'mse'
import numpy as np
from sklearn.metrics import mean_squared_error as mse

# Generate predictions on the training and validation sets using the trained model
train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

# RMSE is the square root of MSE; compute and print it for both sets
print("Training RMSE: ", np.sqrt(mse(Y_train, train_pred)))
print("Validation RMSE: ", np.sqrt(mse(Y_val, val_pred)))

Output:

Training RMSE:  0.308254377636178
Validation RMSE:  0.39986332453193907

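The training RMSE (about 0.308) matches the final value in the training log, while the validation RMSE (about 0.400) is somewhat higher, which suggests the model fits the training data a little better than unseen data.

Above, we converted the categorical features to numerical or one-hot encoded form before passing the data to the model. With CatBoost, however, this step is optional: the model can handle categorical features itself, so we can choose not to encode them explicitly.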

Regression using CatBoost

In this article, we will learn about one of the state-of-the-art machine learning models: CatBoost. Here, "Cat" stands for categorical, which reflects that the algorithm is highly efficient when the data contains many categorical columns.

Table of Contents

What is CatBoost?
How CatBoost Works
Implementation of Regression Using CatBoost
Exploratory Data Analysis
Data Preprocessing
Model Development
