Model Development

Now that the data has been preprocessed and split into training and validation sets, we can import CatBoostRegressor from the catboost module and train it on our dataset.

Python3




# CatBoost Regression Model
from catboost import CatBoostRegressor
  
# Initialize the CatBoostRegressor with RMSE as the loss function
model = CatBoostRegressor(loss_function='RMSE')
  
# Fit the model on the training data with verbose logging every 100 iterations
model.fit(X_train, Y_train, verbose=100)


Output:

Learning rate set to 0.051037
0: learn: 0.8976462 total: 47.1ms remaining: 47s
100: learn: 0.3741647 total: 123ms remaining: 1.09s
200: learn: 0.3571139 total: 199ms remaining: 792ms
300: learn: 0.3455686 total: 275ms remaining: 638ms
400: learn: 0.3369937 total: 352ms remaining: 526ms
500: learn: 0.3305270 total: 430ms remaining: 428ms
600: learn: 0.3252100 total: 513ms remaining: 340ms
700: learn: 0.3200064 total: 623ms remaining: 266ms
800: learn: 0.3153692 total: 698ms remaining: 173ms
900: learn: 0.3116973 total: 773ms remaining: 84.9ms
999: learn: 0.3082544 total: 847ms remaining: 0us
<catboost.core.CatBoostRegressor at 0x7fad65983730>

The log shows that training ran for 1,000 boosting iterations (numbered 0 to 999), which is CatBoost's default, and that the learning rate was set automatically. We can now use the training and validation data to analyze the performance of the model.

Let’s understand this code in detail:

‘CatBoostRegressor’ is a Python class provided by the catboost library for creating regression models. It is designed for regression tasks, where the goal is to predict a continuous numeric target variable from input features.

In the code, ‘CatBoostRegressor(loss_function='RMSE')’ initializes a CatBoost regression model with Root Mean Squared Error (RMSE) as the loss function. The model aims to minimize this error during training.

The ‘model.fit()’ method trains the model on the given dataset. Here it is called on the CatBoostRegressor model with the following arguments:

X_train: The feature matrix containing the independent variables used for training.
Y_train: The target variable, that is, the actual values the model learns to predict.
verbose=100: The verbose parameter controls how much output is displayed during training. Here, ‘verbose=100’ prints progress information every 100 iterations.

Together, these tools make it possible to build and train a CatBoost regression model with RMSE as the loss function. The model is trained using the supplied training data (X_train and Y_train), with verbose logging turned on to track the training status. The model can be used to make predictions on new data after training is finished.

Prediction

Python3




# Import NumPy and the mean squared error (MSE) function from sklearn, aliased as 'mse'
import numpy as np
from sklearn.metrics import mean_squared_error as mse

# Generate predictions on the training and validation sets using the trained 'model'
train_pred = model.predict(X_train)
val_pred = model.predict(X_val)

# Calculate and print the Root Mean Squared Error (RMSE) for training and validation sets
print("Training RMSE: ", np.sqrt(mse(Y_train, train_pred)))
print("Validation RMSE: ", np.sqrt(mse(Y_val, val_pred)))


Output:

Training RMSE:  0.308254377636178
Validation RMSE: 0.39986332453193907
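RMSE is the square root of the mean of the squared differences between the actual and predicted values. As a small worked example, the sketch below computes it by hand with NumPy; the `rmse` helper and the four toy values are hypothetical and are not taken from the article's dataset.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root of the mean squared difference between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values: the errors are 1, 0, -2, 0, so the squared errors are 1, 0, 4, 0
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]

# Mean squared error is 5 / 4 = 1.25, and RMSE is its square root
value = rmse(y_true, y_pred)
print(value)
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.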

Earlier, before passing the data to the model, we converted the categorical features to numerical or one-hot encoded form. With CatBoost, however, this explicit encoding step is optional, because the model can handle categorical columns directly.
