Regression Model using LightGBM

The one remaining step before training is to define the parameters that configure the model and its training process.

# Define a dictionary of parameters for configuring the LightGBM regression model.
params = {
    'objective': 'regression',
    'metric': 'rmse',
    'boosting_type': 'gbdt',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
}


Let’s take a quick look at the parameters passed to the model.

  • objective – The type of task the model is trained for. Here 'regression' is used for a regression task; other options include 'binary' for binary classification and 'multiclass' for multi-class classification.
  • metric – The evaluation metric computed on the validation data (if provided) as training progresses, and the score that early stopping monitors. The model itself optimizes the objective; the metric is used only for evaluation. Here 'rmse' (root mean squared error) is used.
  • boosting_type – The boosting algorithm used to build the ensemble: 'gbdt' (Gradient Boosting Decision Tree, the default), 'rf' (random forest), or 'dart' (Dropouts meet Multiple Additive Regression Trees).
  • num_leaves – The maximum number of leaves in one tree (default 31). Because LightGBM grows trees leaf-wise, this is the main parameter for controlling model complexity.
  • learning_rate – The shrinkage rate applied to each tree’s contribution. Smaller values such as 0.05 make each boosting step smaller, so more rounds are usually needed; the default is 0.1.
  • feature_fraction – The fraction of features randomly selected before training each tree. With 0.9, LightGBM picks a random 90% of the features for every tree, which can speed up training and helps reduce overfitting.
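To make the roles of objective, learning_rate and feature_fraction more concrete, here is a toy gradient boosting loop for the squared-error ('regression') objective in plain Python. It is an illustration under simplifying assumptions, not LightGBM’s implementation: each tree is a single-split stump (two leaves) rather than a tree with up to num_leaves leaves, and the helper names fit_stump, toy_boost and toy_predict are hypothetical. Each round fits a stump to the current residuals (the negative gradient of squared error), considers only a random subset of features as feature_fraction does, and adds just learning_rate times the stump’s output to the predictions.

```python
import random


def fit_stump(X, residuals, feature_ids):
    """Least-squares regression stump on the given features, or None if no split exists."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lv, rv)
    return best


def toy_boost(X, y, num_round=100, learning_rate=0.05, feature_fraction=0.9, seed=0):
    """Toy gradient boosting for squared error (illustration only, not LightGBM's code)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    k = max(1, round(feature_fraction * n_features))  # features sampled per tree
    base = sum(y) / len(y)  # start from the mean of the target
    preds = [base] * len(y)
    trees = []
    for _ in range(num_round):
        # For squared error, the negative gradient is (up to a constant) y - prediction.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        feature_ids = rng.sample(range(n_features), k)  # mimics feature_fraction < 1.0
        stump = fit_stump(X, residuals, feature_ids)
        if stump is None:
            continue  # no valid split on the sampled features: skip this round
        _, f, t, lv, rv = stump
        trees.append((f, t, lv, rv))
        # Shrinkage: each tree adds only learning_rate times its fitted values.
        preds = [p + learning_rate * (lv if row[f] <= t else rv) for p, row in zip(preds, X)]
    return {"base": base, "trees": trees, "learning_rate": learning_rate}


def toy_predict(model, row):
    """Sum the base value and every stump's shrunken contribution for one row."""
    out = model["base"]
    for f, t, lv, rv in model["trees"]:
        out += model["learning_rate"] * (lv if row[f] <= t else rv)
    return out


# Hypothetical toy data: two features, target = 2 * x0 + x1.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [2.0 * i + (i % 3) for i in range(20)]
model = toy_boost(X, y, num_round=100, learning_rate=0.05, feature_fraction=0.9)
print(toy_predict(model, [10.0, 1.0]))  # compare with the true target 2 * 10 + 1 = 21
```

With a smaller learning_rate each stump moves the predictions less, which is why a low learning rate is usually paired with more boosting rounds.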

Let’s train the model for up to 100 boosting rounds on the training data, passing the validation data as well so that we can monitor the model’s performance on unseen data as training proceeds. This helps us keep a check on the training progress.

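# Train for up to 100 boosting rounds with early stopping on the validation set.
# LightGBM >= 4.0 configures early stopping and per-round logging through callbacks.
num_round = 100
bst = lgb.train(params, train_data, num_round, valid_sets=[test_data],
                callbacks=[lgb.early_stopping(stopping_rounds=10),
                           lgb.log_evaluation(period=1)])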


Output:

[70] valid_0's rmse: 0.387332
[71] valid_0's rmse: 0.387193
[72] valid_0's rmse: 0.387407
[73] valid_0's rmse: 0.387696
[74] valid_0's rmse: 0.388172
[75] valid_0's rmse: 0.388142
[76] valid_0's rmse: 0.388688
Early stopping, best iteration is:
[66] valid_0's rmse: 0.386691

This code trains a LightGBM model with the supplied parameters (params) and training data (train_data) for at most num_round boosting rounds. With early stopping, LightGBM evaluates the RMSE on the validation set passed in valid_sets (test_data) after every round and stops training if the score has not improved for 10 consecutive rounds. Training therefore ends once additional trees no longer help on unseen data, which helps prevent overfitting. In the output above, the best validation RMSE is 0.386691, reached at iteration 66; training halted at iteration 76 after 10 rounds without improvement. Because RMSE is expressed in the units of the target variable, whether this is a good score depends on the scale of the target.
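The stopping rule can be sketched in a few lines of plain Python. This is a simplified illustration, not LightGBM’s own early_stopping callback; the function name find_stopping_point and the score sequence in the example are hypothetical. It assumes a lower-is-better metric such as RMSE and counts iterations from 1, as the training log does.

```python
def find_stopping_point(val_scores, stopping_rounds=10):
    """Return (best_iteration, stop_iteration), both 1-based, for a lower-is-better metric.

    Training stops as soon as `stopping_rounds` rounds have passed since the best score.
    """
    best_score = float("inf")
    best_iter = 0
    for i, score in enumerate(val_scores, start=1):
        if score < best_score:  # a strict improvement resets the patience counter
            best_score, best_iter = score, i
        elif i - best_iter >= stopping_rounds:
            return best_iter, i
    # Ran out of rounds (num_round reached) before patience was exhausted.
    return best_iter, len(val_scores)


# Hypothetical validation RMSE per round: improves for 5 rounds, then gets worse.
scores = [1.0, 0.8, 0.6, 0.5, 0.45] + [0.46 + 0.01 * k for k in range(20)]
print(find_stopping_point(scores))  # (5, 15)
```

Applied to the log above, the same rule explains why training stopped at iteration 76: the best score came at iteration 66, followed by 10 rounds without improvement. After early stopping, LightGBM stores that round in bst.best_iteration, and bst.predict uses the best iteration by default.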

Prediction and Evaluation of the Model

# Import the mean squared error metric and LightGBM's scikit-learn-style regressor.
from sklearn.metrics import mean_squared_error as mse
from lightgbm import LGBMRegressor

# Create a LightGBM regressor that reports RMSE.
# Note: this is a new model with LightGBM's default hyperparameters,
# separate from the booster `bst` trained above with `params`.
model = LGBMRegressor(metric='rmse')

# Train the model on the training data.
model.fit(X_train, Y_train)

# Make predictions on the training and validation data.
y_train = model.predict(X_train)
y_val = model.predict(X_val)


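This block uses LightGBM’s scikit-learn interface (LGBMRegressor) instead of the native lgb.train API. Note that it fits a new model with LightGBM’s default hyperparameters rather than the params dictionary defined earlier, and then makes predictions on both the training and validation datasets.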

Validation of the Model

import numpy as np

# Calculate and print the Root Mean Squared Error (RMSE) for training and validation predictions.
print("Training RMSE: ", np.sqrt(mse(Y_train, y_train)))
print("Validation RMSE: ", np.sqrt(mse(Y_val, y_val)))


Output:

Training RMSE:  0.2331835443343122
Validation RMSE: 0.40587871797893876

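This code computes and displays the RMSE for both the training and validation datasets, which shows how well the LightGBM regression model fits the data; lower RMSE values indicate predictions closer to the true targets. Here the validation RMSE (about 0.406) is considerably higher than the training RMSE (about 0.233), which suggests that the model fits the training data more closely than unseen data, i.e. some degree of overfitting.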
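For reference, RMSE is the square root of the mean squared difference between true and predicted values, so it is expressed in the same units as the target and the training and validation figures above can be compared directly. A plain-Python version (the name rmse is our own; it computes the same quantity as np.sqrt(mse(...)) used above) looks like this:

```python
def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


print(rmse([3.0, 5.0, 2.0], [2.5, 5.0, 3.0]))  # about 0.645
```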

Regression using LightGBM

In this article, we will learn about one of the state-of-the-art machine learning models: LightGBM, or Light Gradient Boosting Machine. Gradient boosting has been refined steadily for better performance, most notably in XGBoost (eXtreme Gradient Boosting), but with LightGBM we can achieve similar or better results with less computation and train our model on even bigger datasets in less time. Let’s see what LightGBM is and how we can perform regression with it.

Table of Contents

  • What is LightGBM?
  • How LightGBM Works?
  • Implementation of LightGBM
  • Exploratory Data Analysis
  • Data Preprocessing
  • Regression Model using LightGBM
  • Conclusion

What is LightGBM?

...

How LightGBM Works?

LightGBM, or ‘Light Gradient Boosting Machine’, is an open-source, high-performance gradient boosting framework designed for efficient and scalable machine learning tasks. It is built for speed and accuracy, making it a popular choice for both structured and unstructured data in diverse domains....

Implementation of LightGBM

LightGBM grows decision trees leaf-wise: at each step it splits only the leaf whose best split yields the largest gain, rather than splitting every leaf at the current depth. Leaf-wise trees can overfit, especially on smaller datasets; limiting the tree depth (for example with the max_depth parameter) helps prevent this. LightGBM also uses histograms to bucket continuous feature values into discrete bins. Instead of evaluating every data point, it iterates over the bins to calculate the gain and split the data, and this optimization also benefits sparse datasets. Another component of LightGBM is Exclusive Feature Bundling, in which the algorithm combines mutually exclusive features (features that rarely take nonzero values at the same time) to reduce dimensionality and speed up processing....
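The histogram-based split search and leaf-wise growth described above can be illustrated with a small plain-Python sketch. It is a simplified model, not LightGBM’s code: bin_feature uses simple quantile-style cut points (LightGBM’s own binning is controlled by its max_bin parameter, default 255, and is more elaborate), best_split scores candidates with the standard second-order gain GL^2/(HL + lambda) + GR^2/(HR + lambda) - G^2/(H + lambda), and all function names are hypothetical. The point it shows is that once the gradient/hessian histogram is built in one pass over the data, finding the best threshold only requires a scan over the bins, not over every data point.

```python
from bisect import bisect_left


def bin_feature(values, max_bin=4):
    """Bucket raw values into at most `max_bin` bins using quantile-style cut points."""
    s = sorted(values)
    n = len(s)
    cuts = sorted({s[max(0, (i * n) // max_bin - 1)] for i in range(1, max_bin)})
    # Value v goes to bin j when cuts[j-1] < v <= cuts[j]; values above the last cut
    # land in the final bin, so there are len(cuts) + 1 bins in total.
    return [bisect_left(cuts, v) for v in values], cuts


def best_split(bin_ids, grads, hessians, n_bins, reg_lambda=1.0):
    """Return (gain, b) for the best split 'bin <= b goes left', found by scanning the histogram."""
    grad_hist = [0.0] * n_bins
    hess_hist = [0.0] * n_bins
    for b, g, h in zip(bin_ids, grads, hessians):  # one pass over the data builds the histogram
        grad_hist[b] += g
        hess_hist[b] += h
    g_total, h_total = sum(grad_hist), sum(hess_hist)
    parent_score = g_total ** 2 / (h_total + reg_lambda)
    best_gain, best_bin = 0.0, None
    g_left = h_left = 0.0
    for b in range(n_bins - 1):  # candidate thresholds are bin boundaries, not data points
        g_left += grad_hist[b]
        h_left += hess_hist[b]
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = (g_left ** 2 / (h_left + reg_lambda)
                + g_right ** 2 / (h_right + reg_lambda) - parent_score)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin


def pick_leaf_to_split(leaf_gains):
    """Leaf-wise growth: choose the one leaf whose best split has the largest gain."""
    return max(range(len(leaf_gains)), key=lambda i: leaf_gains[i])


# Hypothetical example: one feature, squared-error loss, all predictions currently 0.
values = [1, 2, 3, 4, 5, 6, 7, 8]
targets = [0, 0, 0, 0, 10, 10, 10, 10]
grads = [0.0 - t for t in targets]  # gradient of squared error: prediction - target
hessians = [1.0] * len(targets)     # hessian of squared error is constant
bin_ids, cuts = bin_feature(values, max_bin=4)
gain, split_bin = best_split(bin_ids, grads, hessians, n_bins=len(cuts) + 1)
print(bin_ids)    # [0, 0, 1, 1, 2, 2, 3, 3]
print(split_bin)  # 1, i.e. values <= cuts[1] (= 4) go to the left child
print(pick_leaf_to_split([0.5, 3.0, 1.2]))  # 1
```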

Exploratory Data Analysis

In this article, we will use this dataset to perform a regression task using the LightGBM algorithm. But to use LightGBM, we first have to install the LightGBM library using the command below:...

Data Preprocessing

...

Regression Model using LightGBM

...

Conclusion

...