Visualizing LightGBM Feature Importance

First, make sure you have LightGBM installed:

! pip install lightgbm

Let’s break down the provided code step by step:

Step 1: Import Libraries

In this step, we import the necessary libraries that the code will use:

lightgbm for building the gradient boosting model
matplotlib.pyplot for creating plots
sklearn.datasets to load the breast cancer dataset for binary classification
train_test_split, numpy and pandas for data preprocessing

Python3

# Importing the necessary libraries
import pandas as pd
import numpy as np
import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

Step 2: Create a LightGBM Dataset

Here, the breast cancer dataset is loaded into a pandas DataFrame and split into training (70%) and test (30%) sets. The training portion is then wrapped in a LightGBM Dataset named train_data, which is the data format that lgb.train expects. It is constructed from the following inputs:

X_train: the training feature data (the 30 independent variables of the dataset).
y_train: the corresponding target labels (the dependent variable: 0 = malignant, 1 = benign).

Python3

# Loading the Breast Cancer Dataset 
cancer = load_breast_cancer()
 
# Creating dataframe 
df = pd.DataFrame(np.c_[cancer['data'], cancer['target']], columns = np.append(cancer['feature_names'], ['target']))
## Features 
X = df.drop(['target'], axis =1)
## Target 
y = df['target']
 
# Splitting the dataset into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)

# Wrapping the training data in a LightGBM Dataset
train_data = lgb.Dataset(X_train, label = y_train)

Step 3: Define Model Parameters

In this step, a dictionary named `params` is defined. This dictionary holds various configuration parameters that will be used to set up the LightGBM model. Here’s what each parameter means:

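objective specifies the learning task and the loss the model minimizes; binary trains a binary classifier using log loss.
metric specifies the evaluation metric used to monitor the model (for example, on validation sets passed to lgb.train); binary_logloss is binary log loss. It does not change what the trees optimize.
boosting_type indicates the boosting algorithm to use. gbdt stands for Gradient Boosting Decision Trees, the default boosting method in LightGBM.
learning_rate is the shrinkage factor applied to each new tree's contribution; smaller values usually need more boosting rounds.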

These parameters define how the model will be trained and evaluated.

Python3

# Define parameters for the model
params = {
    "objective": "binary",
    "metric": "binary_logloss",
    "boosting_type": "gbdt",
    "learning_rate" : 0.1
}

Step 4: Train the LightGBM Model

In this step, the LightGBM model is trained using the lgb.train function. Here’s what’s happening:

params, the model configuration defined earlier, is passed as the first argument.
train_data, the LightGBM training dataset, is passed as the second argument.
num_boost_round=5 specifies the number of boosting rounds or iterations during training. The model is trained for 5 rounds, and each round involves adding a decision tree to the ensemble.

After this step, the model variable contains the trained LightGBM model.

Python3

# Train the LightGBM model
model = lgb.train(params, train_data, num_boost_round=5)

Output:

[LightGBM] [Info] Number of positive: 249, number of negative: 149
[LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000248 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 3978
[LightGBM] [Info] Number of data points in the train set: 398, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.625628 -> initscore=0.513507
[LightGBM] [Info] Start training from score 0.513507
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

Step 5: Plot Feature Importance

Finally, the code visualizes the feature importance using the lgb.plot_importance function and Matplotlib. Here’s what each part of this step does:

lgb.plot_importance(model, importance_type="gain", figsize=(7, 6), title="LightGBM Feature Importance (Gain)") generates a feature importance plot from the trained LightGBM model. With importance_type="gain", each feature's importance is the total gain (reduction in the training loss) of all splits that use that feature. The call also sets the figure size and the plot title.
lgb.plot_importance(model, importance_type="split", figsize=(7, 6), title="LightGBM Feature Importance (Split)") creates a feature importance plot based on the “split” metric, which counts how many times a feature is used to split the data across all trees. A feature can be split on often yet contribute little gain, so the two plots can rank features differently.

Plot feature importance using Gain

Python3

# Plot feature importance using Gain
lgb.plot_importance(model, importance_type="gain", figsize=(7,6), title="LightGBM Feature Importance (Gain)")
plt.show()

Output:

Gain Feature Importance Graph

Plot feature importance using Split

Python3

# Plot feature importance using Split
lgb.plot_importance(model, importance_type="split", figsize=(7,6), title="LightGBM Feature Importance (Split)")
plt.show()

Output:

Split Feature Importance Graph

The resulting plots show which features were most influential in the LightGBM model’s predictions, which helps with feature selection and model interpretation.

The code demonstrates the complete process of importing libraries, preparing a LightGBM dataset, defining model parameters, training a LightGBM binary classification model, and visualizing feature importance using both the “gain” and “split” methods.

LightGBM Feature Importance and Visualization

When it comes to machine learning, model performance depends heavily on feature selection and on understanding the significance of each feature. LightGBM, an efficient gradient-boosting framework developed by Microsoft, has gained popularity for its speed and accuracy across a wide range of machine-learning tasks. Its speed and memory efficiency make it well suited to large-scale data processing, and it is used in industries such as finance, e-commerce, and healthcare, where large datasets must be analyzed quickly.
