Hyperparameter Tuning

Consider using cross-validation to assess your model’s robustness:

Python

from catboost import CatBoostClassifier, Pool, cv

# Wrap the feature matrix X and labels y (prepared earlier) in a CatBoost Pool
catboost_pool = Pool(X, label=y)

# Define the parameters for the CatBoost model
params = {
    'iterations': 1000,
    'learning_rate': 0.01,
    'depth': 3,
    'loss_function': 'MultiClass',
    'random_state': 42,
}

# Perform cross-validation using the cv function from CatBoost
cv_results, cv_model = cv(
    pool=catboost_pool,
    params=params,
    # Number of folds for cross-validation
    fold_count=5,
    # Suppress per-iteration logging; fold summaries are still printed
    verbose=False,
    # Also return the model trained on each fold
    return_models=True
)

Output:

Training on fold [0/5]

bestTest = 0.1903599557
bestIteration = 723

Training on fold [1/5]

bestTest = 0.2019080832
bestIteration = 540

Training on fold [2/5]

bestTest = 0.09307095973
bestIteration = 983

Training on fold [3/5]

bestTest = 0.1257137299
bestIteration = 893

Training on fold [4/5]

bestTest = 0.09728240085
bestIteration = 996
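Since the goal of cross-validation here is to judge robustness, it can help to look at how much the best test loss varies between folds. The following is a minimal sketch using only the standard library; the values are copied by hand from the fold log above, and the variable names are illustrative.

```python
from statistics import mean, stdev

# bestTest values copied from the fold log above (one per fold)
best_test = [0.1903599557, 0.2019080832, 0.09307095973,
             0.1257137299, 0.09728240085]

fold_mean = mean(best_test)
fold_std = stdev(best_test)                 # sample standard deviation
spread = max(best_test) / min(best_test)    # worst fold relative to best fold

print(f"mean bestTest: {fold_mean:.4f}")
print(f"std  bestTest: {fold_std:.4f}")
print(f"worst/best ratio: {spread:.2f}")
```

A large ratio between the worst and best fold suggests that the result depends noticeably on which samples end up in the validation split.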

Print the Result:

Python3

print(cv_results.head())

Output:

   iterations  test-MultiClass-mean  test-MultiClass-std  \
0           0              1.086702             0.001203   
1           1              1.074234             0.001518   
2           2              1.060712             0.001777   
3           3              1.050879             0.002378   
4           4              1.039454             0.001931   

   train-MultiClass-mean  train-MultiClass-std  
0               1.086469              0.000294  
1               1.074242              0.001409  
2               1.060602              0.001765  
3               1.050635              0.001235  
4               1.039139              0.001284
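One way to use this table is to find the iteration at which the mean test loss is lowest. The sketch below does this in plain Python on the five rows shown above, so it only illustrates the idea; on the full DataFrame, `cv_results['test-MultiClass-mean'].idxmin()` should give the same kind of answer.

```python
# First five values of test-MultiClass-mean, copied from the table above
test_loss_mean = [1.086702, 1.074234, 1.060712, 1.050879, 1.039454]

# Index of the smallest mean test loss among the rows shown
best_iteration = min(range(len(test_loss_mean)),
                     key=test_loss_mean.__getitem__)
best_loss = test_loss_mean[best_iteration]

print(f"best iteration so far: {best_iteration}")
print(f"mean test loss there: {best_loss:.6f}")
```

Because the loss is still falling in these first rows, the lowest value is simply the last row shown; over all 1000 iterations the minimum may occur earlier than the final iteration.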

The code above runs cross-validation for a CatBoost multiclass model. It first builds a CatBoost Pool, the library’s container for features and labels. The params dictionary sets the number of iterations, the learning rate, the tree depth, the loss function (“MultiClass” for multiclass classification), and a random seed for reproducibility. The cv function then performs 5-fold cross-validation (fold_count=5); verbose=False suppresses per-iteration logging, and return_models=True also returns the model trained on each fold.

The next snippet lists the test metric columns in the results, selects the first one (test-MultiClass-mean), and reads its value at the last iteration, which is the test loss averaged over the folds. MultiClass is a log loss, so lower values are better.

Python3

# Check the available metric names in the cross-validation results
available_metrics = [metric for metric in cv_results.columns
                     if metric.startswith('test-')]
print("Available Metrics:", available_metrics)

# The first test column holds the mean test loss across folds;
# pick a different column if your task uses another metric
mean_loss = cv_results[available_metrics[0]].iloc[-1]

# MultiClass is a log loss, not a percentage, so print the raw value
print(f"Mean Loss: {mean_loss:.4f}")

Output:

Available Metrics: ['test-MultiClass-mean', 'test-MultiClass-std']
Mean Loss: 0.1460
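The MultiClass value is a log loss (the average negative log-probability assigned to the true class), not an accuracy or a percentage. The sketch below computes it by hand for two made-up sets of predictions; the function name and the toy data are illustrative and not part of CatBoost.

```python
import math

def multiclass_log_loss(y_true, probabilities):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        total -= math.log(probs[label])
    return total / len(y_true)

y_true = [0, 1, 2]

# A model that knows nothing: equal probability for each of three classes
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
# A model that puts 90% of the probability on the correct class
confident = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

uniform_loss = multiclass_log_loss(y_true, uniform)
confident_loss = multiclass_log_loss(y_true, confident)

print(f"uniform predictions:   {uniform_loss:.4f}")
print(f"confident predictions: {confident_loss:.4f}")
```

For three classes, uniform predictions give a loss of ln(3), about 1.0986. The first rows of the cross-validation table are close to that value, which would be consistent with a three-class problem at the start of training, although the number of classes is not stated here.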

Evaluate the accuracy of each model

Let’s evaluate the accuracy of the model obtained from each fold on the test data.

Python3

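from sklearn.metrics import accuracy_score

def fold_accuracies(cv_model, X_test, y_test):
    """Return the test accuracy of each fold's model, keyed by fold number."""
    score = {}
    for i, model in enumerate(cv_model):
        # Predict class labels for the test data
        y_pred = model.predict(X_test, prediction_type='Class')
        # Calculate accuracy and format it as a percentage string
        accuracy = accuracy_score(y_test, y_pred)
        score[i + 1] = str(accuracy * 100) + '%'
    return score

fold_accuracies(cv_model, X_test, y_test)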

Output:

{1: '100.0%', 2: '100.0%', 3: '100.0%', 4: '100.0%', 5: '100.0%'}

Further Improvements

To further enhance your model, consider:

Feature engineering to create informative features.
Exploring different evaluation metrics for classification tasks, especially if dealing with imbalanced data.
Regularization techniques to prevent overfitting.
Visualizations to gain insights into your data and model’s predictions.
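To illustrate why accuracy alone can mislead on imbalanced data, here is a minimal sketch that compares accuracy with macro-averaged F1 on a toy example. The data and the `macro_f1` helper are hypothetical; in practice, `sklearn.metrics.f1_score` with `average='macro'` computes the same quantity.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but it was wrong
            fn[t] += 1   # true class t was missed
    f1_scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1_scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(f1_scores) / len(labels)

# Toy imbalanced data: 8 samples of class 0, 2 of class 1,
# and a model that always predicts class 0
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = macro_f1(y_true, y_pred)

print(f"accuracy: {accuracy:.2f}")
print(f"macro F1: {f1:.2f}")
```

The always-predict-majority model looks reasonable by accuracy, but its macro F1 is much lower because the minority class is never predicted.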

CatBoost Parameters and Hyperparameters

CatBoost is a popular open-source library for gradient boosting on decision trees. It was developed by Yandex and can be applied to a range of machine-learning problems, including classification, regression, and ranking. Compared to other boosting libraries, CatBoost has several benefits:

It handles categorical features natively, without manual encoding or preprocessing.
It reduces overfitting through its ordered boosting scheme and regularization techniques.
It achieves high performance and scalability through efficient CPU and GPU implementations.

In this post, we focus on the CatBoost parameters and hyperparameters, the settings that control how the algorithm runs and performs. We describe them, explain how they affect the model, and show how to tune them for the best results.
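One simple way to tune such hyperparameters is a grid search: enumerate every combination of candidate values and cross-validate each one. The sketch below only builds the candidate dictionaries with the standard library; the value ranges are illustrative assumptions, not recommendations, and `candidate_params` is a hypothetical helper. `l2_leaf_reg` is CatBoost’s L2 regularization coefficient.

```python
from itertools import product

# Settings shared by every candidate
base_params = {'loss_function': 'MultiClass', 'random_state': 42}

# Illustrative search space (values chosen only for demonstration)
search_space = {
    'learning_rate': [0.01, 0.05, 0.1],
    'depth': [3, 4, 6],
    'l2_leaf_reg': [1, 3, 10],
}

def candidate_params(base, space):
    """Yield one parameter dict per combination in the search space."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield {**base, **dict(zip(names, values))}

candidates = list(candidate_params(base_params, search_space))
print(f"{len(candidates)} candidate configurations")
print(candidates[0])
```

Each dictionary could then be passed as `params` to CatBoost’s `cv` function, keeping the configuration with the lowest mean test loss. CatBoost models also provide a built-in `grid_search` method that may be more convenient.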
