Hyperparameter Tuning
Consider using cross-validation to assess your model’s robustness:
Python
from catboost import CatBoostClassifier, Pool, cv

# Create a CatBoost Pool
catboost_pool = Pool(X, label=y)

# Define the parameters for the CatBoost model
params = {
    'iterations': 1000,
    'learning_rate': 0.01,
    'depth': 3,
    'loss_function': 'MultiClass',
    'random_state': 42,
}

# Perform cross-validation using the cv function from CatBoost
cv_results, cv_model = cv(
    pool=catboost_pool,
    params=params,
    # Specify the number of folds for cross-validation
    fold_count=5,
    # Suppress per-iteration logging during training
    verbose=False,
    # Also return the model trained on each fold
    return_models=True
)
Output:
Training on fold [0/5]
bestTest = 0.1903599557
bestIteration = 723
Training on fold [1/5]
bestTest = 0.2019080832
bestIteration = 540
Training on fold [2/5]
bestTest = 0.09307095973
bestIteration = 983
Training on fold [3/5]
bestTest = 0.1257137299
bestIteration = 893
Training on fold [4/5]
bestTest = 0.09728240085
bestIteration = 996
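The per-fold bestTest values above differ noticeably, which is itself a robustness signal. As a quick sanity check, the sketch below copies those five values by hand (they are not read from CatBoost) and computes their mean and sample standard deviation with the standard library.

```python
import statistics

# bestTest values copied by hand from the fold log above
best_test = [0.1903599557, 0.2019080832, 0.09307095973, 0.1257137299, 0.09728240085]

fold_mean = statistics.mean(best_test)
fold_std = statistics.stdev(best_test)  # sample standard deviation (n - 1)

print(f"mean bestTest: {fold_mean:.4f}")
print(f"std bestTest:  {fold_std:.4f}")
```

A standard deviation that is large relative to the mean, as it is here, suggests the score depends heavily on which rows land in each fold.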
Print the Result:
Python3
print(cv_results.head())
Output:
   iterations  test-MultiClass-mean  test-MultiClass-std  \
0           0              1.086702             0.001203
1           1              1.074234             0.001518
2           2              1.060712             0.001777
3           3              1.050879             0.002378
4           4              1.039454             0.001931

   train-MultiClass-mean  train-MultiClass-std
0               1.086469              0.000294
1               1.074242              0.001409
2               1.060602              0.001765
3               1.050635              0.001235
4               1.039139              0.001284
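One way to read this table is to compare the train and test means at each iteration: a test loss that drifts well above the train loss hints at overfitting. The sketch below does this in plain Python on the five rows shown above, copied by hand; the variable names are illustrative.

```python
# Mean losses for iterations 0-4, copied by hand from the table above
test_mean = [1.086702, 1.074234, 1.060712, 1.050879, 1.039454]
train_mean = [1.086469, 1.074242, 1.060602, 1.050635, 1.039139]

# Generalization gap: positive when the test loss exceeds the train loss
gaps = [test - train for test, train in zip(test_mean, train_mean)]

for iteration, gap in enumerate(gaps):
    print(f"iteration {iteration}: gap = {gap:+.6f}")

# Iteration with the lowest mean test loss among the rows shown
best_iteration = min(range(len(test_mean)), key=test_mean.__getitem__)
print("lowest test loss at iteration", best_iteration)
```

On the full results table, the same lookup should be possible with pandas via cv_results['test-MultiClass-mean'].idxmin().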
The code runs cross-validation on a CatBoost model. It first wraps the features and labels in a CatBoost Pool, the library’s data container. The params dictionary sets the number of boosting iterations, the learning rate, the tree depth, the loss function (“MultiClass” for multiclass classification), and a random seed for reproducibility. The cv function then performs 5-fold cross-validation (fold_count=5); verbose=False suppresses per-iteration logging, and return_models=True returns the model trained on each fold alongside the results table. Each row of cv_results holds the mean and standard deviation of the train and test loss across folds at one iteration. The next snippet lists the test metric columns in cv_results, selects the first one (test-MultiClass-mean), and reads its value at the final iteration. Keep in mind that MultiClass is a log loss, where lower is better; it is not an accuracy or a percentage.
Python3
# Check the available metric names in the cross-validation results
available_metrics = [metric for metric in cv_results.columns if metric.startswith('test-')]
print("Available Metrics:", available_metrics)

# Select the mean test loss (the first test metric) and read its value at the last iteration
# You may need to choose a different metric depending on your task
mean_loss = cv_results[available_metrics[0]].iloc[-1]
print(f"Mean Loss: {mean_loss:.4f}")
Output:
Available Metrics: ['test-MultiClass-mean', 'test-MultiClass-std']
Mean Loss: 0.1460
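Because MultiClass is a log loss (the mean negative log-probability assigned to the true class, assuming unweighted samples), it is easier to interpret after exponentiating: exp(-loss) is the geometric mean of the probabilities the model gave to the correct classes. A small illustration, assuming a mean loss of roughly 0.146 as in the run above; the three-class reference point is an assumption, since the number of classes is not stated here.

```python
import math

mean_loss = 0.146  # approximate final test-MultiClass-mean from the run above

# Geometric-mean probability assigned to the true class
typical_prob = math.exp(-mean_loss)
print(f"geometric-mean probability of the true class: {typical_prob:.3f}")

# Reference point: the log loss of a uniform guess over n_classes classes
n_classes = 3  # assumption for illustration only
uniform_loss = math.log(n_classes)
print(f"log loss of a uniform {n_classes}-class guess: {uniform_loss:.4f}")
```

A loss far below the uniform-guess value indicates the model is assigning most of its probability mass to the correct class.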
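Evaluate the accuracy of each model
Let’s evaluate the accuracy of the model obtained from each fold on the test set. X_test and y_test must be held out from the data passed to cv; otherwise the scores will be overly optimistic.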
Python3
from sklearn.metrics import accuracy_score

def Accuracy_Score(cv_model, X_test, y_test):
    score = {}
    for i, model in enumerate(cv_model):
        # Predict class labels on the test data
        y_pred = model.predict(X_test, prediction_type='Class')
        # Calculate accuracy and store it as a percentage string, keyed by fold number
        accuracy = accuracy_score(y_test, y_pred)
        score[i + 1] = str(accuracy * 100) + '%'
    return score

Accuracy_Score(cv_model, X_test, y_test)
Output:
{1: '100.0%', 2: '100.0%', 3: '100.0%', 4: '100.0%', 5: '100.0%'}
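Returning percentages as strings makes the scores awkward to compare or average. A minimal alternative sketch keeps them numeric; it uses a hand-written accuracy helper and made-up label lists in place of the fold models’ predictions so that it runs on its own (none of these names come from CatBoost).

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labels and per-fold predictions, for illustration only
y_true = [0, 1, 2, 1, 0, 2]
fold_predictions = {
    1: [0, 1, 2, 1, 0, 2],
    2: [0, 1, 2, 2, 0, 2],
    3: [0, 1, 1, 1, 0, 2],
}

# Keep scores as floats so they can be averaged or sorted
scores = {fold: accuracy(y_true, preds) for fold, preds in fold_predictions.items()}
mean_accuracy = sum(scores.values()) / len(scores)

for fold, score in scores.items():
    print(f"fold {fold}: {score:.1%}")
print(f"mean: {mean_accuracy:.1%}")
```

Formatting as a percentage only at print time keeps the underlying values usable for further analysis.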
Further Improvements
To further enhance your model, consider:
- Feature engineering to create informative features.
- Exploring different evaluation metrics for classification tasks, especially if dealing with imbalanced data.
- Regularization techniques to prevent overfitting.
- Visualizations to gain insights into your data and model’s predictions.
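On the evaluation-metric point, accuracy can look good on imbalanced data while a minority class is largely missed. The sketch below computes per-class precision, recall, and macro-averaged F1 in plain Python on a made-up label set; the function name and data are illustrative.

```python
def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen in either list."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return report

# Made-up imbalanced example: 8 samples of class 0, 2 of class 1
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

report = per_class_report(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1 for _, _, f1 in report.values()) / len(report)

print(f"accuracy: {accuracy:.2f}")
print(f"macro F1: {macro_f1:.2f}")
```

Here accuracy is 0.90 even though half of the minority class is missed, which the lower macro F1 makes visible. In practice, scikit-learn’s classification_report provides the same per-class figures.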
CatBoost Parameters and Hyperparameters
CatBoost is a popular open-source library for gradient boosting on decision trees. It was developed by Yandex and can be applied to a range of machine-learning problems, including classification, regression, and ranking. Compared with other boosting libraries, CatBoost offers several benefits:
- It handles categorical features natively, without manual encoding or preprocessing.
- It reduces overfitting through its ordered boosting scheme and regularization techniques.
- It achieves high performance and scalability through efficient CPU and GPU implementations.
In this post, we will focus on CatBoost’s parameters and hyperparameters, the settings that control how the algorithm trains and how well it performs. We will describe them, explain how they affect the model, and show how to tune them for the best results.
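As a preview, a parameter dictionary might look like the sketch below. The parameter names are ones CatBoost accepts, but the values and the search space are illustrative assumptions, not tuned recommendations.

```python
# Illustrative CatBoost settings; values are assumptions, not tuned results
params = {
    "iterations": 1000,          # number of boosting rounds (trees)
    "learning_rate": 0.01,       # step size applied to each tree's contribution
    "depth": 3,                  # depth of each tree
    "l2_leaf_reg": 3.0,          # L2 regularization on leaf values
    "loss_function": "MultiClass",
    "random_seed": 42,           # fixes randomness for reproducibility
}

# Hypothetical search space for a manual or grid-based tuning loop
search_space = {
    "learning_rate": [0.01, 0.03, 0.1],
    "depth": [3, 4, 6],
    "l2_leaf_reg": [1.0, 3.0, 10.0],
}

# Number of candidate configurations in a full grid over the space
n_combinations = 1
for values in search_space.values():
    n_combinations *= len(values)
print("grid size:", n_combinations)
```

Such a dictionary can be passed to the cv function shown earlier or unpacked into CatBoostClassifier(**params); the grid size is a reminder that each added hyperparameter multiplies the tuning cost.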