Testing the Machine Learning Model
Before we rely on the model to break the CAPTCHA system, we need to test its performance. We can't evaluate it on the images it was trained on, because it may simply have memorized them. Instead, we use the held-out images that were not used for training in the previous step. Several metrics describe how good a model is, including accuracy, precision, recall, and the F1 score.
- F1 Score: The harmonic mean of precision and recall, F1 = 2 × (precision × recall) / (precision + recall). It ranges from 0 to 1, and higher is better.
- Accuracy: The ratio of correct predictions to total predictions.
- Recall: The ratio of true positives to all actual positives, TP / (TP + FN). It measures how many of the data points that truly belong to a class the model manages to find.
- Precision: The ratio of true positives to all positive predictions made by the model, TP / (TP + FP).
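All four metrics can be computed from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The following is a minimal pure-Python sketch for a binary task; the function names are our own, and for multi-class problems you would compute these values per class (libraries such as scikit-learn provide ready-made versions).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Count TP, FP, FN and TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
print(metrics)
```

In this example there are 3 true positives, 1 false positive, 1 false negative and 1 true negative, so accuracy is 4/6 while precision, recall and F1 are all 0.75.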
Despite our best efforts, an AI model will never be 100% accurate, so we can't expect it to solve every CAPTCHA. Its effectiveness is further limited because CAPTCHA services also run machine learning on their servers to detect and block attempts to break them. On top of that, providers keep introducing new types of CAPTCHAs that are harder for machine learning models to read. With these limitations in mind, let's see how well our model works.
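```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Get the prediction model: an inference-only model that maps the "image"
# input to the "dense2" output layer.
# `model`, `num_to_char` and `max_length` are defined in the earlier steps.
prediction_model = keras.models.Model(
    model.get_layer(name="image").input, model.get_layer(name="dense2").output
)
prediction_model.summary()


def decode_batch_predictions(pred):
    """Turn a batch of model outputs into text using greedy CTC decoding."""
    # Every sample in the batch uses all of its time steps
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    results = keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0][
        :, :max_length
    ]
    output_text = []
    for res in results:
        # Map label indices back to characters and join them into one string
        res = tf.strings.reduce_join(num_to_char(res)).numpy().decode("utf-8")
        output_text.append(res)
    return output_text
```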
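The `keras.backend.ctc_decode(..., greedy=True)` call performs greedy (best-path) CTC decoding. To show what that means, here is a simplified NumPy sketch, not Keras's actual implementation: take the most likely class at each time step, merge consecutive repeats, then drop the blank label. It assumes, following the default of TensorFlow's greedy CTC decoder, that the blank is the last class index; the function name `greedy_ctc_decode` is hypothetical.

```python
from typing import List, Optional

import numpy as np


def greedy_ctc_decode(probs: np.ndarray, blank: Optional[int] = None) -> List[List[int]]:
    """Greedy CTC decoding for a batch of shape (batch, time_steps, num_classes).

    Assumes the blank label is the last class unless `blank` is given.
    Returns one variable-length list of label indices per sample.
    """
    if blank is None:
        blank = probs.shape[-1] - 1
    best_path = probs.argmax(axis=-1)  # most likely class at every time step
    decoded = []
    for path in best_path:
        labels = []
        prev = None
        for label in path.tolist():
            # Keep a label only if it differs from the previous step and is not blank
            if label != prev and label != blank:
                labels.append(label)
            prev = label
        decoded.append(labels)
    return decoded


# Toy output: 6 time steps, classes 0-2 are characters and 3 is the blank
probs = np.eye(4)[[0, 0, 3, 0, 1, 1]][None, ...]
print(greedy_ctc_decode(probs))
```

The blank between the two runs of class `0` is what lets CTC emit a doubled character: the path `[0, 0, blank, 0, 1, 1]` decodes to `[0, 0, 1]`. Unlike the Keras call, this sketch does not pad or truncate results to `max_length`.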
Next, we use the prediction model to read the text in a batch of validation CAPTCHA images and display the results.
```python
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# Check the predictions on one batch of validation samples
for batch in val_data.take(1):
    batch_images = batch["image"]
    batch_labels = batch["label"]

    preds = prediction_model.predict(batch_images)
    pred_texts = decode_batch_predictions(preds)

    # Decode the ground-truth labels into strings for comparison
    orig_texts = []
    for label in batch_labels:
        label = tf.strings.reduce_join(num_to_char(label)).numpy().decode("utf-8")
        orig_texts.append(label)

    # Show up to 16 images in a 4x4 grid, each titled with its prediction
    _, ax = plt.subplots(4, 4, figsize=(15, 5))
    for i in range(min(len(pred_texts), 16)):
        # Scale back to 0-255 and transpose back for display
        img = (batch_images[i, :, :, 0] * 255).numpy().astype(np.uint8)
        img = img.T
        title = f"Prediction: {pred_texts[i]}"
        ax[i // 4, i % 4].imshow(img, cmap="gray")
        ax[i // 4, i % 4].set_title(title)
        ax[i // 4, i % 4].axis("off")
plt.show()
```
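The validation loop collects `pred_texts` and `orig_texts` but only plots them. To turn them into numbers, one option (our choice, not necessarily the author's) is to report sequence accuracy, the fraction of CAPTCHAs read exactly right, and character error rate (CER), the Levenshtein edit distance summed over all samples divided by the total number of label characters. The helper names below are hypothetical, and the sample strings are made up for illustration.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def captcha_scores(pred_texts: List[str], orig_texts: List[str]) -> Dict[str, float]:
    """Sequence accuracy and character error rate for predicted CAPTCHA strings."""
    if not orig_texts or len(pred_texts) != len(orig_texts):
        raise ValueError("need the same, non-zero number of predictions and labels")
    exact = sum(p == t for p, t in zip(pred_texts, orig_texts))
    edits = sum(levenshtein(p, t) for p, t in zip(pred_texts, orig_texts))
    chars = sum(len(t) for t in orig_texts)
    return {
        "sequence_accuracy": exact / len(orig_texts),
        "character_error_rate": edits / max(chars, 1),  # guard against empty labels
    }


print(captcha_scores(["226md", "28348", "2b827"], ["226md", "28347", "2b827"]))
```

Here two of the three strings match exactly and the third has one wrong character, giving a sequence accuracy of 2/3 and a CER of 1/15. Sequence accuracy is the stricter measure, since a single wrong character fails the whole CAPTCHA.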
How to Break a CAPTCHA System with Machine Learning?
CAPTCHA, short for Completely Automated Public Turing test to tell Computers and Humans Apart, is a revolutionary technology that helps distinguish humans from bots and protects your site from malicious automated traffic. But this technology has begun to show its age. CAPTCHA was supposed to be a robust system, but advances in artificial intelligence are making it increasingly easy to defeat. To break a CAPTCHA, we need to train a machine learning model. Once it is trained, we can feed the model a CAPTCHA of the same type it was trained on, and it will predict the text for us.
In this article, we will explore how to break a CAPTCHA system with the help of machine learning and walk through the complete process in detail. We will also discuss the limitations of this approach and the ethical and moral issues to consider before attempting it. Keep in mind that our goal in breaking a CAPTCHA should be to educate ourselves and to highlight how poorly the system can filter out non-humans. CAPTCHAs still protect sites from malicious attacks and play an important role in safeguarding the internet. Using bots to break CAPTCHAs on websites without permission is therefore unethical at best and, depending on your location, may also be illegal.