DOI: 10.1016/j.cose.2024.104058
Journal: Computers & Security (JCR Q1, Computer Science, Information Systems; Impact Factor 4.8)
Published: 2024-08-17 (Journal Article)
Source: https://www.sciencedirect.com/science/article/pii/S0167404824003638
Transformer-based end-to-end attack on text CAPTCHAs with triplet deep attention
Websites frequently use text-based CAPTCHA images to distinguish human users from bots. Previous research mainly focuses on different training strategies while neglecting the characteristics of the CAPTCHA images themselves, resulting in low accuracy. For text-based CAPTCHAs characterized by rotation, distortion, and non-character elements, we propose an end-to-end attack using a Transformer-based method with triplet deep attention (TDA). First, CAPTCHA features are extracted using ResNet45 with a TDA module and a Transformer encoder; the TDA module learns the rotational and distortion features of characters. Next, queries, keys, and values are designed on the basis of the self-attention mechanism, and a query enhancement module strengthens the queries, improving character localization and reducing attention drift toward non-character elements. Finally, the feature maps are transformed into character probabilities for the final text recognition. Experiments on Roman-character CAPTCHA datasets from 9 popular websites achieve an average word accuracy of 91.14%. To evaluate the method's performance with small samples, experiments are conducted on different scales of training data. Additionally, applying the method to Chinese text-based CAPTCHA tasks achieves an average word accuracy of 99.60%. The method's effectiveness is also explored under low-illumination conditions and in scene text recognition, where background interference is present.
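The recognition head described in the abstract builds on the standard self-attention mechanism (queries, keys, and values). The paper's actual TDA and query-enhancement modules are not specified in this record, so the following is only a minimal NumPy sketch of plain scaled dot-product self-attention, the primitive the method extends; all names, shapes, and weights are hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: (seq_len, d_model) feature rows; w_q/w_k/w_v: (d_model, d_k) projections.
    Returns the attended features and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project into query/key/value spaces
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                         # e.g. 5 character positions, 8-dim features
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out, attn = self_attention(x, *w)
print(out.shape, attn.shape)                        # (5, 8) (5, 5)
```

In the paper's pipeline, attention maps like `attn` are what a query enhancement module would sharpen toward character regions; this sketch shows only the underlying query/key/value computation, not the proposed enhancement.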
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at professionals involved with computer security, audit, control, and data integrity in all sectors: industry, commerce, and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.