DOI: 10.1016/j.cose.2024.104058
Journal: Computers & Security (JCR Q1, Computer Science, Information Systems; Impact Factor 4.8)
Published: 2024-08-17 (Journal Article)
Source: https://www.sciencedirect.com/science/article/pii/S0167404824003638
Transformer-based end-to-end attack on text CAPTCHAs with triplet deep attention
Websites frequently use text-based CAPTCHA images to distinguish human users from bots. Previous research mainly focuses on different training strategies while neglecting the characteristics of the CAPTCHA images themselves, resulting in low accuracy. For text-based CAPTCHAs characterized by rotation, distortion, and non-character elements, we propose an end-to-end attack using a Transformer-based method with triplet deep attention (TDA). First, CAPTCHA features are extracted using ResNet45 with a TDA module and a Transformer encoder; the TDA module learns the rotational and distortion features of characters. Next, queries, keys, and values are designed on the basis of the self-attention mechanism, and a query enhancement module strengthens the queries, improving character localization and reducing attention drift toward non-character elements. Finally, the feature maps are transformed into character probabilities for the final text recognition. Experiments on Roman-character CAPTCHA datasets from 9 popular websites achieve an average word accuracy of 91.14%. To evaluate the method's performance with small samples, experiments are conducted on different scales of training data. Additionally, applying the method to Chinese text-based CAPTCHA tasks achieves an average word accuracy of 99.60%. The method's effectiveness is also explored under low-illumination conditions and in scene text recognition, where background interference is present.
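The recognition head described in the abstract builds on the standard self-attention mechanism (queries, keys, and values). The paper's actual TDA and query-enhancement modules are not specified in this record, so the following is only a minimal NumPy sketch of plain scaled dot-product self-attention, the primitive the method extends; all names, shapes, and weights are hypothetical.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors.

    x: (seq_len, d_model) feature rows; w_q/w_k/w_v: (d_model, d_k) projections.
    Returns the attended features and the attention weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project into query/key/value spaces
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                         # e.g. 5 character positions, 8-dim features
w = [rng.normal(size=(8, 8)) for _ in range(3)]
out, attn = self_attention(x, *w)
print(out.shape, attn.shape)                        # (5, 8) (5, 5)
```

In the paper's pipeline, attention maps like `attn` are what a query enhancement module would sharpen toward character regions; this sketch shows only the underlying query/key/value computation, not the proposed enhancement.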
About the journal:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading-edge research and sound practical management advice. It is aimed at professionals involved with computer security, audit, control, and data integrity in all sectors: industry, commerce, and academia. Recognized worldwide as the primary source of reference for applied research and technical expertise, it is your first step to fully secure systems.