A Generic Solver Combining Unsupervised Learning and Representation Learning for Breaking Text-Based Captchas

Sheng Tian, T. Xiong
{"title":"A Generic Solver Combining Unsupervised Learning and Representation Learning for Breaking Text-Based Captchas","authors":"Sheng Tian, T. Xiong","doi":"10.1145/3366423.3380166","DOIUrl":null,"url":null,"abstract":"Although there are many alternative captcha schemes available, text-based captchas are still one of the most popular security mechanism to maintain Internet security and prevent malicious attacks, due to the user preferences and ease of design. Over the past decade, different methods of breaking captchas have been proposed, which helps captcha keep evolving and become more robust. However, these previous works generally require heavy expert involvement and gradually become ineffective with the introduction of new security features. This paper proposes a generic solver combining unsupervised learning and representation learning to automatically remove the noisy background of captchas and solve text-based captchas. We introduce a new training scheme for constructing mini-batches, which contain a large number of unlabeled hard examples, to improve the efficiency of representation learning. Unlike existing deep learning algorithms, our method requires significantly fewer labeled samples and surpasses the recognition performance of a fully-supervised model with the same network architecture. Moreover, extensive experiments show that the proposed method outperforms state-of-the-art by delivering a higher accuracy on various captcha schemes. We provide further discussions of potential applications of the proposed unified framework. We hope that our work can inspire the community to enhance the security of text-based captchas.","PeriodicalId":20754,"journal":{"name":"Proceedings of The Web Conference 2020","volume":"341 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2020-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of The Web Conference 2020","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3366423.3380166","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Although there are many alternative captcha schemes available, text-based captchas are still one of the most popular security mechanism to maintain Internet security and prevent malicious attacks, due to the user preferences and ease of design. Over the past decade, different methods of breaking captchas have been proposed, which helps captcha keep evolving and become more robust. However, these previous works generally require heavy expert involvement and gradually become ineffective with the introduction of new security features. This paper proposes a generic solver combining unsupervised learning and representation learning to automatically remove the noisy background of captchas and solve text-based captchas. We introduce a new training scheme for constructing mini-batches, which contain a large number of unlabeled hard examples, to improve the efficiency of representation learning. Unlike existing deep learning algorithms, our method requires significantly fewer labeled samples and surpasses the recognition performance of a fully-supervised model with the same network architecture. Moreover, extensive experiments show that the proposed method outperforms state-of-the-art by delivering a higher accuracy on various captcha schemes. We provide further discussions of potential applications of the proposed unified framework. We hope that our work can inspire the community to enhance the security of text-based captchas.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
结合无监督学习和表示学习的破解文本验证码的通用求解器
尽管有许多替代验证码方案可用,但基于文本的验证码仍然是维护互联网安全和防止恶意攻击的最流行的安全机制之一,因为用户偏好和易于设计。在过去的十年里,人们提出了不同的破解验证码的方法,这有助于验证码不断发展并变得更加强大。然而,这些以前的工作通常需要大量的专家参与,并且随着新的安全特性的引入逐渐变得无效。本文提出了一种结合无监督学习和表示学习的通用求解器,用于自动去除验证码的噪声背景,求解基于文本的验证码。为了提高表示学习的效率,我们引入了一种新的训练方案来构造包含大量未标记的难样例的小批量。与现有的深度学习算法不同,我们的方法需要更少的标记样本,并且超过了具有相同网络架构的全监督模型的识别性能。此外,大量的实验表明,所提出的方法通过在各种验证码方案中提供更高的准确性而优于最先进的技术。我们进一步讨论了所提议的统一框架的潜在应用。我们希望我们的工作能够激励社会提高基于文本的验证码的安全性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Gone, Gone, but Not Really, and Gone, But Not forgotten: A Typology of Website Recoverability Those who are left behind: A chronicle of internet access in Cuba Towards Automated Technologies in the Referencing Quality of Wikidata Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25 - 29, 2022 WWW '21: The Web Conference 2021, Virtual Event / Ljubljana, Slovenia, April 19-23, 2021
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1