{"title":"位置编码卷积网络求解连通文本字幕","authors":"Ke Qing, Rongsheng Zhang","doi":"10.2478/jaiscr-2022-0008","DOIUrl":null,"url":null,"abstract":"Abstract Text-based CAPTCHA is a convenient and effective safety mechanism that has been widely deployed across websites. The efficient end-to-end models of scene text recognition consisting of CNN and attention-based RNN show limited performance in solving text-based CAPTCHAs. In contrast with the street view image and document, the character sequence in CAPTCHA is non-semantic. The RNN loses its ability to learn the semantic context and only implicitly encodes the relative position of extracted features. Meanwhile, the security features, which prevent characters from segmentation and recognition, extensively increase the complexity of CAPTCHAs. The performance of this model is sensitive to different CAPTCHA schemes. In this paper, we analyze the properties of the text-based CAPTCHA and accordingly consider solving it as a highly position-relative character sequence recognition task. We propose a network named PosConv to leverage the position information in the character sequence without RNN. PosConv uses a novel padding strategy and modified convolution, explicitly encoding the relative position into the local features of characters. This mechanism of PosConv makes the extracted features from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes, and it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence speed.","PeriodicalId":48494,"journal":{"name":"Journal of Artificial Intelligence and Soft Computing Research","volume":"12 1","pages":"121 - 133"},"PeriodicalIF":3.3000,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Position-Encoding Convolutional Network to Solving Connected Text Captcha\",\"authors\":\"Ke Qing, Rongsheng Zhang\",\"doi\":\"10.2478/jaiscr-2022-0008\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Text-based CAPTCHA is a convenient and effective safety mechanism that has been widely deployed across websites. The efficient end-to-end models of scene text recognition consisting of CNN and attention-based RNN show limited performance in solving text-based CAPTCHAs. In contrast with the street view image and document, the character sequence in CAPTCHA is non-semantic. The RNN loses its ability to learn the semantic context and only implicitly encodes the relative position of extracted features. Meanwhile, the security features, which prevent characters from segmentation and recognition, extensively increase the complexity of CAPTCHAs. The performance of this model is sensitive to different CAPTCHA schemes. In this paper, we analyze the properties of the text-based CAPTCHA and accordingly consider solving it as a highly position-relative character sequence recognition task. We propose a network named PosConv to leverage the position information in the character sequence without RNN. PosConv uses a novel padding strategy and modified convolution, explicitly encoding the relative position into the local features of characters. This mechanism of PosConv makes the extracted features from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes, and it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence speed.\",\"PeriodicalId\":48494,\"journal\":{\"name\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"volume\":\"12 1\",\"pages\":\"121 - 133\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2021-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Artificial Intelligence and Soft Computing Research\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.2478/jaiscr-2022-0008\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Artificial Intelligence and Soft Computing Research","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.2478/jaiscr-2022-0008","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Position-Encoding Convolutional Network to Solving Connected Text Captcha
Abstract Text-based CAPTCHA is a convenient and effective safety mechanism that has been widely deployed across websites. The efficient end-to-end models of scene text recognition consisting of CNN and attention-based RNN show limited performance in solving text-based CAPTCHAs. In contrast with the street view image and document, the character sequence in CAPTCHA is non-semantic. The RNN loses its ability to learn the semantic context and only implicitly encodes the relative position of extracted features. Meanwhile, the security features, which prevent characters from segmentation and recognition, extensively increase the complexity of CAPTCHAs. The performance of this model is sensitive to different CAPTCHA schemes. In this paper, we analyze the properties of the text-based CAPTCHA and accordingly consider solving it as a highly position-relative character sequence recognition task. We propose a network named PosConv to leverage the position information in the character sequence without RNN. PosConv uses a novel padding strategy and modified convolution, explicitly encoding the relative position into the local features of characters. This mechanism of PosConv makes the extracted features from CAPTCHAs more informative and robust. We validate PosConv on six text-based CAPTCHA schemes, and it achieves state-of-the-art or competitive recognition accuracy with significantly fewer parameters and faster convergence speed.
期刊介绍:
Journal of Artificial Intelligence and Soft Computing Research (available also at Sciendo (De Gruyter)) is a dynamically developing international journal focused on the latest scientific results and methods constituting traditional artificial intelligence methods and soft computing techniques. Our goal is to bring together scientists representing both approaches and various research communities.