Kai Wu, Dingjiang Yan, Hongcheng Liao, Xiang Zhang, Q. Huang, Qian Zhang, Min Fu
Application of Data Augmentation of Rare Words Based on “Font-CycleGan” in Character Recognition
2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), published 2021-11-22
DOI: 10.1109/ICESIT53460.2021.9696657 (https://doi.org/10.1109/ICESIT53460.2021.9696657)
Citations: 0
Abstract
In real business departments, massive volumes of image data are processed inefficiently, and the accuracy of optical character recognition (OCR) based on deep neural networks is not high. Against this background, we propose a text image data augmentation method based on Font-CycleGan to improve text recognition accuracy. Our goal is to first generate samples of the original rare-word text in specific fonts from a font library, and then augment these samples with CycleGan. This enriches the rare-word text database, which in turn improves classifier performance and text recognition accuracy. We evaluate and compare character recognition accuracy with traditional training data and with Font-CycleGan-augmented data. The results show that the proposed method yields a more significant improvement.
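The abstract does not state the training objective. As a rough illustration only, the sketch below computes the cycle-consistency term of the standard CycleGAN formulation, which a Font-CycleGan variant would plausibly retain. The names `g` (font-rendered to real-style), `f` (real-style to font-rendered), and the toy generators `darken` and `brighten` are assumptions for this example and do not come from the paper.

```python
from typing import Callable, List, Sequence

# A flattened grayscale image; a stand-in for a real image tensor.
Image = List[float]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    font_images: Sequence[Image],
    real_images: Sequence[Image],
    g: Callable[[Image], Image],  # hypothetical: font-rendered -> real-style
    f: Callable[[Image], Image],  # hypothetical: real-style -> font-rendered
) -> float:
    """Standard CycleGAN cycle term: mean |f(g(x)) - x| + mean |g(f(y)) - y|."""
    forward = sum(l1_distance(f(g(x)), x) for x in font_images) / len(font_images)
    backward = sum(l1_distance(g(f(y)), y) for y in real_images) / len(real_images)
    return forward + backward


# Toy "generators" (assumptions): simple brightness scalings, not neural networks.
def darken(img: Image) -> Image:
    return [p * 0.5 for p in img]


def brighten(img: Image) -> Image:
    return [p * 2.0 for p in img]


font_batch = [[0.2, 0.8]]
real_batch = [[0.4, 0.6]]

# brighten undoes darken exactly, so each cycle reconstructs its input.
perfect = cycle_consistency_loss(font_batch, real_batch, darken, brighten)

# Applying darken twice does not reconstruct the input, so the loss is positive.
lossy = cycle_consistency_loss(font_batch, real_batch, darken, darken)
```

In a real setup, `g` and `f` would be trained networks, and this term would be combined with adversarial losses. The toy version only shows that the term is zero when the two mappings invert each other and grows as they drift apart.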