Information Extraction and Analysis on Certificates and Medical Receipts
T. Hong, Wei-Chou Chen, Chih-Hung Wu, Bo Xiao, Bing-Yang Chiang, Zhi-Xun Shen
2022 IEEE International Conference on Consumer Electronics (ICCE), published 2022-01-07
DOI: https://doi.org/10.1109/ICCE53296.2022.9730569
Abstract
Document digitalization has become a trend in recent years. It enables fast analysis and search because information in digital documents is easy to manage. In practice, however, digitalization is still in progress, and much information exists only on paper. Analyzing a large volume of such documents is time-consuming because it requires extensive manual labor. Computer vision algorithms that have emerged in recent years can be applied to this scenario. In this paper, we propose an automatic information extraction and analysis system for Mandarin documents. It consists of three main steps. First, the text regions in documents captured in natural scenes are detected. Second, these text regions are recognized and converted into digital form. Finally, heuristic rules are designed and integrated into the system to improve recognition accuracy. The proposed system is expected to eliminate the time-consuming manual effort of document information extraction.
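The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract names no models or rules, so the detector and recognizer are passed in as hypothetical callables, and the single heuristic shown (repairing letters that are commonly misread as digits in numeric fields) is an assumed example of the kind of rule a receipt-processing system might use. All names here (`TextRegion`, `fix_numeric_field`, `extract`, `DIGIT_FIXES`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed box layout: (x, y, width, height). The paper does not specify one.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    box: Box
    text: str = ""


# Hypothetical rule table: letters often confused with digits by OCR.
DIGIT_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)


def fix_numeric_field(text: str) -> str:
    """Substitute look-alike letters only when the field is mostly digits."""
    stripped = text.replace(",", "").replace(" ", "")
    if not stripped:
        return text
    digit_ratio = sum(ch.isdigit() for ch in stripped) / len(stripped)
    if digit_ratio >= 0.5:
        # Mostly numeric: return the cleaned field without separators.
        return stripped.translate(DIGIT_FIXES)
    return text  # Mostly non-numeric: leave the recognized text untouched.


def extract(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], str],
) -> List[TextRegion]:
    """Run detection, recognition, then rule-based correction."""
    regions = [TextRegion(box) for box in detect(image)]  # step 1: detect
    for region in regions:
        raw = recognize(image, region.box)                # step 2: recognize
        region.text = fix_numeric_field(raw)              # step 3: heuristics
    return regions


if __name__ == "__main__":
    # Stub detector and recognizer standing in for real models.
    fake_detect = lambda img: [(0, 0, 40, 12), (0, 20, 40, 12)]
    fake_texts = {(0, 0, 40, 12): "Total", (0, 20, 40, 12): "1,2O0"}
    fake_recognize = lambda img, box: fake_texts[box]

    for region in extract(None, fake_detect, fake_recognize):
        print(region.box, region.text)
```

Keeping the heuristic step separate from recognition means rules can be added or tuned per document type without retraining a model, which is one plausible reason to integrate rules after recognition as the abstract describes.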