Van Dung Pham, L. Nguyen, Nhat Truong Pham, Bao Hung Nguyen, Duc Ngoc Minh Dang, Sy Dzung Nguyen
{"title":"Key Information Extraction from Mobile-Captured Vietnamese Receipt Images using Graph Neural Networks Approach","authors":"Van Dung Pham, L. Nguyen, Nhat Truong Pham, Bao Hung Nguyen, Due Ngoe Minh Dang, Sy Dzung Nguyen","doi":"10.1109/GTSD54989.2022.9989111","DOIUrl":null,"url":null,"abstract":"Information extraction and retrieval are growing fields that have a significant role in document parser and analysis systems. Researches and applications developed in recent years show the numerous difficulties and obstacles in extracting key information from documents. Thanks to the raising of graph theory and deep learning, graph representation and graph learning have been widely applied in information extraction to obtain more exact results. In this paper, we propose a solution upon graph neural networks (GNN) for key information extraction (KIE) that aims to extract the key information from mobile-captured Vietnamese receipt images. Firstly, the images are pre-processed using U2-Net, and then a CRAFT model is used to detect texts from the pre-processed images. Next, the implemented TransformerOCR model is employed for text recognition. Finally, a GNN-based model is designed to extract the key information based on the recognized texts. For validating the effectiveness of the proposed solution, the publicly available dataset released from the Mobile-Captured Receipt Recognition (MC-OCR) Challenge 2021 is used to train and evaluate. The experimental results indicate that our proposed solution achieves a character error rate (CER) score of 0.25 on the private test set, which is more comparable with all reported solutions in the MC-OCR Challenge 2021 as mentioned in the literature. For reproducing and knowledge-sharing purposes, our implementation of the proposed solution is publicly available at https://github.com/ThorPhamlKey_infomation_extraction.","PeriodicalId":125445,"journal":{"name":"2022 6th International Conference on Green Technology and Sustainable Development (GTSD)","volume":"50 12","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 6th International Conference on Green Technology and Sustainable Development (GTSD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GTSD54989.2022.9989111","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Information extraction and retrieval are growing fields that play a significant role in document parsing and analysis systems. Research and applications developed in recent years reveal the numerous difficulties and obstacles in extracting key information from documents. Thanks to the rise of graph theory and deep learning, graph representation and graph learning have been widely applied in information extraction to obtain more accurate results. In this paper, we propose a solution based on graph neural networks (GNN) for key information extraction (KIE) that aims to extract key information from mobile-captured Vietnamese receipt images. First, the images are pre-processed using U2-Net, and then a CRAFT model is used to detect text in the pre-processed images. Next, a TransformerOCR model is employed for text recognition. Finally, a GNN-based model extracts the key information from the recognized texts. To validate the effectiveness of the proposed solution, the publicly available dataset released for the Mobile-Captured Receipt Recognition (MC-OCR) Challenge 2021 is used for training and evaluation. The experimental results indicate that our proposed solution achieves a character error rate (CER) score of 0.25 on the private test set, which is competitive with all solutions reported for the MC-OCR Challenge 2021 in the literature. For reproducibility and knowledge sharing, our implementation of the proposed solution is publicly available at https://github.com/ThorPham/Key_infomation_extraction.
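
The abstract describes a four-stage pipeline (U2-Net pre-processing, CRAFT text detection, TransformerOCR recognition, GNN-based extraction) evaluated with the character error rate. The sketch below is only an illustration of that flow: the four stage functions are hypothetical stubs, not the authors' implementation, while `character_error_rate` implements the standard CER definition (Levenshtein edit distance divided by reference length) used to interpret the reported 0.25 score.

```python
from typing import Dict, List, Tuple


def remove_background(image: bytes) -> bytes:
    """Stage 1 placeholder: U2-Net pre-processing (receipt segmentation)."""
    return image  # stub


def detect_text_boxes(image: bytes) -> List[Tuple[int, int, int, int]]:
    """Stage 2 placeholder: CRAFT text detection, returning bounding boxes."""
    return []  # stub


def recognize_text(image: bytes, box: Tuple[int, int, int, int]) -> str:
    """Stage 3 placeholder: TransformerOCR recognition of one text box."""
    return ""  # stub


def extract_key_info(texts: List[str]) -> Dict[str, str]:
    """Stage 4 placeholder: GNN-based key information extraction."""
    return {}  # stub


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Standard CER: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    row = list(range(n + 1))       # row[j] holds d(i-1, j) before the update
    for i in range(1, m + 1):
        prev, row[0] = row[0], i   # prev carries the diagonal d(i-1, j-1)
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            prev, row[j] = row[j], min(row[j] + 1,      # deletion
                                       row[j - 1] + 1,  # insertion
                                       prev + cost)     # substitution
    return row[n] / max(m, 1)


if __name__ == "__main__":
    # Lower is better; the paper reports a CER of 0.25 on the private test set.
    ref = "TONG CONG: 125.000"
    hyp = "TONG C0NG: 125.000"  # one substituted character
    print(f"CER = {character_error_rate(ref, hyp):.3f}")  # 1/18 ~ 0.056
```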