Bao Hieu Tran, Duc Viet Hoang, Nguyen Manh Hiep, Pham Ngoc Bao Anh, Hoang Gia Bao, Nguyen Duc Anh, Bui Hai Phong, T. Nguyen, Phi-Le Nguyen, Thi-Lan Le
MC-OCR Challenge 2021: A Multi-modal Approach for Mobile-Captured Vietnamese Receipts Recognition
2021 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 1-6. Published 2021-08-19. DOI: 10.1109/RIVF51545.2021.9642088 (https://doi.org/10.1109/RIVF51545.2021.9642088)
Citations: 2
Abstract
Mobile-captured receipt OCR (MC-OCR) recognizes text from structured and semi-structured receipts and invoices captured by mobile devices. It plays a critical role in streamlining document-intensive processes and office automation in finance, accounting, and taxation. Despite considerable effort, MC-OCR still faces significant challenges because of the complexity of mobile-captured images. First, receipts may be crumpled or their content blurred. Second, unlike scanned images, photos taken by mobile devices vary widely in quality, depending on the lighting conditions and the environment in which the receipts were captured (e.g., indoor, outdoor, or against a complex background). These difficulties lead to low recognition accuracy. In this challenge, we address these issues through two tasks: (1) evaluating the quality of the captured receipts, and (2) recognizing the required fields of the receipts. Our idea is to use a multi-modal approach that draws on both computer vision and natural language processing, two of the main interests of the RIVF community. The paper presents the BK-OCR team's methodology and results in the Mobile-Captured Image Document Recognition for Vietnamese Receipts 2021 challenge.