Query-driven Generative Network for Document Information Extraction in the Wild
H. Cao, Xin Li, Jiefeng Ma, Deqiang Jiang, Antai Guo, Yiqing Hu, Hao Liu, Yinsong Liu, Bo Ren
Proceedings of the 30th ACM International Conference on Multimedia, 2022. DOI: 10.1145/3503161.3547877
Citations: 7
Abstract
This paper focuses on Document Information Extraction (DIE) in the wild, a problem that has rarely been explored before. In contrast to existing studies, which are mainly tailored to documents with known templates, predefined layouts, and predefined keys under ideal OCR-error-free input, we aim to build a more practical DIE paradigm for real-world scenarios in which input document images may contain unknown layouts and keys and arrive with problematic OCR results. To achieve this goal, we propose a novel architecture, termed Query-driven Generative Network (QGN), which is equipped with two consecutive modules: a Layout Context-aware Module (LCM) and a Structured Generation Module (SGM). Given a document image with unseen layouts and fields, the LCM yields value-prefix candidates that serve as query prompts for the SGM, which then generates the final key-value pairs even in the presence of OCR noise. To further investigate the potential of our method, we create a new large-scale dataset, named LArge-scale STructured Documents (LastDoc4000), containing 4,000 documents with 1,511 layouts and 3,500 different keys. In experiments, we demonstrate that QGN consistently achieves the best F1-score on the new LastDoc4000 dataset, with up to a 30.32% absolute improvement. A more comprehensive experimental analysis, together with experiments on other public benchmarks, further verifies the effectiveness and robustness of the proposed method for the wild DIE task.
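The paper itself does not include code here, but the two-stage data flow the abstract describes (LCM scores layout-aware value-prefix candidates; SGM decodes key-value pairs conditioned on those prefixes as query prompts) can be sketched roughly as below. This is a minimal PyTorch illustration, not the authors' implementation: the module internals, dimensions, class names (`LayoutContextAwareModule`, `StructuredGenerationModule`), and the prefix-scoring head are all assumptions chosen only to make the pipeline concrete.

```python
# Illustrative sketch of a QGN-style two-stage pipeline (NOT the paper's code).
# All internals and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn


class LayoutContextAwareModule(nn.Module):
    """Hypothetical LCM: fuses OCR token embeddings with 2-D layout (box)
    embeddings and scores each token as a value-prefix candidate."""

    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.layout_proj = nn.Linear(4, d_model)  # normalized (x0, y0, x1, y1) boxes
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.prefix_head = nn.Linear(d_model, 1)  # per-token value-prefix score

    def forward(self, token_ids, boxes):
        h = self.encoder(self.token_emb(token_ids) + self.layout_proj(boxes))
        prefix_logits = self.prefix_head(h).squeeze(-1)  # (batch, seq_len)
        return h, prefix_logits


class StructuredGenerationModule(nn.Module):
    """Hypothetical SGM: an autoregressive decoder that, prompted with a
    value-prefix query, attends to the encoded document and emits the
    key-value pair token by token."""

    def __init__(self, vocab_size=30522, d_model=256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, query_ids, memory):
        tgt = self.token_emb(query_ids)
        return self.lm_head(self.decoder(tgt, memory))  # (batch, T, vocab)


# Toy forward pass: 1 document, 32 OCR tokens, a 5-token query prompt.
lcm, sgm = LayoutContextAwareModule(), StructuredGenerationModule()
tokens = torch.randint(0, 30522, (1, 32))
boxes = torch.rand(1, 32, 4)
memory, prefix_logits = lcm(tokens, boxes)  # stage 1: score value prefixes
# In practice the highest-scoring prefix tokens would form this query;
# random ids stand in for them here.
query = torch.randint(0, 30522, (1, 5))
logits = sgm(query, memory)                 # stage 2: decode the key-value pair
print(prefix_logits.shape, logits.shape)    # (1, 32) and (1, 5, 30522)
```

Under this reading, the generative second stage is what lets the model handle unseen keys: rather than classifying into a fixed key set, the SGM decodes key strings conditioned on the prefix query and the layout-aware document encoding.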