布局感知单图像文档扁平化

IF 7.8 1区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING ACM Transactions on Graphics Pub Date : 2023-11-02 DOI:10.1145/3627818

Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan

{"title":"布局感知单图像文档扁平化","authors":"Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan","doi":"10.1145/3627818","DOIUrl":null,"url":null,"abstract":"Single image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results when dealing with document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outline shapes, making unwarping local layout primitives essentially homogeneous with unwarping the entire document. The former task is clearly more straightforward to solve than the latter due to the more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model working in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module is applied to predict the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework achieves favorable performance against state-of-the-art methods. In addition, the current publicly available document flattening datasets have limited 3D paper shapes without layout annotation and also lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by utilizing a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten .","PeriodicalId":50913,"journal":{"name":"ACM Transactions on Graphics","volume":"58 4","pages":"0"},"PeriodicalIF":7.8000,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Layout-Aware Single-Image Document Flattening\",\"authors\":\"Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan\",\"doi\":\"10.1145/3627818\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Single image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results when dealing with document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outline shapes, making unwarping local layout primitives essentially homogeneous with unwarping the entire document. The former task is clearly more straightforward to solve than the latter due to the more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model working in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module is applied to predict the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework achieves favorable performance against state-of-the-art methods. In addition, the current publicly available document flattening datasets have limited 3D paper shapes without layout annotation and also lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by utilizing a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten .\",\"PeriodicalId\":50913,\"journal\":{\"name\":\"ACM Transactions on Graphics\",\"volume\":\"58 4\",\"pages\":\"0\"},\"PeriodicalIF\":7.8000,\"publicationDate\":\"2023-11-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Graphics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3627818\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Graphics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3627818","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

单幅图像的文档变形校正是一项具有挑战性的任务。尽管最近一些基于深度学习的方法试图解决这个问题，但在处理具有复杂变形的文档图像时，它们无法取得令人满意的结果。在本文中，我们提出了一个新的有效的文档扁平化框架。我们的主要见解是，文档中的大多数布局原语都具有矩形轮廓形状，这使得取消弯曲的局部布局原语基本上与取消弯曲的整个文档是一致的。前者的任务显然比后者更容易解决，因为纹理更一致，变形相对平滑。在此基础上，我们提出了一种以分而治之的方式工作的布局感知深度模型。首先，我们使用一个基于变压器的分割模块来获取输入文档的布局信息。然后应用一个新的回归模块对全局和局部UV贴图进行预测。最后，设计了一种有效的融合算法，利用局部细节对全局预测进行校正。定量和定性实验结果表明，我们的框架在最先进的方法下取得了良好的性能。此外，目前公开的文档平坦化数据集在没有布局注释的情况下具有有限的3D纸张形状，并且还缺乏通用的几何校正度量。因此，我们利用一种全自动渲染方法来生成具有不同形状和精确布局分割标签的变形文档，从而构建了一个新的大规模合成数据集。我们还提出了一种新的基于我们的配对文档UV地图的几何校正度量。代码和数据集将在https://github.com/BunnySoCrazy/LA-DocFlatten上发布。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Layout-Aware Single-Image Document Flattening

Single image rectification of document deformation is a challenging task. Although some recent deep learning-based methods have attempted to solve this problem, they cannot achieve satisfactory results when dealing with document images with complex deformations. In this article, we propose a new efficient framework for document flattening. Our main insight is that most layout primitives in a document have rectangular outline shapes, making unwarping local layout primitives essentially homogeneous with unwarping the entire document. The former task is clearly more straightforward to solve than the latter due to the more consistent texture and relatively smooth deformation. On this basis, we propose a layout-aware deep model working in a divide-and-conquer manner. First, we employ a transformer-based segmentation module to obtain the layout information of the input document. Then a new regression module is applied to predict the global and local UV maps. Finally, we design an effective merging algorithm to correct the global prediction with local details. Both quantitative and qualitative experimental results demonstrate that our framework achieves favorable performance against state-of-the-art methods. In addition, the current publicly available document flattening datasets have limited 3D paper shapes without layout annotation and also lack a general geometric correction metric. Therefore, we build a new large-scale synthetic dataset by utilizing a fully automatic rendering method to generate deformed documents with diverse shapes and exact layout segmentation labels. We also propose a new geometric correction metric based on our paired document UV maps. Code and dataset will be released at https://github.com/BunnySoCrazy/LA-DocFlatten .

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Graphics 工程技术-计算机：软件工程

CiteScore

14.30

自引率

25.80%

发文量

193

审稿时长

12 months

期刊介绍： ACM Transactions on Graphics (TOG) is a peer-reviewed scientific journal that aims to disseminate the latest findings of note in the field of computer graphics. It has been published since 1982 by the Association for Computing Machinery. Starting in 2003, all papers accepted for presentation at the annual SIGGRAPH conference are printed in a special summer issue of the journal.

期刊最新文献

Direct Manipulation of Procedural Implicit Surfaces 3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting Quark: Real-time, High-resolution, and General Neural View Synthesis Differentiable Owen Scrambling ELMO: Enhanced Real-time LiDAR Motion Capture through Upsampling