μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini, Carmine Zaccagnino, Silvia Cascianelli, Laura Righi, Rita Cucchiara
arXiv - CS - Digital Libraries, 2024-08-28 (arXiv:2408.15646)
Citations: 0
Abstract
Regesta are catalogs of summaries of other documents and, in some cases, are
the only source of information about the content of such full-length documents.
For this reason, they are of great interest to scholars in many social and
humanities fields. In this work, we focus on the Regesta Pontificum Romanorum, a
large collection of papal registers. Regesta are visually rich, inherently
multi-page documents in which the layout is as important as the text itself,
since much of the contained information is conveyed through the structure.
Among the Digital Humanities techniques that can help scholars efficiently
exploit regesta and other documentary sources available as scanned images,
Document Parsing has emerged as the task of converting document images into
machine-readable structured representations, usually in a markup language.
However, current models focus on scientific and business documents, and most
of them consider only single-page documents. To overcome this limitation, in this
work, we propose μgat, an extension of the recently proposed Nougat document
parsing architecture that can handle elements spanning beyond a single page.
Specifically, we adapt Nougat to process a larger, multi-page context,
consisting of the previous and the following page, while parsing the current
page. Both qualitative and quantitative experimental results demonstrate the
effectiveness of the proposed approach, including on the challenging Regesta
Pontificum Romanorum.
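
To make the multi-page context concrete, here is a minimal Python sketch of how each page to be parsed could be grouped with its previous and following page. The helper name `page_context_windows` and the use of `None` for a missing neighbour at the first and last page are illustrative assumptions. The abstract does not say how μgat encodes, combines, or pads boundary pages.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")  # a page of any type, e.g. an image or a file path


def page_context_windows(
    pages: Sequence[T],
) -> List[Tuple[Optional[T], T, Optional[T]]]:
    """Pair each page with its previous and following page.

    Hypothetical helper, not the authors' code: pages at the document
    boundaries get None where no neighbouring page exists.
    """
    windows: List[Tuple[Optional[T], T, Optional[T]]] = []
    for i, current in enumerate(pages):
        # Guard i > 0 so the first page does not wrap around to pages[-1].
        prev_page = pages[i - 1] if i > 0 else None
        next_page = pages[i + 1] if i + 1 < len(pages) else None
        windows.append((prev_page, current, next_page))
    return windows


if __name__ == "__main__":
    # One (previous, current, next) triplet per page to be parsed.
    for window in page_context_windows(["p1", "p2", "p3"]):
        print(window)
```

Running it on three placeholder pages prints one triplet per page, with `None` marking the missing neighbour of the first and last page. Only the middle element is the page whose content is parsed. The neighbours serve as context.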