UniHDSA: A unified relation prediction approach for hierarchical document structure analysis

Pattern Recognition | IF 7.6 | CAS Zone 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Pub Date: 2025-09-01 | Epub Date: 2025-03-27 | DOI: 10.1016/j.patcog.2025.111617
Jiawei Wang, Kai Hu, Qiang Huo
Pattern Recognition, vol. 165, Article 111617. Journal Article. https://www.sciencedirect.com/science/article/pii/S0031320325002778
Citation count: 0

Abstract

Document structure analysis, also known as document layout analysis, is crucial for understanding both the physical layout and the logical structure of documents, and it supports applications such as information retrieval, document summarization, and knowledge extraction. Hierarchical Document Structure Analysis (HDSA) specifically aims to restore the hierarchical structure of documents created using authoring software with hierarchical schemas. Previous research has primarily followed two approaches: one tackles specific subtasks of HDSA in isolation, such as table detection or reading order prediction, while the other adopts a unified framework that uses multiple branches or modules, each designed to address a distinct task. In this work, we propose a unified relation prediction approach for HDSA, called UniHDSA, which treats the various HDSA subtasks as relation prediction problems and consolidates their relation prediction labels into a unified label space. This allows a single relation prediction module to handle multiple tasks simultaneously, whether for page-level or document-level structure analysis. By doing so, our approach significantly reduces the risk of cascading errors and enhances the system's efficiency, scalability, and adaptability. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance on a hierarchical document structure analysis benchmark, Comp-HRDoc, and competitive results on a large-scale document layout analysis dataset, DocLayNet, illustrating the superiority of our method across all subtasks.
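
The idea of a unified label space can be illustrated with a toy sketch. The code below is not the authors' implementation: the label names (`SAME_BLOCK`, `READS_BEFORE`, `PARENT_OF`), the helper functions, and the scores are all hypothetical, chosen only to show how one prediction step over a shared label set might serve several subtasks at once, with the per-task outputs recovered afterwards.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Relation(IntEnum):
    """Hypothetical unified label space; the names are illustrative only."""
    NONE = 0          # no relation between the pair
    SAME_BLOCK = 1    # both elements belong to the same layout block
    READS_BEFORE = 2  # source precedes target in reading order
    PARENT_OF = 3     # source is the hierarchical parent of target


Pair = Tuple[int, int]  # (source element index, target element index)


def predict_relations(scores: Dict[Pair, List[float]]) -> Dict[Pair, Relation]:
    """Single shared step: pick the highest-scoring label for each ordered pair."""
    predictions: Dict[Pair, Relation] = {}
    for pair, label_scores in scores.items():
        best = max(range(len(label_scores)), key=label_scores.__getitem__)
        predictions[pair] = Relation(best)
    return predictions


def split_by_task(predictions: Dict[Pair, Relation]) -> Dict[Relation, List[Pair]]:
    """Route the unified predictions back into one pair list per relation type."""
    tasks: Dict[Relation, List[Pair]] = {
        label: [] for label in Relation if label is not Relation.NONE
    }
    for pair, label in sorted(predictions.items()):
        if label is not Relation.NONE:
            tasks[label].append(pair)
    return tasks


# Made-up scores for three elements (0, 1, 2), one score per Relation label.
toy_scores = {
    (0, 1): [0.1, 0.7, 0.1, 0.1],
    (1, 2): [0.1, 0.1, 0.6, 0.2],
    (0, 2): [0.2, 0.1, 0.1, 0.6],
    (2, 0): [0.9, 0.0, 0.1, 0.0],
}
result = split_by_task(predict_relations(toy_scores))
```

In this sketch every pair passes through the same scoring step, so no task depends on the output of a separate earlier module, which is the property the abstract associates with a lower risk of cascading errors. A real system would produce the scores with a learned model rather than a hand-written table.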
Source journal
Pattern Recognition (Engineering and Technology: Electrical and Electronic Engineering)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.