UniHDSA: A unified relation prediction approach for hierarchical document structure analysis

IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pattern Recognition; Pub Date: 2025-09-01; Epub Date: 2025-03-27; DOI: 10.1016/j.patcog.2025.111617

Jiawei Wang, Kai Hu, Qiang Huo

Pattern Recognition, volume 165, Article 111617. Journal article, not open access. https://www.sciencedirect.com/science/article/pii/S0031320325002778


Abstract

Document structure analysis, also known as document layout analysis, is crucial for understanding both the physical layout and the logical structure of documents, and it serves applications such as information retrieval, document summarization, and knowledge extraction. Hierarchical Document Structure Analysis (HDSA) specifically aims to restore the hierarchical structure of documents created using authoring software with hierarchical schemas. Previous research has primarily followed two approaches: one tackles specific sub-tasks of HDSA in isolation, such as table detection or reading order prediction, while the other adopts a unified framework that uses multiple branches or modules, each designed to address a distinct task. In this work, we propose a unified relation prediction approach for HDSA, called UniHDSA, which treats various HDSA sub-tasks as relation prediction problems and consolidates their relation prediction labels into a unified label space. This allows a single relation prediction module to handle multiple tasks simultaneously, whether for page-level or document-level structure analysis. By doing so, our approach significantly reduces the risk of cascading errors and enhances the system's efficiency, scalability, and adaptability. To validate the effectiveness of UniHDSA, we develop a multimodal end-to-end system based on Transformer architectures. Extensive experimental results demonstrate that our approach achieves state-of-the-art performance on a hierarchical document structure analysis benchmark, Comp-HRDoc, and competitive results on a large-scale document layout analysis dataset, DocLayNet, illustrating the superiority of our method across all sub-tasks.
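The idea of a unified label space can be illustrated with a toy sketch. This is not the authors' implementation: the label names, the `Relation` enum, and the `decode_relations` helper are hypothetical, and the assumption that each ordered pair of elements carries exactly one label is a simplification made here for brevity. The sketch only shows how the output of a single relation predictor over one shared label set might be split back into per-task results.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Hypothetical unified label space (names are assumptions, not the paper's)."""
    NONE = 0          # no relation between the two elements
    SAME_GROUP = 1    # e.g. two text lines in the same layout object (page level)
    READING_NEXT = 2  # target follows source in reading order (page level)
    PARENT_CHILD = 3  # source is the hierarchical parent of target (document level)


def decode_relations(pairs):
    """Split unified relation predictions into per-task edge lists.

    `pairs` maps (source_id, target_id) to one Relation label and stands in
    for the output of a single relation prediction module.
    """
    tasks = {label: [] for label in Relation if label is not Relation.NONE}
    for (src, dst), label in sorted(pairs.items()):
        if label is not Relation.NONE:
            tasks[label].append((src, dst))
    return tasks


# Made-up predictions for a handful of element pairs.
predicted = {
    ("line1", "line2"): Relation.SAME_GROUP,
    ("para1", "para2"): Relation.READING_NEXT,
    ("para2", "table1"): Relation.NONE,
    ("section1", "para1"): Relation.PARENT_CHILD,
}

tasks = decode_relations(predicted)
for label, edges in tasks.items():
    print(label.name, edges)
```

Because every sub-task reads from the same set of pairwise predictions, no task in this sketch consumes the output of another, which is one way to picture why a shared relation predictor could limit cascading errors.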




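Source journal: Pattern Recognition (Engineering Technology; Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months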

About the journal: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.