The PAGE (Page Analysis and Ground-Truth Elements) Format Framework

S. Pletschacher, A. Antonacopoulos
2010 20th International Conference on Pattern Recognition (ICPR), published 2010-08-23. DOI: 10.1109/ICPR.2010.72. Citations: 146.

Abstract

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.
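The abstract describes PAGE only at a high level. To illustrate the kinds of information it says the format records (image characteristics such as borders and binarised versions, layout structure, and page content), the sketch below parses a small, hand-written PAGE-style XML document using only Python's standard library. The namespace and the element and attribute names (`PcGts`, `Page`, `Border`, `AlternativeImage`, `TextRegion`, `Coords/@points`, `TextEquiv/Unicode`) are modeled on a later published release of the PAGE schema (2019-07-15), not on the 2010 version described in this paper. The sample document, its values, and the helper names `parse_points` and `summarize` are hypothetical, and the sample has not been validated against the official XSD.

```python
import xml.etree.ElementTree as ET

# Namespace of the 2019-07-15 PAGE schema release (an assumption for this
# sketch; the 2010 paper describes an earlier version of the format).
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}

# Hypothetical, hand-written sample; not validated against the official XSD.
SAMPLE = f"""<PcGts xmlns="{PAGE_NS}">
  <Metadata>
    <Creator>example</Creator>
    <Created>2010-08-23T00:00:00</Created>
    <LastChange>2010-08-23T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="page_0001.tif" imageWidth="2480" imageHeight="3508">
    <AlternativeImage filename="page_0001.bin.png" comments="binarized"/>
    <Border>
      <Coords points="100,120 2380,120 2380,3400 100,3400"/>
    </Border>
    <TextRegion id="r1" type="paragraph">
      <Coords points="200,300 2200,300 2200,600 200,600"/>
      <TextEquiv>
        <Unicode>Hello PAGE</Unicode>
      </TextEquiv>
    </TextRegion>
  </Page>
</PcGts>"""


def parse_points(points: str) -> list[tuple[int, int]]:
    """Turn an 'x1,y1 x2,y2 ...' string into a list of integer (x, y) pairs."""
    pairs = (p.split(",") for p in points.split())
    return [(int(x), int(y)) for x, y in pairs]


def summarize(xml_text: str) -> dict:
    """Collect image, border, derived-image and text-region info from one page."""
    root = ET.fromstring(xml_text)
    page = root.find("pc:Page", NS)
    if page is None:
        raise ValueError("no Page element found in the expected namespace")

    # Image characteristics: the page border and any derived images
    # (e.g. a binarised version), if present.
    border = page.find("pc:Border/pc:Coords", NS)
    alternatives = [
        (alt.get("filename"), alt.get("comments"))
        for alt in page.findall("pc:AlternativeImage", NS)
    ]

    # Layout structure and content: each text region's outline and its text.
    regions = []
    for region in page.findall("pc:TextRegion", NS):
        coords = region.find("pc:Coords", NS)
        regions.append({
            "id": region.get("id"),
            "polygon": parse_points(coords.get("points")) if coords is not None else [],
            "text": region.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS),
        })

    return {
        "image": page.get("imageFilename"),
        "size": (int(page.get("imageWidth")), int(page.get("imageHeight"))),
        "border": parse_points(border.get("points")) if border is not None else None,
        "alternative_images": alternatives,
        "regions": regions,
    }


if __name__ == "__main__":
    info = summarize(SAMPLE)
    print(info["image"], info["size"], info["alternative_images"])
    for r in info["regions"]:
        print(r["id"], repr(r["text"]), r["polygon"])
```

Because lookups are namespace-qualified, a file that declares a different PAGE schema version will not match `pc:Page`, and `summarize` raises `ValueError`; a fuller reader would inspect the root element's namespace before choosing how to interpret the file.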