A cascaded approach for page-object detection in scientific papers

Erika Spiteri Bailey, Alexandra Bonnici, Stefania Cristina
Proceedings of the 22nd ACM Symposium on Document Engineering, published 2022-09-20
DOI: 10.1145/3558100.3563851 (https://doi.org/10.1145/3558100.3563851)

Abstract

In recent years, Page Object Detection (POD) has become a popular document understanding task, and one that is non-trivial given the potential complexity of documents. The rise of neural networks has facilitated a more general learning approach to this task. However, in the literature, the different objects, such as formulae or figures, are generally considered individually. In this paper, we describe the joint localisation of six object classes relevant to scientific papers, namely isolated formulae, embedded formulae, figures, tables, variables and references. Through a qualitative analysis of these object classes, we note a hierarchy among the classes and propose a new localisation approach using two cascaded You Only Look Once (YOLO) networks. We also present a new data set consisting of labelled bounding boxes for all six object classes. This data set combines two data sets commonly used in the literature for formulae localisation, and adds labels for figures, tables, variables and references to their document images. Using this data set, we achieve an average F1-score of 0.755 across all classes, which is comparable to the state of the art for the object classes when each is localised individually.
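To make the reported metric concrete, the following is a minimal sketch of how an average F1-score across classes could be computed from per-class detection counts. It assumes that the average is an unweighted (macro) mean of per-class F1-scores, which the abstract does not state explicitly. The function names and the counts are hypothetical and purely illustrative; they are not taken from the paper, and the rule for matching a predicted box to a ground-truth box is left out.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives.

    Uses F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when all counts are zero.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1-scores (assumed averaging scheme)."""
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (TP, FP, FN) counts for three classes; NOT results from the paper.
example_counts = {
    "isolated_formula": (80, 20, 20),
    "figure": (90, 10, 10),
    "table": (50, 50, 50),
}

if __name__ == "__main__":
    for name, c in example_counts.items():
        print(f"{name}: F1 = {f1_score(*c):.3f}")
    print(f"macro F1 = {macro_f1(example_counts):.3f}")
```

A macro average gives every class equal weight regardless of how many instances it has, so a rare class with a low score pulls the average down as much as a frequent one would.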