Identifying the Central Figure of a Scientific Paper

Sean T. Yang, Po-Shen Lee, L. Kazakova, Abhishek Joshi, B. M. Oh, Jevin D. West, B. Howe
DOI: 10.1109/ICDAR.2019.00173
Journal: 2019 International Conference on Document Analysis and Recognition (ICDAR)
Publication date: 2019-09-01
Citations: 4

Abstract

Publishers are increasingly using graphical abstracts to facilitate scientific search, especially across disciplinary boundaries. Graphical abstracts appear across various media, are easily shared, and are information rich. However, only a small fraction of scientific publications are equipped with graphical abstracts. What can we do with the vast majority of papers that have no selected graphical abstract? In this paper, we first hypothesize that scientific papers actually include a "central figure" that serves as a graphical abstract. These figures convey the key results and provide a visual identity for the paper. Using survey data collected from 6,263 authors regarding 8,353 papers over 15 years, we find that over 87% of papers are considered to contain a central figure, and that these central figures are primarily used to summarize important results, explain the key methods, or provide additional discussion. We then train a model to automatically recognize the central figure, achieving top-3 accuracy of 78% and exact-match accuracy of 34%. We find that the primary boost in accuracy comes from figure captions that resemble the abstract. We make all our data and results publicly available at https://github.com/viziometrics/centraul_figure. Our goal is to automate central figure identification to improve search engine performance and to help scientists connect ideas across the literature.
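The abstract notes that the strongest signal for recognizing the central figure is caption text that resembles the paper's abstract. A minimal sketch of that idea is to rank figures by the cosine similarity between each caption and the abstract under a bag-of-words model. This is only an illustration of the caption-abstract similarity feature, not the authors' actual model; the `rank_figures` helper, tokenizer, and toy inputs below are assumptions for demonstration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphabetic word runs only.
    return re.findall(r"[a-z]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_figures(abstract: str, captions: list[str]) -> list[int]:
    """Return figure indices ordered by caption-abstract similarity, best first."""
    abs_vec = Counter(tokenize(abstract))
    scores = [cosine(abs_vec, Counter(tokenize(c))) for c in captions]
    return sorted(range(len(captions)), key=lambda i: scores[i], reverse=True)

# Toy example: the caption sharing the abstract's key terms should rank first.
abstract = "We identify the central figure that summarizes the key results."
captions = [
    "Distribution of survey responses by discipline.",
    "Overview of the pipeline identifying the central figure from key results.",
    "Ablation of model components.",
]
print(rank_figures(abstract, captions))
```

Under this scheme, "exact match" corresponds to the top-ranked index being the author-chosen figure, and "top-3" to that figure appearing among the first three indices. A production system would use stronger text representations (e.g. TF-IDF or learned embeddings) and combine this feature with others.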