Understanding Line Plots Using Bayesian Network

Rathin Radhakrishnan Nair, Nishant Sankaran, Ifeoma Nwogu, V. Govindaraju
Published in: 2016 12th IAPR Workshop on Document Analysis Systems (DAS)
Publication date: 2016-04-11
DOI: 10.1109/DAS.2016.73 (https://doi.org/10.1109/DAS.2016.73)
Citations: 5

Abstract

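Information graphics in scientific documents, such as bar charts, graphs, and plots, primarily facilitate better understanding of information. Graphics are a key component of technical documents because they are simplified representations of complex ideas. When traditional optical character recognition (OCR) systems are used on digitized documents, the ideas conveyed in these information graphics are lost, since OCR systems typically work only on text. Although tools have more recently been developed to extract information graphics from PDF files, they still do not intelligently interpret the contents of the extracted graphics. We therefore propose a method for identifying the intended messages of line plots using a Bayesian network. We accomplish this by first extracting a dense set of points from a line plot and then representing the entire line plot as a sequence of trends. We then implement a Bayesian network for reasoning about the messages conveyed by the line plots and their trends. We validate our approach by performing experiments on a dataset obtained from computer science conference publications and evaluate the performance of the network against the messages generated by human end users. The resulting intended message gives holistic information about the line plot(s) as well as lower-level information about the trends that make up the plot.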
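The two steps named in the abstract (points to a sequence of trends, then probabilistic reasoning about the message) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the slope tolerance `tol`, the trend labels, the three message categories, and every probability below are hypothetical values chosen only for illustration, and the network is reduced to a single message node with two evidence children.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Trend = Tuple[str, float, float]  # (label, x_start, x_end)


def label_slope(slope: float, tol: float) -> str:
    """Map a slope to a coarse trend label (labels are illustrative)."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "stable"


def points_to_trends(points: List[Point], tol: float = 0.05) -> List[Trend]:
    """Collapse an x-sorted point list into maximal runs sharing one label."""
    trends: List[Trend] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 == x0:
            continue  # skip repeated x values to avoid division by zero
        label = label_slope((y1 - y0) / (x1 - x0), tol)
        if trends and trends[-1][0] == label:
            # Same label as the previous run: extend that run.
            trends[-1] = (label, trends[-1][1], x1)
        else:
            trends.append((label, x0, x1))
    return trends


# Hypothetical prior over messages and conditional tables (assumed values).
PRIOR = {"increasing": 0.4, "decreasing": 0.4, "fluctuating": 0.2}
P_FIRST = {  # P(first trend label | message)
    "increasing": {"rising": 0.7, "falling": 0.1, "stable": 0.2},
    "decreasing": {"rising": 0.1, "falling": 0.7, "stable": 0.2},
    "fluctuating": {"rising": 0.4, "falling": 0.4, "stable": 0.2},
}
P_MANY_CHANGES = {  # P(two or more trend changes | message)
    "increasing": 0.1,
    "decreasing": 0.1,
    "fluctuating": 0.8,
}


def message_posterior(trends: List[Trend]) -> Dict[str, float]:
    """Posterior over messages given the first trend and the change count."""
    if not trends:
        raise ValueError("at least one trend is required")
    first = trends[0][0]
    many_changes = (len(trends) - 1) >= 2
    scores: Dict[str, float] = {}
    for message, prior in PRIOR.items():
        p_many = P_MANY_CHANGES[message]
        likelihood = P_FIRST[message][first] * (p_many if many_changes else 1 - p_many)
        scores[message] = prior * likelihood
    total = sum(scores.values())
    return {message: score / total for message, score in scores.items()}


if __name__ == "__main__":
    pts = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 1)]
    trends = points_to_trends(pts)
    print(trends)
    print(message_posterior(trends))
```

Merging adjacent segments with the same label keeps the trend sequence short even when the extracted point set is dense, so the evidence passed to the inference step does not grow with sampling density.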