Parsing and Summarizing Infographics with Synthetically Trained Icon Detection

Spandan Madan, Z. Bylinskii, C. Nobre, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, A. Oliva, F. Durand, H. Pfister
{"title":"Parsing and Summarizing Infographics with Synthetically Trained Icon Detection","authors":"Spandan Madan, Z. Bylinskii, C. Nobre, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, A. Oliva, F. Durand, H. Pfister","doi":"10.1109/PacificVis52677.2021.00012","DOIUrl":null,"url":null,"abstract":"Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including ‘ways to conserve the environment’ and ‘coronavirus prevention’. The computational understanding of infographics required for future applications like automatic captioning, summarization, search, and question-answering, will depend on being able to parse the visual and textual elements contained within. However, being composed of stylistically and semantically diverse visual and textual elements, infographics pose challenges for current A.I. systems. While automatic text extraction works reasonably well on infographics, standard object detection algorithms fail to identify the stand-alone visual elements in infographics that we refer to as ‘icons’. In this paper, we propose a novel approach to train an object detector using synthetically-generated data, and show that it succeeds at generalizing to detecting icons within in-the-wild infographics. We further pair our icon detection approach with an icon classifier and a state-of-the-art text detector to demonstrate three demo applications: topic prediction, multi-modal summarization, and multi-modal search. Parsing the visual and textual elements within infographics provides us with the first steps towards automatic infographic understanding.","PeriodicalId":199565,"journal":{"name":"2021 IEEE 14th Pacific Visualization Symposium (PacificVis)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 14th Pacific Visualization Symposium (PacificVis)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PacificVis52677.2021.00012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including ‘ways to conserve the environment’ and ‘coronavirus prevention’. The computational understanding of infographics required for future applications like automatic captioning, summarization, search, and question-answering, will depend on being able to parse the visual and textual elements contained within. However, being composed of stylistically and semantically diverse visual and textual elements, infographics pose challenges for current A.I. systems. While automatic text extraction works reasonably well on infographics, standard object detection algorithms fail to identify the stand-alone visual elements in infographics that we refer to as ‘icons’. In this paper, we propose a novel approach to train an object detector using synthetically-generated data, and show that it succeeds at generalizing to detecting icons within in-the-wild infographics. We further pair our icon detection approach with an icon classifier and a state-of-the-art text detector to demonstrate three demo applications: topic prediction, multi-modal summarization, and multi-modal search. Parsing the visual and textual elements within infographics provides us with the first steps towards automatic infographic understanding.
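The abstract does not specify which detector architecture or synthesis pipeline the authors used. Purely as an illustration of the general idea it describes (compositing icon crops onto blank canvases and fine-tuning an off-the-shelf object detector on the resulting images), a minimal sketch might look like the following. The torchvision Faster R-CNN, the `icon_dir` directory of icon PNGs, the white backgrounds, and the single "icon" class are all assumptions made for this sketch, not details taken from the paper.

```python
# Illustrative sketch only: train a generic icon detector on synthetic composites.
# Assumptions (not from the paper): torchvision Faster R-CNN, a local directory of
# icon PNGs, plain white canvases as backgrounds, and one foreground class ("icon").
import os
import random

import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor


class SyntheticInfographicDataset(torch.utils.data.Dataset):
    """Composites random icon crops onto a blank canvas and records their boxes."""

    def __init__(self, icon_dir, canvas_size=(800, 600), icons_per_image=5, length=1000):
        self.icon_paths = [
            os.path.join(icon_dir, f) for f in os.listdir(icon_dir) if f.endswith(".png")
        ]
        self.canvas_size = canvas_size
        self.icons_per_image = icons_per_image
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        canvas = Image.new("RGB", self.canvas_size, "white")
        boxes, labels = [], []
        for _ in range(self.icons_per_image):
            icon = Image.open(random.choice(self.icon_paths)).convert("RGBA")
            size = random.randint(40, 120)
            icon = icon.resize((size, size))
            x = random.randint(0, self.canvas_size[0] - size)
            y = random.randint(0, self.canvas_size[1] - size)
            canvas.paste(icon, (x, y), icon)          # alpha-composite the icon
            boxes.append([x, y, x + size, y + size])  # [x_min, y_min, x_max, y_max]
            labels.append(1)                          # single foreground class: "icon"
        target = {
            "boxes": torch.tensor(boxes, dtype=torch.float32),
            "labels": torch.tensor(labels, dtype=torch.int64),
        }
        return to_tensor(canvas), target


def train(icon_dir, epochs=1, device="cuda" if torch.cuda.is_available() else "cpu"):
    dataset = SyntheticInfographicDataset(icon_dir)
    loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, collate_fn=lambda batch: tuple(zip(*batch))
    )
    model = fasterrcnn_resnet50_fpn(num_classes=2).to(device)  # background + icon
    optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            losses = model(images, targets)           # dict of detection losses
            loss = sum(losses.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

In the pipeline the abstract describes, the detected icon regions would then be passed to an icon classifier and combined with the output of a text detector to support the three demo applications: topic prediction, multi-modal summarization, and multi-modal search.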