Natural language guided object retrieval in images

IF 0.4 4区 计算机科学 Q4 COMPUTER SCIENCE, INFORMATION SYSTEMS Acta Informatica Pub Date : 2021-07-19 DOI:10.1007/s00236-021-00400-2
Ahmad Ostovar, Suna Bensch, Thomas Hellström
{"title":"Natural language guided object retrieval in images","authors":"Ahmad Ostovar,&nbsp;Suna Bensch,&nbsp;Thomas Hellström","doi":"10.1007/s00236-021-00400-2","DOIUrl":null,"url":null,"abstract":"<div><p>The ability to understand the surrounding environment and being able to communicate with interacting humans are important functionalities for many automated systems where visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, or human robot interaction. In this paper, we propose algorithms for automatic responses to natural language queries about an image. Our approach uses a predefined neural net for detection of bounding boxes and objects in images, spatial relations between bounding boxes are modeled with a neural net, the queries are analyzed with a syntactic parser, and algorithms to map natural language to properties in the images are introduced. The algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users assessing the quality of our system’s generated answers.</p></div>","PeriodicalId":7189,"journal":{"name":"Acta Informatica","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2021-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1007/s00236-021-00400-2","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Acta Informatica","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s00236-021-00400-2","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1

Abstract

The ability to understand the surrounding environment and being able to communicate with interacting humans are important functionalities for many automated systems where visual input (e.g., images, video) and natural language input (speech or text) have to be related to each other. Possible applications are automatic image caption generation, interactive surveillance systems, or human robot interaction. In this paper, we propose algorithms for automatic responses to natural language queries about an image. Our approach uses a predefined neural net for detection of bounding boxes and objects in images, spatial relations between bounding boxes are modeled with a neural net, the queries are analyzed with a syntactic parser, and algorithms to map natural language to properties in the images are introduced. The algorithms make use of semantic similarity and antonyms. We evaluate the performance of our approach with test users assessing the quality of our system’s generated answers.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
自然语言引导下的图像对象检索
理解周围环境的能力和能够与互动的人类进行交流的能力是许多自动化系统的重要功能,其中视觉输入(例如,图像,视频)和自然语言输入(语音或文本)必须相互关联。可能的应用是自动图像标题生成、交互式监视系统或人机交互。在本文中,我们提出了自动响应关于图像的自然语言查询的算法。我们的方法使用预定义的神经网络来检测图像中的边界框和物体,用神经网络建模边界框之间的空间关系,用语法解析器分析查询,并引入将自然语言映射到图像中的属性的算法。该算法利用了语义相似和反义词。我们通过测试用户评估系统生成答案的质量来评估我们方法的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Acta Informatica
Acta Informatica 工程技术-计算机:信息系统
CiteScore
2.40
自引率
16.70%
发文量
24
审稿时长
>12 weeks
期刊介绍: Acta Informatica provides international dissemination of articles on formal methods for the design and analysis of programs, computing systems and information structures, as well as related fields of Theoretical Computer Science such as Automata Theory, Logic in Computer Science, and Algorithmics. Topics of interest include: • semantics of programming languages • models and modeling languages for concurrent, distributed, reactive and mobile systems • models and modeling languages for timed, hybrid and probabilistic systems • specification, program analysis and verification • model checking and theorem proving • modal, temporal, first- and higher-order logics, and their variants • constraint logic, SAT/SMT-solving techniques • theoretical aspects of databases, semi-structured data and finite model theory • theoretical aspects of artificial intelligence, knowledge representation, description logic • automata theory, formal languages, term and graph rewriting • game-based models, synthesis • type theory, typed calculi • algebraic, coalgebraic and categorical methods • formal aspects of performance, dependability and reliability analysis • foundations of information and network security • parallel, distributed and randomized algorithms • design and analysis of algorithms • foundations of network and communication protocols.
期刊最新文献
Serial and parallel algorithms for order-preserving pattern matching based on the duel-and-sweep paradigm Linear-size suffix tries and linear-size CDAWGs simplified and improved Parameterized aspects of distinct kemeny rank aggregation Word-representable graphs from a word’s perspective A closer look at Hamiltonicity and domination through the lens of diameter and convexity
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1