Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Mehrdad Alizadeh, Barbara Maria Di Eugenio
*Int. J. Semantic Comput.*, published 2020-06-01. DOI: [10.1142/S1793351X20400085](https://doi.org/10.1142/S1793351X20400085)

Abstract

Visual Question Answering (VQA) concerns providing answers to natural language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. While the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify both the answers and the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler to annotate a subset of the VQA dataset (VQAsub), so that the proposed multi-task CNN-LSTM VQA model can be trained on VQAsub as well. The results show a slight improvement over the single-task CNN-LSTM model.
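The multi-task architecture described above can be sketched as follows. This is an illustrative PyTorch reconstruction, not the authors' implementation: a CNN encodes the image, an LSTM encodes the question, and the fused representation feeds two task-specific heads, one classifying the answer and one classifying the semantic frame element. All layer sizes, vocabulary sizes, and class counts are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskVQA(nn.Module):
    """Sketch of a multi-task CNN-LSTM VQA model: shared image+question
    encoder, two classification heads (answer, semantic frame element)."""

    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128,
                 img_feat_dim=256, num_answers=100, num_frame_elements=20):
        super().__init__()
        # Small CNN as a stand-in for a pretrained visual backbone
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, img_feat_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        fused_dim = img_feat_dim + hidden_dim
        # Two heads share the fused image-question representation
        self.answer_head = nn.Linear(fused_dim, num_answers)
        self.frame_head = nn.Linear(fused_dim, num_frame_elements)

    def forward(self, image, question_ids):
        img = self.cnn(image)                         # (B, img_feat_dim)
        _, (h, _) = self.lstm(self.embed(question_ids))
        q = h[-1]                                     # (B, hidden_dim)
        fused = torch.cat([img, q], dim=1)
        return self.answer_head(fused), self.frame_head(fused)

model = MultiTaskVQA()
image = torch.randn(2, 3, 64, 64)                 # toy batch of 2 images
question = torch.randint(0, 1000, (2, 12))        # toy token ids, length 12
answer_logits, frame_logits = model(image, question)

# Multi-task training signal: sum of the two cross-entropy losses
loss = (F.cross_entropy(answer_logits, torch.randint(0, 100, (2,)))
        + F.cross_entropy(frame_logits, torch.randint(0, 20, (2,))))
```

The key design point from the abstract is that the frame-element head is an auxiliary task: its gradient shapes the shared representation so that answer predictions stay consistent with the verb's frame semantics.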