Visual Question Answering Using Deep Learning

Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini
{"title":"使用深度学习的视觉问答","authors":"Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini","doi":"10.1109/i-PACT52855.2021.9696665","DOIUrl":null,"url":null,"abstract":"Visual Question Answering (VQA) in recent times challenges fields that have received an outsized interest from the areas of Natural Language Processing and Computer Vision. VQA aims to establish an intelligent system to predict the answers for the natural language questions raised related to the image. The questions about the abstract or real word images are appealed to the VQA system; The system understands the image, and questions using Natural Language Processing (NLP) and Computer Vision which aims to predict the answer in natural language. The main issues which affect the performance of the VQA system is the inability to deal with the open-ended question acquired from the user. The proposed system is developed with a Graphical User Interface (GUI) that extracts the image features using pretrained VGG 16, and Golve embedding and Long Short- Term Memory (LSTM) are used in order to extract question features. By merging the characteristics of the images and the questions using pointwise multiplication the ultimate result is obtained. The acquired result is passed through a softmax layer to find the top 5 predictions about the image question. The proposed system has been experimented with various open-ended questions to show the robustness of the system. VQA finds its application in various real-world scenarios such as self-driving cars and guiding visually impaired people. Visual questions aim different parts of an image, including underlying context and background details.","PeriodicalId":335956,"journal":{"name":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Visual Question Answering Using Deep Learning\",\"authors\":\"Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini\",\"doi\":\"10.1109/i-PACT52855.2021.9696665\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Visual Question Answering (VQA) in recent times challenges fields that have received an outsized interest from the areas of Natural Language Processing and Computer Vision. VQA aims to establish an intelligent system to predict the answers for the natural language questions raised related to the image. The questions about the abstract or real word images are appealed to the VQA system; The system understands the image, and questions using Natural Language Processing (NLP) and Computer Vision which aims to predict the answer in natural language. The main issues which affect the performance of the VQA system is the inability to deal with the open-ended question acquired from the user. The proposed system is developed with a Graphical User Interface (GUI) that extracts the image features using pretrained VGG 16, and Golve embedding and Long Short- Term Memory (LSTM) are used in order to extract question features. By merging the characteristics of the images and the questions using pointwise multiplication the ultimate result is obtained. The acquired result is passed through a softmax layer to find the top 5 predictions about the image question. The proposed system has been experimented with various open-ended questions to show the robustness of the system. 
VQA finds its application in various real-world scenarios such as self-driving cars and guiding visually impaired people. Visual questions aim different parts of an image, including underlying context and background details.\",\"PeriodicalId\":335956,\"journal\":{\"name\":\"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/i-PACT52855.2021.9696665\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/i-PACT52855.2021.9696665","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Visual Question Answering (VQA) is a challenging task that has recently received outsized interest from the Natural Language Processing and Computer Vision communities. VQA aims to build an intelligent system that predicts answers to natural language questions posed about an image. Questions about abstract or real-world images are submitted to the VQA system; the system interprets both the image and the question using Computer Vision and Natural Language Processing (NLP), and predicts an answer in natural language. The main issue affecting the performance of a VQA system is the difficulty of handling open-ended questions received from the user. The proposed system provides a Graphical User Interface (GUI) and extracts image features with a pretrained VGG16 network, while GloVe embeddings and a Long Short-Term Memory (LSTM) network are used to extract question features. The image and question features are fused by pointwise multiplication, and the fused representation is passed through a softmax layer to obtain the top 5 answer predictions for the image-question pair. The proposed system has been evaluated on a variety of open-ended questions to demonstrate its robustness. VQA finds application in various real-world scenarios such as self-driving cars and assisting visually impaired people. Visual questions target different parts of an image, including the underlying context and background details.
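
As a concrete illustration of the pipeline described in the abstract, the following is a minimal Keras-style sketch of a joint-embedding VQA model: VGG16 image features, a GloVe-initialized embedding followed by an LSTM for the question, pointwise multiplication for fusion, and a softmax over candidate answers with top-5 selection. The layer sizes, vocabulary sizes, and helper names (e.g. top5_answers, answer_vocab) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the described VQA pipeline (assumed Keras-style; sizes are illustrative).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Image branch: pretrained VGG16, using the 4096-d fc2 activations as image features.
vgg = VGG16(weights="imagenet", include_top=True)
image_encoder = Model(vgg.input, vgg.get_layer("fc2").output)  # frozen feature extractor
image_encoder.trainable = False

MAX_Q_LEN = 30       # assumed maximum question length in tokens
VOCAB_SIZE = 10000   # assumed question vocabulary size
EMBED_DIM = 300      # GloVe 300-d word vectors
HIDDEN_DIM = 1024    # assumed joint-embedding size
NUM_ANSWERS = 1000   # assumed size of the candidate-answer vocabulary

# Question branch: embedding (in practice initialized from a GloVe matrix via the
# `weights=` argument) followed by an LSTM that summarizes the question.
question_in = layers.Input(shape=(MAX_Q_LEN,), dtype="int32")
embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(question_in)
q_feat = layers.LSTM(HIDDEN_DIM)(embed)

# Project the image features to the same dimensionality as the question features.
image_feat_in = layers.Input(shape=(4096,))
i_feat = layers.Dense(HIDDEN_DIM, activation="tanh")(image_feat_in)

# Fusion by pointwise (element-wise) multiplication, then softmax over candidate answers.
fused = layers.Multiply()([i_feat, q_feat])
answer_probs = layers.Dense(NUM_ANSWERS, activation="softmax")(fused)

vqa_model = Model([image_feat_in, question_in], answer_probs)
vqa_model.compile(optimizer="adam", loss="categorical_crossentropy")  # trained on one-hot answers

def top5_answers(image_array, question_tokens, answer_vocab):
    """Return the 5 most probable answers for one preprocessed 224x224x3 image and
    one padded token-id sequence of length MAX_Q_LEN."""
    img_feat = image_encoder.predict(image_array[np.newaxis], verbose=0)
    probs = vqa_model.predict([img_feat, question_tokens[np.newaxis]], verbose=0)[0]
    top = probs.argsort()[-5:][::-1]
    return [(answer_vocab[i], float(probs[i])) for i in top]
```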