Visual Question Answering Using Deep Learning

Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini
{"title":"使用深度学习的视觉问答","authors":"Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini","doi":"10.1109/i-PACT52855.2021.9696665","DOIUrl":null,"url":null,"abstract":"Visual Question Answering (VQA) in recent times challenges fields that have received an outsized interest from the areas of Natural Language Processing and Computer Vision. VQA aims to establish an intelligent system to predict the answers for the natural language questions raised related to the image. The questions about the abstract or real word images are appealed to the VQA system; The system understands the image, and questions using Natural Language Processing (NLP) and Computer Vision which aims to predict the answer in natural language. The main issues which affect the performance of the VQA system is the inability to deal with the open-ended question acquired from the user. The proposed system is developed with a Graphical User Interface (GUI) that extracts the image features using pretrained VGG 16, and Golve embedding and Long Short- Term Memory (LSTM) are used in order to extract question features. By merging the characteristics of the images and the questions using pointwise multiplication the ultimate result is obtained. The acquired result is passed through a softmax layer to find the top 5 predictions about the image question. The proposed system has been experimented with various open-ended questions to show the robustness of the system. VQA finds its application in various real-world scenarios such as self-driving cars and guiding visually impaired people. Visual questions aim different parts of an image, including underlying context and background details.","PeriodicalId":335956,"journal":{"name":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Visual Question Answering Using Deep Learning\",\"authors\":\"Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini\",\"doi\":\"10.1109/i-PACT52855.2021.9696665\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Visual Question Answering (VQA) in recent times challenges fields that have received an outsized interest from the areas of Natural Language Processing and Computer Vision. VQA aims to establish an intelligent system to predict the answers for the natural language questions raised related to the image. The questions about the abstract or real word images are appealed to the VQA system; The system understands the image, and questions using Natural Language Processing (NLP) and Computer Vision which aims to predict the answer in natural language. The main issues which affect the performance of the VQA system is the inability to deal with the open-ended question acquired from the user. The proposed system is developed with a Graphical User Interface (GUI) that extracts the image features using pretrained VGG 16, and Golve embedding and Long Short- Term Memory (LSTM) are used in order to extract question features. By merging the characteristics of the images and the questions using pointwise multiplication the ultimate result is obtained. The acquired result is passed through a softmax layer to find the top 5 predictions about the image question. The proposed system has been experimented with various open-ended questions to show the robustness of the system. 
VQA finds its application in various real-world scenarios such as self-driving cars and guiding visually impaired people. Visual questions aim different parts of an image, including underlying context and background details.\",\"PeriodicalId\":335956,\"journal\":{\"name\":\"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/i-PACT52855.2021.9696665\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/i-PACT52855.2021.9696665","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Visual Question Answering (VQA) is a challenging task that has recently received outsized interest from the Natural Language Processing and Computer Vision communities. VQA aims to build an intelligent system that predicts answers to natural language questions posed about an image. Questions about abstract or real-world images are submitted to the VQA system; the system interprets both the image and the question using Computer Vision and Natural Language Processing (NLP), and predicts an answer in natural language. The main issue affecting the performance of a VQA system is the difficulty of handling open-ended questions received from the user. The proposed system provides a Graphical User Interface (GUI) and extracts image features with a pretrained VGG16 network, while GloVe embeddings and a Long Short-Term Memory (LSTM) network are used to extract question features. The image and question features are fused by pointwise multiplication, and the fused representation is passed through a softmax layer to obtain the top 5 answer predictions for the image-question pair. The proposed system has been evaluated on a variety of open-ended questions to demonstrate its robustness. VQA finds application in various real-world scenarios such as self-driving cars and assisting visually impaired people. Visual questions target different parts of an image, including the underlying context and background details.
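
As a concrete illustration of the pipeline described in the abstract, the following is a minimal Keras-style sketch of a joint-embedding VQA model: VGG16 image features, a GloVe-initialized embedding followed by an LSTM for the question, pointwise multiplication for fusion, and a softmax over candidate answers with top-5 selection. The layer sizes, vocabulary sizes, and helper names (e.g. top5_answers, answer_vocab) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the described VQA pipeline (assumed Keras-style; sizes are illustrative).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Image branch: pretrained VGG16, using the 4096-d fc2 activations as image features.
vgg = VGG16(weights="imagenet", include_top=True)
image_encoder = Model(vgg.input, vgg.get_layer("fc2").output)  # frozen feature extractor
image_encoder.trainable = False

MAX_Q_LEN = 30       # assumed maximum question length in tokens
VOCAB_SIZE = 10000   # assumed question vocabulary size
EMBED_DIM = 300      # GloVe 300-d word vectors
HIDDEN_DIM = 1024    # assumed joint-embedding size
NUM_ANSWERS = 1000   # assumed size of the candidate-answer vocabulary

# Question branch: embedding (in practice initialized from a GloVe matrix via the
# `weights=` argument) followed by an LSTM that summarizes the question.
question_in = layers.Input(shape=(MAX_Q_LEN,), dtype="int32")
embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(question_in)
q_feat = layers.LSTM(HIDDEN_DIM)(embed)

# Project the image features to the same dimensionality as the question features.
image_feat_in = layers.Input(shape=(4096,))
i_feat = layers.Dense(HIDDEN_DIM, activation="tanh")(image_feat_in)

# Fusion by pointwise (element-wise) multiplication, then softmax over candidate answers.
fused = layers.Multiply()([i_feat, q_feat])
answer_probs = layers.Dense(NUM_ANSWERS, activation="softmax")(fused)

vqa_model = Model([image_feat_in, question_in], answer_probs)
vqa_model.compile(optimizer="adam", loss="categorical_crossentropy")  # trained on one-hot answers

def top5_answers(image_array, question_tokens, answer_vocab):
    """Return the 5 most probable answers for one preprocessed 224x224x3 image and
    one padded token-id sequence of length MAX_Q_LEN."""
    img_feat = image_encoder.predict(image_array[np.newaxis], verbose=0)
    probs = vqa_model.predict([img_feat, question_tokens[np.newaxis]], verbose=0)[0]
    top = probs.argsort()[-5:][::-1]
    return [(answer_vocab[i], float(probs[i])) for i in top]
```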