{"title":"Visual Question Answering Using Deep Learning","authors":"Pallavi, Sonali, Tanuritha, Vidhya, Prof. Anjini","doi":"10.1109/i-PACT52855.2021.9696665","DOIUrl":null,"url":null,"abstract":"Visual Question Answering (VQA) is a challenging field that has recently received outsized interest from the Natural Language Processing and Computer Vision communities. VQA aims to build an intelligent system that predicts answers to natural language questions raised about an image. Questions about abstract or real-world images are posed to the VQA system; the system understands the image and the question using Natural Language Processing (NLP) and Computer Vision, and predicts the answer in natural language. The main issue affecting the performance of a VQA system is the inability to deal with open-ended questions acquired from the user. The proposed system is developed with a Graphical User Interface (GUI) that extracts image features using a pretrained VGG-16, while GloVe embeddings and a Long Short-Term Memory (LSTM) network are used to extract question features. The final result is obtained by merging the image and question features using pointwise multiplication. The merged result is passed through a softmax layer to find the top-5 predictions for the image question. The proposed system has been tested with various open-ended questions to demonstrate its robustness. VQA finds application in various real-world scenarios such as self-driving cars and guiding visually impaired people. Visual questions target different parts of an image, including underlying context and background details.","PeriodicalId":335956,"journal":{"name":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 Innovations in Power and Advanced Computing Technologies (i-PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/i-PACT52855.2021.9696665","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
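The two-branch fusion pipeline described in the abstract (pretrained VGG-16 image features, GloVe embedding plus LSTM question features, pointwise multiplication, softmax over candidate answers) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vocabulary size, sequence length, hidden widths, and answer-set size are assumptions, and `weights=None` stands in for the ImageNet-pretrained weights the paper uses so the sketch runs without downloads.

```python
# Hypothetical sketch of the abstract's VQA architecture:
# image branch (VGG-16) and question branch (embedding + LSTM),
# fused by pointwise multiplication, classified with softmax.
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

VOCAB_SIZE = 10000   # assumed question vocabulary size
SEQ_LEN = 20         # assumed maximum question length (tokens)
EMBED_DIM = 300      # GloVe-300d style embedding width
NUM_ANSWERS = 1000   # assumed size of the candidate answer set

# Image branch: VGG-16 without its classifier head, pooled to a vector.
# (weights=None keeps the sketch self-contained; the paper uses
# pretrained weights.)
vgg = VGG16(weights=None, include_top=False, pooling="avg",
            input_shape=(224, 224, 3))
img_feat = layers.Dense(512, activation="tanh")(vgg.output)

# Question branch: an embedding layer (GloVe vectors would be loaded
# into it in practice) followed by an LSTM, projected to the same width.
q_in = layers.Input(shape=(SEQ_LEN,), dtype="int32")
q_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(q_in)
q_feat = layers.Dense(512, activation="tanh")(layers.LSTM(512)(q_emb))

# Fusion by pointwise multiplication, then softmax over answers.
fused = layers.Multiply()([img_feat, q_feat])
out = layers.Dense(NUM_ANSWERS, activation="softmax")(fused)
model = Model(inputs=[vgg.input, q_in], outputs=out)

# Top-5 predicted answers for one (image, question) pair.
probs = model.predict([np.zeros((1, 224, 224, 3)),
                       np.zeros((1, SEQ_LEN), dtype="int32")], verbose=0)
top5 = np.argsort(probs[0])[::-1][:5]
```

Taking the five highest softmax scores mirrors the abstract's "top-5 predictions" step; with untrained weights the ranking here is arbitrary, but the shapes and data flow match the described design.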