Investigation of Available Datasets and Techniques for Visual Question Answering
Lata A. Bhavnani, Dr. Narendra Patel
International Journal of Next-Generation Computing, published 2023-08-03
DOI: https://doi.org/10.47164/ijngc.v14i3.767
Citation count: 0
Abstract
Visual Question Answering (VQA) is an emerging AI research problem that combines computer vision, natural language processing, and knowledge representation and reasoning (KR). Given an image and a question about that image as input, a VQA system must analyze the visual components of the image and the type of question, and draw on common sense or general knowledge, to predict the right answer. VQA is useful in real-time applications such as assistance for blind people, autonomous driving, and simple tasks like spotting empty tables in hotels, parks, or picnic places. Since the problem was introduced in 2014, many researchers have applied different techniques to it, and a number of datasets have been introduced. This paper presents an overview of the available datasets and the evaluation metrics used in the VQA area. It then presents the different techniques used in the VQA domain, categorized by the mechanism they use. Based on this detailed discussion and a performance comparison, we discuss various challenges in the VQA domain and provide directions for future work.