{"title":"Handling language prior and compositional reasoning issues in Visual Question Answering system","authors":"Souvik Chowdhury, Badal Soni","doi":"10.1016/j.neucom.2025.129906","DOIUrl":null,"url":null,"abstract":"<div><div>Visual Question Answering (VQA) models often suffer from language bias, favoring common but incorrect answers, and struggle with compositional reasoning in complex queries. This paper proposes a unified approach using a multimodal large language model enhanced with adaptive prompts designed for specific tasks. Our method directly addresses these issues by reducing language bias and improving compositional reasoning. Extensive evaluations on benchmark datasets, including VQA v2.0, VQACP, TDIUC, GQA, Visual7 W, TextVQA, and STVQA show that our approach outperforms state-of-the-art models, achieving accuracy improvements of 8% to 9%. These results demonstrate the effectiveness of our method in enhancing VQA accuracy, making it a significant advancement for more reliable and robust applications in real-world scenarios.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"635 ","pages":"Article 129906"},"PeriodicalIF":5.5000,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225005788","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Visual Question Answering (VQA) models often suffer from language bias, favoring common but incorrect answers, and struggle with compositional reasoning in complex queries. This paper proposes a unified approach using a multimodal large language model enhanced with adaptive prompts designed for specific tasks. Our method directly addresses these issues by reducing language bias and improving compositional reasoning. Extensive evaluations on benchmark datasets, including VQA v2.0, VQACP, TDIUC, GQA, Visual7 W, TextVQA, and STVQA show that our approach outperforms state-of-the-art models, achieving accuracy improvements of 8% to 9%. These results demonstrate the effectiveness of our method in enhancing VQA accuracy, making it a significant advancement for more reliable and robust applications in real-world scenarios.
期刊介绍:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.