Residual Attention Network: A new baseline model for visual question answering
Salma Louanas, Hichem Debbi
2022 5th International Symposium on Informatics and its Applications (ISIA), published 2022-11-29
DOI: https://doi.org/10.1109/ISIA55826.2022.9993583
Citation count: 0
Abstract
Answering questions over images is a challenging task because it requires reasoning over both images and text. In this paper, we introduce the Residual Attention Network (RAN), a new visual question answering (VQA) model, and compare it with baseline models such as the stacked attention model and the CNN-LSTM model. We find that our model performs better than these baselines. In addition to our model, we also evaluate several holistic models and compare them with neural module network frameworks; the results show that neural module networks perform better at question reasoning. All experiments were conducted on the CLEVR dataset, a recent VQA dataset for evaluating multi-step reasoning models.