{"title":"Maintaining Reasoning Consistency in Compositional Visual Question Answering","authors":"Chenchen Jing, Yunde Jia, Yuwei Wu, Xinyu Liu, Qi Wu","doi":"10.1109/CVPR52688.2022.00504","DOIUrl":null,"url":null,"abstract":"A compositional question refers to a question that contains multiple visual concepts (e.g., objects, attributes, and relationships) and requires compositional reasoning to answer. Existing VQA models can answer a compositional question well, but cannot work well in terms of reasoning consistency in answering the compositional question and its sub-questions. For example, a compositional question for an image is: “Are there any elephants to the right of the white bird?” and one of its sub-questions is “Is any bird visible in the scene?”. The models may answer “yes” to the compositional question, but “no” to the sub-question. This paper presents a dialog-like reasoning method for maintaining reasoning consistency in answering a compositional question and its sub-questions. Our method integrates the reasoning processes for the sub-questions into the reasoning process for the compositional question like a dialog task, and uses a consistency constraint to penalize inconsistent answer predictions. In order to enable quantitative evaluation of reasoning consistency, we construct a GQA-Sub dataset based on the well-organized GQA dataset. Experimental results on the GQA dataset and the GQA-Sub dataset demonstrate the effectiveness of our method.","PeriodicalId":355552,"journal":{"name":"2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR52688.2022.00504","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6
Abstract
A compositional question refers to a question that contains multiple visual concepts (e.g., objects, attributes, and relationships) and requires compositional reasoning to answer. Existing VQA models can answer a compositional question well, but cannot work well in terms of reasoning consistency in answering the compositional question and its sub-questions. For example, a compositional question for an image is: “Are there any elephants to the right of the white bird?” and one of its sub-questions is “Is any bird visible in the scene?”. The models may answer “yes” to the compositional question, but “no” to the sub-question. This paper presents a dialog-like reasoning method for maintaining reasoning consistency in answering a compositional question and its sub-questions. Our method integrates the reasoning processes for the sub-questions into the reasoning process for the compositional question like a dialog task, and uses a consistency constraint to penalize inconsistent answer predictions. In order to enable quantitative evaluation of reasoning consistency, we construct a GQA-Sub dataset based on the well-organized GQA dataset. Experimental results on the GQA dataset and the GQA-Sub dataset demonstrate the effectiveness of our method.