Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering.

Dizhan Xue, Shengsheng Qian, Changsheng Xu
{"title":"Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering.","authors":"Dizhan Xue, Shengsheng Qian, Changsheng Xu","doi":"10.1109/TPAMI.2024.3398012","DOIUrl":null,"url":null,"abstract":"<p><p>Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering (EVQA) has been introduced, which combines answering visual questions with multimodal explanation generation to expound upon the underlying reasoning processes. In contrast to conventional Visual Question Answering (VQA) that merely concentrates on providing answers, EVQA aims to improve the explainability and verifiability of reasoning by providing user-friendly explanations. Despite the improved explainability of inferred results, the existing EVQA models still adopt black-box neural networks to infer results, lacking the explainability of the reasoning process. Moreover, existing EVQA models commonly predict answers and explanations in isolation, overlooking the inherent causal correlation between them. To handle these challenges, we propose a Program-guided Variational Causal Inference Network (Pro-VCIN) that integrates neural-symbolic reasoning with variational causal inference and constructs causal correlations between the predicted answers and explanations. First, we utilize pretrained models to extract visual features and convert questions into the corresponding programs. Second, we propose a multimodal program Transformer to translate programs and the related visual features into coherent and rational explanations of the reasoning processes. Finally, we propose a variational causal inference to construct the target structural causal model and predict answers based on the causal correlation to explanations. Comprehensive experiments conducted on EVQA benchmark datasets reveal the superiority of Pro-VCIN in terms of both performance and explainability over state-of-the-art EVQA methods.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3398012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/6 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering (EVQA) has been introduced, which combines answering visual questions with multimodal explanation generation to expound upon the underlying reasoning processes. In contrast to conventional Visual Question Answering (VQA) that merely concentrates on providing answers, EVQA aims to improve the explainability and verifiability of reasoning by providing user-friendly explanations. Despite the improved explainability of inferred results, the existing EVQA models still adopt black-box neural networks to infer results, lacking the explainability of the reasoning process. Moreover, existing EVQA models commonly predict answers and explanations in isolation, overlooking the inherent causal correlation between them. To handle these challenges, we propose a Program-guided Variational Causal Inference Network (Pro-VCIN) that integrates neural-symbolic reasoning with variational causal inference and constructs causal correlations between the predicted answers and explanations. First, we utilize pretrained models to extract visual features and convert questions into the corresponding programs. Second, we propose a multimodal program Transformer to translate programs and the related visual features into coherent and rational explanations of the reasoning processes. Finally, we propose a variational causal inference to construct the target structural causal model and predict answers based on the causal correlation to explanations. Comprehensive experiments conducted on EVQA benchmark datasets reveal the superiority of Pro-VCIN in terms of both performance and explainability over state-of-the-art EVQA methods.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
将神经符号推理与变异因果推理网络整合用于解释性视觉问题解答
最近,一种名为 "解释性视觉问题解答(EVQA)"的新型多模态推理任务问世,该任务将回答视觉问题与生成多模态解释相结合,以阐述基本的推理过程。传统的视觉问题解答(VQA)只专注于提供答案,而 EVQA 则旨在通过提供用户友好的解释来提高推理的可解释性和可验证性。尽管推理结果的可解释性有所提高,但现有的 EVQA 模型仍采用黑盒神经网络来推理结果,缺乏推理过程的可解释性。此外,现有的 EVQA 模型通常孤立地预测答案和解释,忽略了它们之间固有的因果相关性。为了应对这些挑战,我们提出了一种程序引导的变式因果推理网络(Pro-VCIN),它将神经符号推理与变式因果推理整合在一起,并构建了预测答案与解释之间的因果相关性。首先,我们利用预训练模型提取视觉特征,并将问题转换为相应的程序。最后,我们提出了变式因果推理方法,以构建目标结构因果模型,并根据与解释的因果相关性预测答案。在 EVQA 基准数据集上进行的综合实验表明,Pro-VCIN 在性能和可解释性方面都优于最先进的 EVQA 方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
EBMGC-GNF: Efficient Balanced Multi-View Graph Clustering via Good Neighbor Fusion. Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering. Motion-Aware Dynamic Graph Neural Network for Video Compressive Sensing. Evaluation Metrics for Intelligent Generation of Graphical Game Assets: A Systematic Survey-Based Framework. Artificial Intelligence and Machine Learning Tools for Improving Early Warning Systems of Volcanic Eruptions: The Case of Stromboli.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1