CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event Classification

Shubham Gupta, Nandini Saini, Suman Kundu, Debasis Das
{"title":"CrisisKAN: Knowledge-infused and Explainable Multimodal Attention Network for Crisis Event Classification","authors":"Shubham Gupta, Nandini Saini, Suman Kundu, Debasis Das","doi":"10.48550/arXiv.2401.06194","DOIUrl":null,"url":null,"abstract":"Pervasive use of social media has become the emerging source for real-time information (like images, text, or both) to identify various events. Despite the rapid growth of image and text-based event classification, the state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between features of image and text modalities due to inconsistent encoding. Also, the black-box nature of models fails to explain the model's outcomes for building trust in high-stakes situations such as disasters, pandemic. Additionally, the word limit imposed on social media posts can potentially introduce bias towards specific events. To address these issues, we proposed CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that entails images and texts in conjunction with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrated Wikipedia knowledge using proposed wiki extraction algorithm. Along with this, a guided cross-attention module is implemented to fill the semantic gap in integrating visual and textual data. In order to ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides a robust explanation of the predictions of the proposed model. The comprehensive experiments conducted on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. 
As a result, CrisisKAN outperforms existing SOTA methodologies and provides a novel view in the domain of explainable multimodal event classification.","PeriodicalId":126309,"journal":{"name":"European Conference on Information Retrieval","volume":"6 2","pages":"18-33"},"PeriodicalIF":0.0000,"publicationDate":"2024-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Conference on Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2401.06194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Pervasive use of social media has made it an emerging source of real-time information (images, text, or both) for identifying various events. Despite the rapid growth of image- and text-based event classification, state-of-the-art (SOTA) models find it challenging to bridge the semantic gap between image and text features due to inconsistent encoding. Moreover, the black-box nature of these models prevents them from explaining their outcomes, which is essential for building trust in high-stakes situations such as disasters and pandemics. Additionally, the word limit imposed on social media posts can introduce bias towards specific events. To address these issues, we propose CrisisKAN, a novel Knowledge-infused and Explainable Multimodal Attention Network that combines images and text with external knowledge from Wikipedia to classify crisis events. To enrich the context-specific understanding of textual information, we integrate Wikipedia knowledge using a proposed wiki extraction algorithm. In addition, a guided cross-attention module is implemented to fill the semantic gap when integrating visual and textual data. To ensure reliability, we employ a model-specific approach called Gradient-weighted Class Activation Mapping (Grad-CAM) that provides a robust explanation of the proposed model's predictions. Comprehensive experiments conducted on the CrisisMMD dataset yield in-depth analysis across various crisis-specific tasks and settings. As a result, CrisisKAN outperforms existing SOTA methodologies and offers a novel perspective on explainable multimodal event classification.
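The abstract mentions a guided cross-attention module for fusing visual and textual features, but the exact formulation is not given here. The following is a minimal, single-direction sketch of the general idea, assuming standard scaled dot-product cross-attention with a per-token sigmoid gate; all function names, weight matrices, and shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def guided_cross_attention(text_feats, image_feats, Wq, Wk, Wv):
    """One direction of guided cross-attention: text queries attend over
    image keys/values, and a sigmoid gate decides, per text token, how much
    of the attended image context to mix in.

    Shapes (illustrative): text_feats (T, d), image_feats (I, d), W* (d, d).
    """
    Q = text_feats @ Wq                      # (T, d) queries from text
    K = image_feats @ Wk                     # (I, d) keys from image
    V = image_feats @ Wv                     # (I, d) values from image
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (T, I) scaled dot products
    attn = softmax(scores, axis=-1)          # each row sums to 1
    attended = attn @ V                      # (T, d) image context per token
    # Gate: how relevant the attended image context is to each text token.
    gate = 1.0 / (1.0 + np.exp(-(text_feats * attended).sum(-1, keepdims=True)))
    return gate * attended + (1.0 - gate) * text_feats
```

A symmetric pass (image queries attending over text) and concatenation of both directions would complete a bidirectional fusion of this kind.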
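The explanation component named in the abstract, Grad-CAM, has a well-known core computation: channel weights are the global-average-pooled gradients of the class score with respect to a convolutional layer's feature maps, and the localization map is the ReLU of the weighted channel sum. This sketch shows only that final combination step, assuming the activations and gradients have already been obtained from a backbone (the function name and shapes are illustrative).

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine feature maps and their gradients into a Grad-CAM map.

    activations, gradients: (K, H, W) arrays — K channel feature maps and
    the gradients of the target class score w.r.t. each of them.
    Returns an (H, W) coarse localization map.
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: GAP over spatial dims
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum over channels
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence
```

In practice the map is then upsampled to the input resolution and overlaid on the image (or, for text encoders, projected onto tokens) to visualize what drove the prediction.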