Cross-Modal Knowledge Diffusion-Based Generation for Difference-Aware Medical VQA

Qika Lin, Kai He, Yifan Zhu, Fangzhi Xu, Erik Cambria, Mengling Feng
{"title":"Cross-Modal Knowledge Diffusion-Based Generation for Difference-Aware Medical VQA","authors":"Qika Lin;Kai He;Yifan Zhu;Fangzhi Xu;Erik Cambria;Mengling Feng","doi":"10.1109/TIP.2025.3558446","DOIUrl":null,"url":null,"abstract":"Multimodal medical applications have garnered considerable attention due to their potential to offer comprehensive and robust support for medical assistance. Specifically, within this domain, difference-aware medical Visual Question Answering (VQA) has emerged as a topic of increasing interest that enables the recognition of changes in physical conditions over time when compared to previous states and provides customized suggestions accordingly. However, it is challenging because samples usually exhibit characteristics of complexity, diversity, and inherent noise. Besides, there is a need for multimodal knowledge understanding of the medical domain. The difference-aware setting requiring image comparison further intensifies these situations. To this end, we propose a cross-Modal knowlEdge diffusioN-baseD gEneration netwoRk (MENDER), where the diffusion mechanism with multi-step denoising and knowledge injection from global to local level are employed to tackle the aforementioned challenges, respectively. The diffusion process is to gradually generate answers with the sequence input of questions, random noises for the answer masks and virtual vision prompts of images. The strategy of answer nosing and knowledge cascading is specifically tailored for this task and is implemented during forward and reverse diffusion processes. Moreover, the visual and structure knowledge injection are proposed to learn virtual vision prompts to guide the diffusion process, where the former is realized using a pre-trained medical image-text network and the latter is modeled with spatial and semantic graph structures processed by the heterogeneous graph Transformer models. Experiment results demonstrate the effectiveness of MENDER for difference-aware medical VQA. Furthermore, it also exhibits notable performance in the low-resource setting and conventional medical VQA tasks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2421-2434"},"PeriodicalIF":13.7000,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10964089/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Multimodal medical applications have garnered considerable attention due to their potential to offer comprehensive and robust support for medical assistance. Within this domain, difference-aware medical Visual Question Answering (VQA) has emerged as a topic of increasing interest: it recognizes changes in physical conditions over time relative to previous states and provides customized suggestions accordingly. The task is challenging because samples usually exhibit complexity, diversity, and inherent noise, and it further demands multimodal knowledge understanding of the medical domain. The difference-aware setting, which requires image comparison, intensifies these challenges. To this end, we propose a cross-Modal knowlEdge diffusioN-baseD gEneration netwoRk (MENDER), in which a diffusion mechanism with multi-step denoising and knowledge injection from the global to the local level are employed to tackle these challenges, respectively. The diffusion process gradually generates answers from a sequence input comprising the question, random noise for the answer masks, and virtual vision prompts derived from the images. A strategy of answer noising and knowledge cascading is tailored specifically to this task and is implemented during the forward and reverse diffusion processes. Moreover, visual and structural knowledge injection are proposed to learn the virtual vision prompts that guide the diffusion process: the former is realized with a pre-trained medical image-text network, and the latter is modeled with spatial and semantic graph structures processed by heterogeneous graph Transformer models. Experimental results demonstrate the effectiveness of MENDER for difference-aware medical VQA. Furthermore, it also performs notably well in low-resource settings and on conventional medical VQA tasks.
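
To make the generation procedure more concrete, the sketch below shows one plausible shape for the diffusion-based answer decoder the abstract describes: the answer starts as a sequence of mask tokens, and a Transformer iteratively denoises it over multiple steps, conditioned on the question tokens and on the virtual vision prompts. This is a minimal illustration under stated assumptions, not the authors' implementation; every name and hyper-parameter here (`DiffusionAnswerDecoder`, `n_steps`, the embedding sizes) is hypothetical, and the modules that actually produce the vision prompts (the pre-trained medical image-text network and the heterogeneous graph Transformers) are abstracted into a tensor input.

```python
# Minimal PyTorch sketch of a discrete-diffusion answer decoder in the spirit
# of the abstract. All names, shapes, and hyper-parameters are illustrative
# assumptions, not the MENDER implementation.
import torch
import torch.nn as nn


class DiffusionAnswerDecoder(nn.Module):
    """Iteratively denoises masked answer tokens, conditioned on the question
    tokens and on "virtual vision prompts" summarizing the image pair."""

    def __init__(self, vocab_size: int = 30522, d_model: int = 512,
                 n_steps: int = 10):
        super().__init__()
        self.n_steps = n_steps
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.step_emb = nn.Embedding(n_steps, d_model)  # diffusion-step embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, vocab_size)

    def denoise_step(self, answer_ids, question_ids, vision_prompts, t):
        # Sequence input: question tokens, vision prompts, then the noisy answer.
        x = torch.cat([self.tok_emb(question_ids),
                       vision_prompts,
                       self.tok_emb(answer_ids)], dim=1)
        step = torch.full((x.size(0), 1), t, dtype=torch.long, device=x.device)
        x = x + self.step_emb(step)  # broadcast step embedding over positions
        h = self.encoder(x)
        # Predict token logits only over the answer positions (the tail).
        return self.head(h[:, -answer_ids.size(1):])

    @torch.no_grad()
    def generate(self, question_ids, vision_prompts, answer_len, mask_id):
        # Reverse diffusion: start from an all-mask answer and repeatedly
        # replace it with the decoder's predictions. (A faithful sampler
        # would also re-mask low-confidence tokens between steps.)
        batch = question_ids.size(0)
        answer = torch.full((batch, answer_len), mask_id, dtype=torch.long,
                            device=question_ids.device)
        for t in reversed(range(self.n_steps)):
            logits = self.denoise_step(answer, question_ids, vision_prompts, t)
            answer = logits.argmax(dim=-1)  # greedy decoding for simplicity
        return answer


# Toy usage with random stand-in inputs (all dimensions are assumptions):
decoder = DiffusionAnswerDecoder()
question = torch.randint(0, 30522, (2, 16))  # two toy question token sequences
prompts = torch.randn(2, 8, 512)             # stand-in virtual vision prompts
answer = decoder.generate(question, prompts, answer_len=12, mask_id=103)
print(answer.shape)  # torch.Size([2, 12])
```

In the actual model, the prompt tensor would come from the visual knowledge injection (the pre-trained medical image-text network) and the structural knowledge injection (heterogeneous graph Transformers over spatial and semantic graphs), and the forward process would noise ground-truth answers to train the denoiser; those components are outside the scope of this sketch.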