Multimodal fusion: advancing medical visual question-answering

Anjali Mudgal, Udbhav Kush, Aditya Kumar, Amir Jafari
{"title":"Multimodal fusion: advancing medical visual question-answering","authors":"Anjali Mudgal, Udbhav Kush, Aditya Kumar, Amir Jafari","doi":"10.1007/s00521-024-10318-8","DOIUrl":null,"url":null,"abstract":"<p>This paper explores the application of Visual Question-Answering (VQA) technology, which combines computer vision and natural language processing (NLP), in the medical domain, specifically for analyzing radiology scans. VQA can facilitate medical decision-making and improve patient outcomes by accurately interpreting medical imaging, which requires specialized expertise and time. The paper proposes developing an advanced VQA system for medical datasets using the Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (BLIP) architecture from Salesforce, leveraging deep learning and transfer learning techniques to handle the unique challenges of medical/radiology images. The paper discusses the underlying concepts, methodologies, and results of applying the BLIP architecture and fine-tuning approaches for VQA in the medical domain, highlighting their effectiveness in addressing the complexities of VQA tasks for radiology scans. Inspired by the BLIP architecture from Salesforce, we propose a novel multi-modal fusion approach for medical VQA and evaluating its promising potential.</p>","PeriodicalId":18925,"journal":{"name":"Neural Computing and Applications","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computing and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00521-024-10318-8","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper explores the application of Visual Question-Answering (VQA) technology, which combines computer vision and natural language processing (NLP), in the medical domain, specifically for analyzing radiology scans. By accurately interpreting medical imaging, a task that demands specialized expertise and time, VQA can facilitate medical decision-making and improve patient outcomes. The paper proposes developing an advanced VQA system for medical datasets using the Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (BLIP) architecture from Salesforce, leveraging deep learning and transfer learning techniques to handle the unique challenges of medical/radiology images. The paper discusses the underlying concepts, methodologies, and results of applying the BLIP architecture and fine-tuning approaches to VQA in the medical domain, highlighting their effectiveness in addressing the complexities of VQA tasks for radiology scans. Inspired by the BLIP architecture from Salesforce, we propose a novel multimodal fusion approach for medical VQA and evaluate its promising potential.
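
Since the described system builds on the publicly released BLIP checkpoints, the core inference and fine-tuning loop can be illustrated compactly. The sketch below is not the authors' implementation: it assumes the HuggingFace `transformers` port of Salesforce's BLIP VQA model, and the image file, question, and ground-truth answer are hypothetical placeholders standing in for a radiology dataset.

```python
# Minimal BLIP VQA sketch (assumes the HuggingFace `transformers`
# port of Salesforce's BLIP; file names and strings are hypothetical).
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

# Inference: fuse one radiology image with a free-text question.
image = Image.open("chest_xray.png").convert("RGB")   # hypothetical scan
question = "Is there evidence of pleural effusion?"
inputs = processor(images=image, text=question, return_tensors="pt")
with torch.no_grad():
    answer_ids = model.generate(**inputs)
print(processor.decode(answer_ids[0], skip_special_tokens=True))

# One supervised fine-tuning step: the answer is the generation target,
# so the loss is a language-modeling loss over the answer tokens
# conditioned on the fused image-question representation.
labels = processor(text="no", return_tensors="pt").input_ids  # hypothetical label
loss = model(**inputs, labels=labels).loss
loss.backward()  # followed by an optimizer step in a real training loop
```

In the transfer-learning setting the abstract describes, the pretrained checkpoint would be fine-tuned on medical question-answer pairs; the loss computation above is the supervised step such fine-tuning repeats over the dataset.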
