Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models.

IEEE Journal of Biomedical and Health Informatics. Pub Date: 2024-11-08. DOI: 10.1109/JBHI.2024.3494246
Impact factor 6.7; Zone 2 (Medicine); JCR Q1 (Computer Science, Information Systems)
Konstantinos Vilouras, Pedro Sanchez, Alison Q O'Neil, Sotirios A Tsaftaris
Citations: 0

Abstract

Localizing the exact pathological regions in a given medical scan is an important imaging problem that traditionally requires a large number of ground-truth bounding-box annotations to solve accurately. However, alternative and potentially weaker forms of supervision exist, such as the free-text reports that accompany scans, and these are readily available. The task of localizing regions with textual guidance is commonly referred to as phrase grounding. In this work, we use a publicly available foundation model, namely the Latent Diffusion Model, to perform this challenging task. This choice is motivated by the fact that the Latent Diffusion Model, despite being generative in nature, contains cross-attention mechanisms that implicitly align visual and textual features, yielding intermediate representations that are suitable for the task at hand. In addition, we aim to perform this task in a zero-shot manner, i.e., without any training on the target task, so the model's weights remain frozen. To this end, we devise strategies to select features and to refine them via post-processing without extra learnable parameters. We compare our proposed method with state-of-the-art (SOTA) approaches that explicitly enforce image-text alignment in a joint embedding space via contrastive learning. Results on a popular chest X-ray benchmark indicate that our method is competitive with SOTA methods across different types of pathology, and even outperforms them on average in terms of two metrics (mean IoU and AUC-ROC). Source code will be released upon acceptance at https://github.com/vios-s.
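The abstract does not specify how cross-attention is turned into a localization map or how mean IoU and AUC-ROC are scored. The following is a minimal sketch of the general idea under stated assumptions; it is not the authors' feature-selection or post-processing pipeline. It assumes cross-attention probabilities have already been extracted from the frozen model as an array of shape (heads, H*W, tokens), averages them over heads and over the tokens of the phrase, upsamples nearest-neighbour by a hypothetical integer factor, min-max normalizes, and then scores the heatmap against a ground-truth bounding box. The function names (`phrase_heatmap`, `box_mask`, `iou`, `auc_roc`), the array layout, and the 0.5 threshold are hypothetical choices for illustration.

```python
import numpy as np


def phrase_heatmap(attn, token_ids, latent_hw, scale):
    """Aggregate cross-attention into a [0, 1] image-space heatmap.

    Hypothetical layout: attn has shape (heads, H*W, n_tokens).
    token_ids: indices of the text tokens belonging to the phrase.
    scale: integer factor from latent resolution to image resolution.
    """
    h, w = latent_hw
    # Average over attention heads, then over the phrase's tokens.
    per_pixel = attn.mean(axis=0)[:, token_ids].mean(axis=1)
    grid = per_pixel.reshape(h, w)
    # Nearest-neighbour upsampling: each latent cell becomes a scale x scale block.
    up = np.kron(grid, np.ones((scale, scale)))
    # Min-max normalize so that a fixed threshold is meaningful.
    lo, hi = up.min(), up.max()
    if hi <= lo:
        return np.zeros_like(up)
    return (up - lo) / (hi - lo)


def box_mask(shape, box):
    """Binary mask for a box (x0, y0, x1, y1) in pixels, end-exclusive."""
    x0, y0, x1, y1 = box
    m = np.zeros(shape, dtype=bool)
    m[y0:y1, x0:x1] = True
    return m


def iou(pred, gt):
    """Intersection over union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union > 0 else 0.0


def auc_roc(scores, labels):
    """Pixel-level AUC-ROC via the Mann-Whitney U statistic (ties get average rank)."""
    s = np.asarray(scores, dtype=float).ravel()
    y = np.asarray(labels).ravel().astype(bool)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs both positive and negative pixels")
    # np.unique sorts ascending; tied values share the mean of their 1-based ranks.
    _, inv, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    avg_rank = ends - (counts - 1) / 2.0
    ranks = avg_rank[inv.ravel()]
    u = ranks[y].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


if __name__ == "__main__":
    # Synthetic example: 2 heads, 4x4 latent grid, 4 text tokens.
    attn = np.full((2, 16, 4), 0.25)
    region = [5, 6, 9, 10]  # latent cells (1,1), (1,2), (2,1), (2,2)
    attn[:, region, :] = 0.1
    attn[:, region, 1] = 0.7  # token 1 attends strongly to the region
    heat = phrase_heatmap(attn, [1], (4, 4), 8)
    gt = box_mask(heat.shape, (8, 8, 24, 24))
    print(iou(heat >= 0.5, gt), auc_roc(heat, gt))  # → 1.0 1.0
```

Under these assumptions, mean IoU would be the average of per-image IoU values over the benchmark, while AUC-ROC treats the heatmap as a pixel-wise score without thresholding; the threshold value and any further refinement of the heatmap are separate design decisions the abstract leaves open.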

Source journal
IEEE Journal of Biomedical and Health Informatics
Categories: Computer Science, Information Systems; Computer Science, Interdisciplinary Applications
CiteScore: 13.60
Self-citation rate: 6.50%
Articles published: 1151
About the journal: IEEE Journal of Biomedical and Health Informatics publishes original papers presenting recent advances where information and communication technologies intersect with health, healthcare, life sciences, and biomedicine. Topics include acquisition, transmission, storage, retrieval, management, and analysis of biomedical and health information. The journal covers applications of information technologies in healthcare, patient monitoring, preventive care, early disease diagnosis, therapy discovery, and personalized treatment protocols. It explores electronic medical and health records, clinical information systems, decision support systems, medical and biological imaging informatics, wearable systems, body area/sensor networks, and more. Integration-related topics such as interoperability, evidence-based medicine, and secure patient data are also addressed.