Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images

Medical Image Analysis · IF 10.7 · CAS Tier 1 (Medicine) · Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2024-07-04 · DOI: 10.1016/j.media.2024.103258
Wenqiang Zu, Shenghao Xie, Qing Zhao, Guoqi Li, Lei Ma
{"title":"Embedded prompt tuning: Towards enhanced calibration of pretrained models for medical images","authors":"Wenqiang Zu ,&nbsp;Shenghao Xie ,&nbsp;Qing Zhao ,&nbsp;Guoqi Li ,&nbsp;Lei Ma","doi":"10.1016/j.media.2024.103258","DOIUrl":null,"url":null,"abstract":"<div><p>Foundation models pre-trained on large-scale data have been widely witnessed to achieve success in various natural imaging downstream tasks. <strong>Parameter-efficient fine-tuning (PEFT)</strong> methods aim to adapt foundation models to new domains by updating only a small portion of parameters in order to reduce computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios, e.g., medical image analysis, has not been fully explored. In this work, we facilitate the study of the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to alleviate the limitations of prompt introducing ways and approximation capabilities on Transformer architectures of mainstream prompt tuning methods, we propose the <strong>Embedded Prompt Tuning (EPT)</strong> method by embedding prompt tokens into the expanded channels. We also find that there are anomalies in the feature space distribution of foundation models during pre-training process, and prompt tuning can help mitigate this negative impact. To explain this phenomenon, we also introduce a novel perspective to understand prompt tuning: <strong>Prompt tuning is a distribution calibrator.</strong> And we support it by analysing patch-wise scaling and feature separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks, and completes the fine-tuning process within highly competitive time, indicating EPT is an effective PEFT method. The source code is available at <span>github.com/zuwenqiang/EPT</span><svg><path></path></svg>.</p></div>","PeriodicalId":18328,"journal":{"name":"Medical image analysis","volume":null,"pages":null},"PeriodicalIF":10.7000,"publicationDate":"2024-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical image analysis","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S136184152400183X","RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Foundation models pre-trained on large-scale data have been widely shown to succeed on a variety of natural-image downstream tasks. Parameter-efficient fine-tuning (PEFT) methods adapt foundation models to new domains by updating only a small portion of parameters, thereby reducing computational overhead. However, the effectiveness of these PEFT methods, especially in cross-domain few-shot scenarios such as medical image analysis, has not been fully explored. In this work, we study the performance of PEFT when adapting foundation models to medical image classification tasks. Furthermore, to overcome the limitations of mainstream prompt tuning methods, namely how prompts are introduced and their approximation capability on Transformer architectures, we propose the Embedded Prompt Tuning (EPT) method, which embeds prompt tokens into the expanded channels. We also find that anomalies arise in the feature-space distribution of foundation models during pre-training, and that prompt tuning can help mitigate this negative impact. To explain this phenomenon, we introduce a novel perspective on prompt tuning: prompt tuning is a distribution calibrator. We support this view by analysing the patch-wise scaling and feature-separation operations contained in EPT. Our experiments show that EPT outperforms several state-of-the-art fine-tuning methods by a significant margin on few-shot medical image classification tasks and completes fine-tuning in highly competitive time, indicating that EPT is an effective PEFT method. The source code is available at github.com/zuwenqiang/EPT.
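
To make the core idea concrete, here is a minimal sketch contrasting conventional visual prompt tuning, which prepends learnable tokens along the sequence dimension, with an EPT-style variant that embeds a learnable prompt into expanded channels of every patch token. The class names, dimensions, and initialization scale are illustrative assumptions, not the authors' implementation; the official code is at github.com/zuwenqiang/EPT.

```python
# Illustrative sketch only: names, shapes, and init scale are assumptions,
# not the reference implementation (see github.com/zuwenqiang/EPT).
import torch
import torch.nn as nn


class SequencePromptInput(nn.Module):
    """Mainstream prompt tuning: prepend learnable tokens along the
    sequence dimension; the channel width d is unchanged."""

    def __init__(self, num_prompts: int, d: int):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, d) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d) -> (batch, n_tokens + num_prompts, d)
        p = self.prompts.unsqueeze(0).expand(x.size(0), -1, -1)
        return torch.cat([p, x], dim=1)


class EmbeddedPromptInput(nn.Module):
    """EPT-style input: concatenate a learnable prompt onto the channel
    (feature) dimension of every patch token, embedding the prompt into
    expanded channels instead of adding extra tokens."""

    def __init__(self, d: int, d_prompt: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(d_prompt) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_tokens, d) -> (batch, n_tokens, d + d_prompt)
        b, n, _ = x.shape
        p = self.prompt.view(1, 1, -1).expand(b, n, -1)
        return torch.cat([x, p], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 196, 768)  # a batch of ViT patch tokens
    print(SequencePromptInput(10, 768)(x).shape)   # (2, 206, 768)
    print(EmbeddedPromptInput(768, 64)(x).shape)   # (2, 196, 832)
```

Note that downstream Transformer layers must then accept the widened channel dimension d + d_prompt, for instance through expanded projection weights; how EPT handles this is detailed in the paper.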

Source Journal

Medical Image Analysis (Engineering & Technology - Engineering: Biomedical)
CiteScore: 22.10
Self-citation rate: 6.40%
Annual publications: 309
Review time: 6.6 months
Journal description: Medical Image Analysis serves as a platform for sharing new research findings in the realm of medical and biological image analysis, with a focus on applications of computer vision, virtual reality, and robotics to biomedical imaging challenges. The journal prioritizes the publication of high-quality, original papers contributing to the fundamental science of processing, analyzing, and utilizing medical and biological images. It welcomes approaches utilizing biomedical image datasets across all spatial scales, from molecular/cellular imaging to tissue/organ imaging.
Latest articles in this journal

Beyond strong labels: Weakly-supervised learning based on Gaussian pseudo labels for the segmentation of ellipse-like vascular structures in non-contrast CTs
A cross-attention-based deep learning approach for predicting functional stroke outcomes using 4D CTP imaging and clinical metadata
DACG: Dual Attention and Context Guidance model for radiology report generation
Simulation-free prediction of atrial fibrillation inducibility with the fibrotic kernel signature
An objective comparison of methods for augmented reality in laparoscopic liver resection by preoperative-to-intraoperative image fusion from the MICCAI2022 challenge