Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries

Computer Methods and Programs in Biomedicine, Volume 261, Article 108634. Published 2025-04-01 (Epub 2025-01-31). DOI: 10.1016/j.cmpb.2025.108634. IF 4.8, Q1 (Computer Science, Interdisciplinary Applications).
Frederic Jonske, Moon Kim, Enrico Nasca, Janis Evers, Johannes Haubold, René Hosch, Felix Nensa, Michael Kamp, Constantin Seibold, Jan Egger, Jens Kleesiek
{"title":"Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries","authors":"Frederic Jonske ,&nbsp;Moon Kim ,&nbsp;Enrico Nasca ,&nbsp;Janis Evers ,&nbsp;Johannes Haubold ,&nbsp;René Hosch ,&nbsp;Felix Nensa ,&nbsp;Michael Kamp ,&nbsp;Constantin Seibold ,&nbsp;Jan Egger ,&nbsp;Jens Kleesiek","doi":"10.1016/j.cmpb.2025.108634","DOIUrl":null,"url":null,"abstract":"<div><h3>Purpose</h3><div>In medical deep learning, models not trained from scratch are typically fine-tuned based on ImageNet-pretrained models. We posit that pretraining on data from the domain of the downstream task should almost always be preferable.</div></div><div><h3>Materials and methods</h3><div>We leverage RadNet-12M and RadNet-1.28M, datasets containing &gt;12 million/1.28 million acquired CT image slices from 90,663 individual scans, and explore the efficacy of self-supervised, contrastive pretraining on the medical and natural image domains. We compare the respective performance gains for five downstream tasks. For each experiment, we report accuracy, AUC, or DICE score and uncertainty estimations based on four separate runs. We quantify significance using Welch's <em>t</em>-test. Finally, we perform feature space analysis to characterize the nature of the observed performance gains.</div></div><div><h3>Results</h3><div>We observe that intra-domain transfer (RadNet pretraining and CT-based tasks) compares favorably to cross-domain transfer (ImageNet pretraining and CT-based tasks), generally achieving comparable or improved performance – Δ = +0.44% (<em>p</em> = 0.541) when fine-tuned on RadNet-1.28M, Δ = +2.07% (<em>p</em> = 0.025) when linearly evaluating on</div><div>RadNet-1.28M, and Δ = +1.63% (<em>p</em> = 0.114) when fine-tuning on 1 % of RadNet-1.28M data. This intra-domain advantage extends to LiTS 2017, another CT-based dataset, but not to other medical imaging modalities. 
A corresponding intra-domain advantage was also observed for natural images. Outside the CT image domain, ImageNet-pretrained models generalized better than RadNet-pretrained models.</div><div>We further demonstrate that pretraining on medical images yields domain-specific features that are preserved during fine-tuning, and which correspond to macroscopic image properties and structures.</div></div><div><h3>Conclusion</h3><div>We conclude that intra-domain pretraining generally outperforms cross-domain pretraining, but that very narrow domain definitions apply. Put simply, pretraining on CT images instead of natural images yields an advantage when fine-tuning on CT images, and only on CT images. We further conclude that ImageNet pretraining remains a strong baseline, as well as the best choice for pretraining if only insufficient data from the target domain is available. Finally, we publish our pretrained models and pretraining guidelines as a baseline for future research.</div></div>","PeriodicalId":10624,"journal":{"name":"Computer methods and programs in biomedicine","volume":"261 ","pages":"Article 108634"},"PeriodicalIF":4.8000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer methods and programs in biomedicine","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0169260725000513","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/31 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0

Abstract

Purpose

In medical deep learning, models that are not trained from scratch are typically fine-tuned from ImageNet-pretrained weights. We posit that pretraining on data from the domain of the downstream task should almost always be preferable.

Materials and methods

We leverage RadNet-12M and RadNet-1.28M, datasets containing more than 12 million and 1.28 million CT image slices, respectively, acquired from 90,663 individual scans, and explore the efficacy of self-supervised, contrastive pretraining in the medical and natural image domains. We compare the respective performance gains on five downstream tasks. For each experiment, we report accuracy, AUC, or Dice score, with uncertainty estimates based on four separate runs. We quantify significance using Welch's t-test. Finally, we perform a feature-space analysis to characterize the nature of the observed performance gains.
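The significance test used above can be made concrete. The following is a minimal sketch of Welch's t-test (the unequal-variances two-sample test) as it would apply to four runs per configuration; the sample values are illustrative placeholders, not the paper's numbers, and this is not the authors' code:

```python
import math

def welch_t(a, b):
    """Welch's t-statistic and effective degrees of freedom for two
    samples with possibly unequal variances."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)  # unbiased variance
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch–Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical downstream accuracies from four runs each.
radnet_runs = [0.842, 0.851, 0.839, 0.848]
imagenet_runs = [0.836, 0.845, 0.833, 0.840]
t, df = welch_t(radnet_runs, imagenet_runs)
```

The p-value would then come from the t-distribution's CDF at `df` degrees of freedom (e.g. via `scipy.stats`); with only four runs per group, `df` lies between 3 and 6.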

Results

We observe that intra-domain transfer (RadNet pretraining and CT-based tasks) compares favorably to cross-domain transfer (ImageNet pretraining and CT-based tasks), generally achieving comparable or improved performance: Δ = +0.44% (p = 0.541) when fine-tuning on RadNet-1.28M, Δ = +2.07% (p = 0.025) when linearly evaluating on RadNet-1.28M, and Δ = +1.63% (p = 0.114) when fine-tuning on 1% of RadNet-1.28M data. This intra-domain advantage extends to LiTS 2017, another CT-based dataset, but not to other medical imaging modalities. A corresponding intra-domain advantage was also observed for natural images. Outside the CT image domain, ImageNet-pretrained models generalized better than RadNet-pretrained models.
We further demonstrate that pretraining on medical images yields domain-specific features that are preserved during fine-tuning, and which correspond to macroscopic image properties and structures.
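The "linear evaluation" protocol referenced in the results keeps the pretrained backbone frozen and trains only a linear head on its features, so the score measures feature quality rather than fine-tuning capacity. A minimal numpy sketch, assuming a stand-in random-projection "backbone" and toy data (none of this is the authors' pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen "backbone": a fixed random projection + ReLU
# standing in for a pretrained feature extractor. Its weights are
# never updated during evaluation.
W_backbone = rng.normal(size=(64, 128))

def extract_features(x):
    return np.maximum(x @ W_backbone, 0.0)  # frozen forward pass

# Toy two-class data: label depends on the first input dimension.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

feats = extract_features(X)

# Linear evaluation: fit ONLY a linear head on the frozen features
# (least squares here instead of SGD, for brevity).
A = np.hstack([feats, np.ones((feats.shape[0], 1))])  # bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = (A @ w > 0.5).astype(float)
acc = (pred == y).mean()  # training accuracy of the linear probe
```

Fine-tuning, by contrast, would also update `W_backbone`, which is why the paper's feature-space analysis checks whether pretrained, domain-specific features survive that process.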

Conclusion

We conclude that intra-domain pretraining generally outperforms cross-domain pretraining, but that very narrow domain definitions apply. Put simply, pretraining on CT images instead of natural images yields an advantage when fine-tuning on CT images, and only on CT images. We further conclude that ImageNet pretraining remains a strong baseline, and the best choice for pretraining when insufficient data from the target domain is available. Finally, we publish our pretrained models and pretraining guidelines as a baseline for future research.
