Why does my medical AI look at pictures of birds? Exploring the efficacy of transfer learning across domain boundaries

Frederic Jonske, Moon Kim, Enrico Nasca, Janis Evers, Johannes Haubold, René Hosch, Felix Nensa, Michael Kamp, Constantin Seibold, Jan Egger, Jens Kleesiek

Computer Methods and Programs in Biomedicine, Volume 261, Article 108634. DOI: 10.1016/j.cmpb.2025.108634
Abstract
Purpose
In medical deep learning, models that are not trained from scratch are typically fine-tuned from ImageNet-pretrained weights. We posit that pretraining on data from the domain of the downstream task should almost always be preferable.
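As a rough illustration of this standard cross-domain recipe, the sketch below loads an ImageNet-pretrained backbone and fine-tunes it on a downstream classification task. The backbone (a torchvision ResNet-50), the number of classes, and the optimizer settings are assumptions for illustration only, not the paper's actual setup.

```python
# Minimal sketch of the conventional recipe: start from ImageNet-pretrained
# weights and fine-tune all parameters on the downstream task.
# ResNet-50, num_classes, and the hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5  # hypothetical number of downstream labels

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace the ImageNet head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def training_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step; every backbone parameter remains trainable."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```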
Materials and methods
We leverage RadNet-12M and RadNet-1.28M, two datasets containing more than 12 million and 1.28 million CT image slices, respectively, acquired from 90,663 individual scans, and explore the efficacy of self-supervised, contrastive pretraining in the medical and natural image domains. We compare the respective performance gains for five downstream tasks. For each experiment, we report accuracy, AUC, or Dice score, with uncertainty estimates based on four separate runs. We quantify significance using Welch's t-test. Finally, we perform a feature space analysis to characterize the nature of the observed performance gains.
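The abstract does not spell out the contrastive objective; as a hedged sketch, a SimCLR-style NT-Xent loss is one common instance of the self-supervised contrastive pretraining described here. The temperature value and the use of two augmented views per image are assumptions for illustration.

```python
# A minimal SimCLR-style NT-Xent loss, shown as one common instance of a
# self-supervised contrastive objective; the paper's exact pretraining recipe
# is not specified in the abstract, so this is illustrative only.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (N, D) projections of two augmented views of the same N images."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)       # (2N, D), unit norm
    sim = (z @ z.T) / temperature                             # cosine similarity matrix
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))                # exclude self-pairs
    # The positive for view i is the other augmentation of the same image.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```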
Results
We observe that intra-domain transfer (RadNet pretraining and CT-based tasks) compares favorably to cross-domain transfer (ImageNet pretraining and CT-based tasks), generally achieving comparable or improved performance: Δ = +0.44% (p = 0.541) when fine-tuning on RadNet-1.28M, Δ = +2.07% (p = 0.025) when linearly evaluating on RadNet-1.28M, and Δ = +1.63% (p = 0.114) when fine-tuning on 1% of RadNet-1.28M data. This intra-domain advantage extends to LiTS 2017, another CT-based dataset, but not to other medical imaging modalities. A corresponding intra-domain advantage was also observed for natural images. Outside the CT image domain, ImageNet-pretrained models generalized better than RadNet-pretrained models.
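Linear evaluation here conventionally means training only a linear classifier on top of the frozen pretrained backbone, in contrast to full fine-tuning. The Δ and p values above compare four independent runs per pretraining strategy using Welch's t-test; a minimal sketch of such a comparison, with purely illustrative accuracy values rather than the paper's results, is shown below (scipy's ttest_ind with equal_var=False implements Welch's t-test).

```python
# Hedged sketch of the significance test: four independent runs per
# pretraining strategy, compared with Welch's t-test. The accuracy values
# are placeholders, not results from the paper.
import numpy as np
from scipy import stats

radnet_runs   = np.array([0.912, 0.908, 0.915, 0.910])   # hypothetical
imagenet_runs = np.array([0.905, 0.901, 0.907, 0.903])   # hypothetical

delta = radnet_runs.mean() - imagenet_runs.mean()
t_stat, p_value = stats.ttest_ind(radnet_runs, imagenet_runs, equal_var=False)

print(f"Δ = {delta:+.2%}, p = {p_value:.3f} (Welch's t-test, four runs per setting)")
```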
We further demonstrate that pretraining on medical images yields domain-specific features that are preserved during fine-tuning, and which correspond to macroscopic image properties and structures.
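The abstract does not specify how the feature spaces were analyzed; one common way to compare representations before and after fine-tuning is linear centered kernel alignment (CKA), sketched here purely as an illustration of the general approach rather than as the paper's method.

```python
# Linear CKA between two feature matrices extracted on the same images,
# e.g. from the pretrained encoder and from the fine-tuned encoder.
# Offered as a generic sketch; the paper's analysis method is not stated here.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Similarity in [0, 1] between two feature matrices of shape (n_samples, dim)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(hsic / (norm_x * norm_y))

# Hypothetical usage:
# similarity = linear_cka(features_pretrained, features_finetuned)
```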
Conclusion
We conclude that intra-domain pretraining generally outperforms cross-domain pretraining, but that very narrow domain definitions apply. Put simply, pretraining on CT images instead of natural images yields an advantage when fine-tuning on CT images, and only on CT images. We further conclude that ImageNet pretraining remains a strong baseline, as well as the best choice for pretraining when insufficient data from the target domain is available. Finally, we publish our pretrained models and pretraining guidelines as a baseline for future research.
Journal overview
To encourage the development of formal computing methods, and their application in biomedical research and medical practice, by illustrating fundamental principles in biomedical informatics research; to stimulate basic research into application software design; to report on the state of research of biomedical information processing projects; to report new computer methodologies applied in biomedical areas; to enable the eventual distribution of demonstrable software and avoid duplication of effort; to provide a forum for discussion and improvement of existing software; and to optimize contact between national organizations and regional user groups by promoting an international exchange of information on formal methods, standards and software in biomedicine.
Computer Methods and Programs in Biomedicine covers computing methodology and software systems derived from computing science for implementation in all aspects of biomedical research and medical practice. It is designed to serve: biochemists; biologists; geneticists; immunologists; neuroscientists; pharmacologists; toxicologists; clinicians; epidemiologists; psychiatrists; psychologists; cardiologists; chemists; (radio)physicists; computer scientists; programmers and systems analysts; biomedical, clinical, electrical and other engineers; teachers of medical informatics and users of educational software.