{"title":"Self-supervised learning for chest computed tomography: training strategies and effect on downstream applications.","authors":"Amara Tariq, Gokul Ramasamy, Bhavik Patel, Imon Banerjee","doi":"10.1117/1.JMI.11.6.064003","DOIUrl":null,"url":null,"abstract":"<p><strong>Purpose: </strong>Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. We investigate several self-supervised training strategies for chest computed tomography exams and their effects on downstream applications.</p><p><strong>Approach: </strong>We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15 M chest computed tomography (CT) slices collected from four sites of the Mayo Clinic enterprise, United States. These models were evaluated for two downstream tasks on public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in models' understanding of chest CT exams.</p><p><strong>Results: </strong>The use of pre-training weights especially masked region prediction-based weights, improved performance, and reduced computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as <math><mrow><mo>∼</mo> <mn>380</mn> <mtext> </mtext> <mi>K</mi></mrow> </math> with a maximum gain of 5% over SOTA. The segmentation model initialized with pre-training weights learned twice as fast as the randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights.</p><p><strong>Conclusion: </strong>We released self-supervised models and weights under an open-source academic license. These models can then be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.</p>","PeriodicalId":47707,"journal":{"name":"Journal of Medical Imaging","volume":"11 6","pages":"064003"},"PeriodicalIF":1.9000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550486/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Medical Imaging","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1117/1.JMI.11.6.064003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/11/9 0:00:00","PubModel":"Epub","JCR":"Q3","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
引用次数: 0
Abstract
Purpose: Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. We investigate several self-supervised training strategies for chest computed tomography exams and their effects on downstream applications.
Approach: We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15 M chest computed tomography (CT) slices collected from four sites of the Mayo Clinic enterprise, United States. These models were evaluated for two downstream tasks on public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in models' understanding of chest CT exams.
Results: The use of pre-training weights especially masked region prediction-based weights, improved performance, and reduced computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as with a maximum gain of 5% over SOTA. The segmentation model initialized with pre-training weights learned twice as fast as the randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights.
Conclusion: We released self-supervised models and weights under an open-source academic license. These models can then be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
期刊介绍:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.