Self-supervised learning for chest computed tomography: training strategies and effect on downstream applications.
Authors: Amara Tariq, Gokul Ramasamy, Bhavik Patel, Imon Banerjee
Journal: Journal of Medical Imaging, volume 11, issue 6, article 064003
Publication date: 2024-11-01 (Epub 2024-11-09)
Publication type: Journal Article
DOI: 10.1117/1.JMI.11.6.064003 (https://doi.org/10.1117/1.JMI.11.6.064003)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11550486/pdf/
Citations: 0
Abstract
Purpose: Self-supervised pre-training can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of the medical imaging data. We investigate several self-supervised training strategies for chest computed tomography exams and their effects on downstream applications.
Approach: We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15 M chest computed tomography (CT) slices collected from four sites of the Mayo Clinic enterprise, United States. These models were evaluated for two downstream tasks on public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in models' understanding of chest CT exams.
Results: The use of pre-training weights, especially masked region prediction-based weights, improved performance and reduced the computational effort needed for downstream tasks compared with task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as ~380 K, with a maximum gain of 5% over SOTA. The segmentation model initialized with pre-training weights learned twice as fast as the randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor experienced a 10% performance boost when using self-supervised training weights.
Conclusion: We released self-supervised models and weights under an open-source academic license. These models can then be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
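The abstract names five pretext tasks but gives no implementation details. As a rough illustration only, and not the authors' code, the sketch below shows how an (input, target) pair might be built from a single CT slice for each task. The function name `make_pretext_pair`, the patch size, the noise level, and the use of plain Python lists instead of an array library are all assumptions made for this example.

```python
import random


def make_pretext_pair(slice_2d, task, rng, next_slice=None):
    """Return (model_input, target) for one self-supervised pretext task.

    slice_2d is a 2D list of floats (rows of pixel intensities).
    Task names and all numeric settings are illustrative assumptions.
    """
    h, w = len(slice_2d), len(slice_2d[0])
    if task == "masked_region":
        # Zero out a random rectangular patch; the target is the original slice.
        ph, pw = max(1, h // 4), max(1, w // 4)
        top = rng.randrange(h - ph + 1)
        left = rng.randrange(w - pw + 1)
        masked = [row[:] for row in slice_2d]
        for r in range(top, top + ph):
            for c in range(left, left + pw):
                masked[r][c] = 0.0
        return masked, slice_2d
    if task == "next_slice":
        # Predict the adjacent slice of the volume from the current one.
        if next_slice is None:
            raise ValueError("next_slice task needs the following slice")
        return slice_2d, next_slice
    if task == "rotation":
        # Rotate by k * 90 degrees clockwise; the target is the class index k.
        k = rng.randrange(4)
        rotated = [row[:] for row in slice_2d]
        for _ in range(k):
            rotated = [list(col) for col in zip(*rotated[::-1])]
        return rotated, k
    if task == "flip":
        # Flip left-right with probability 0.5; the target is 1 if flipped.
        flipped = rng.random() < 0.5
        out = [row[::-1] if flipped else row[:] for row in slice_2d]
        return out, int(flipped)
    if task == "denoising":
        # Add Gaussian noise; the target is the clean slice.
        noisy = [[v + rng.gauss(0.0, 0.1) for v in row] for row in slice_2d]
        return noisy, slice_2d
    raise ValueError(f"unknown task: {task}")


# Toy 4x4 "slice" with distinct non-zero intensities.
toy_slice = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
rng = random.Random(0)
rotated_input, rotation_label = make_pretext_pair(toy_slice, "rotation", rng)
```

In every case the target is derived from the image itself, which is why such pre-training needs no manual labels. The reconstruction-style tasks (masked region, next slice, denoising) yield image targets, while rotation and flip yield class labels.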
About the journal:
JMI covers fundamental and translational research, as well as applications, focused on medical imaging, which continue to yield physical and biomedical advancements in the early detection, diagnostics, and therapy of disease as well as in the understanding of normal. The scope of JMI includes: Imaging physics, Tomographic reconstruction algorithms (such as those in CT and MRI), Image processing and deep learning, Computer-aided diagnosis and quantitative image analysis, Visualization and modeling, Picture archiving and communications systems (PACS), Image perception and observer performance, Technology assessment, Ultrasonic imaging, Image-guided procedures, Digital pathology, Biomedical applications of biomedical imaging. JMI allows for the peer-reviewed communication and archiving of scientific developments, translational and clinical applications, reviews, and recommendations for the field.