Qingbin Wang, Yuxuan Xiong, Hanfeng Zhu, Xuefeng Mu, Yan Zhang, Yutao Ma
{"title":"使用带有 Swin 变换器的对比遮蔽自动编码器进行颈椎 OCT 图像分类。","authors":"Qingbin Wang, Yuxuan Xiong, Hanfeng Zhu, Xuefeng Mu, Yan Zhang, Yutao Ma","doi":"10.1016/j.compmedimag.2024.102469","DOIUrl":null,"url":null,"abstract":"<p><strong>Background and objective: </strong>Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive cervical lesion diagnosis. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming as they must correspond precisely with pathological results. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. This study addresses the above challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework combining masked image modeling (MIM) with contrastive learning based on the Swin-Transformer architecture to utilize abundant unlabeled cervical OCT images.</p><p><strong>Methods: </strong>In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to solve the inconsistency problem between pre-training and fine-tuning and separate the encoder's feature extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. Besides, contrastive losses at the patch and image levels are elaborately designed to leverage massive unlabeled data.</p><p><strong>Results: </strong>We validated the superiority of CMSwin over the state-of-the-art SSL approaches with five-fold cross-validation on an OCT image dataset containing 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Xiangya Second Hospital of Central South University. 
A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average level of four skilled medical experts, especially in identifying high-risk cervical lesions.</p><p><strong>Conclusion: </strong>Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. Additionally, the integrated GradCAM module of CMSwin enables cervical lesion visualization and interpretation, providing good interpretability for gynecologists to diagnose cervical diseases efficiently.</p>","PeriodicalId":50631,"journal":{"name":"Computerized Medical Imaging and Graphics","volume":"118 ","pages":"102469"},"PeriodicalIF":5.4000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Cervical OCT image classification using contrastive masked autoencoders with Swin Transformer.\",\"authors\":\"Qingbin Wang, Yuxuan Xiong, Hanfeng Zhu, Xuefeng Mu, Yan Zhang, Yutao Ma\",\"doi\":\"10.1016/j.compmedimag.2024.102469\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background and objective: </strong>Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive cervical lesion diagnosis. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming as they must correspond precisely with pathological results. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. 
This study addresses the above challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework combining masked image modeling (MIM) with contrastive learning based on the Swin-Transformer architecture to utilize abundant unlabeled cervical OCT images.</p><p><strong>Methods: </strong>In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to solve the inconsistency problem between pre-training and fine-tuning and separate the encoder's feature extraction task from the decoder's reconstruction task, allowing the encoder to extract better image representations. Besides, contrastive losses at the patch and image levels are elaborately designed to leverage massive unlabeled data.</p><p><strong>Results: </strong>We validated the superiority of CMSwin over the state-of-the-art SSL approaches with five-fold cross-validation on an OCT image dataset containing 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Xiangya Second Hospital of Central South University. A human-machine comparison experiment on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average level of four skilled medical experts, especially in identifying high-risk cervical lesions.</p><p><strong>Conclusion: </strong>Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. 
Additionally, the integrated GradCAM module of CMSwin enables cervical lesion visualization and interpretation, providing good interpretability for gynecologists to diagnose cervical diseases efficiently.</p>\",\"PeriodicalId\":50631,\"journal\":{\"name\":\"Computerized Medical Imaging and Graphics\",\"volume\":\"118 \",\"pages\":\"102469\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2024-11-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computerized Medical Imaging and Graphics\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://doi.org/10.1016/j.compmedimag.2024.102469\",\"RegionNum\":2,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computerized Medical Imaging and Graphics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1016/j.compmedimag.2024.102469","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
Cervical OCT image classification using contrastive masked autoencoders with Swin Transformer.
Background and objective: Cervical cancer poses a major health threat to women globally. Optical coherence tomography (OCT) imaging has recently shown promise for non-invasive cervical lesion diagnosis. However, obtaining high-quality labeled cervical OCT images is challenging and time-consuming because each image must correspond precisely to its pathological result. The scarcity of such high-quality labeled data hinders the application of supervised deep-learning models in practical clinical settings. This study addresses that challenge by proposing CMSwin, a novel self-supervised learning (SSL) framework that combines masked image modeling (MIM) with contrastive learning on the Swin-Transformer architecture to exploit abundant unlabeled cervical OCT images.
Methods: In this contrastive-MIM framework, mixed image encoding is combined with a latent contextual regressor to resolve the inconsistency between pre-training and fine-tuning and to separate the encoder's feature-extraction task from the decoder's reconstruction task, allowing the encoder to learn better image representations. In addition, contrastive losses at both the patch and image levels are carefully designed to leverage massive amounts of unlabeled data.
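The two training signals described above can be illustrated with a toy sketch: a masked-patch reconstruction loss plus an InfoNCE-style contrastive loss over matched pairs. This is not the paper's implementation; the mask ratio, feature dimensions, and the use of noisy targets as stand-in "decoder outputs" are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_patches(n_patches, mask_ratio=0.6):
    """Randomly split patch indices into visible and masked sets (MIM-style)."""
    n_masked = int(n_patches * mask_ratio)
    idx = rng.permutation(n_patches)
    return idx[n_masked:], idx[:n_masked]  # visible, masked

def reconstruction_loss(pred, target):
    """Mean squared error computed on the masked patches only."""
    return float(np.mean((pred - target) ** 2))

def info_nce(anchor, positives, temperature=0.1):
    """Toy InfoNCE: row i of `anchor` should match row i of `positives`."""
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))     # diagonal = correct pairs

# 16 patch embeddings of dimension 32 from one hypothetical OCT image
patches = rng.normal(size=(16, 32))
vis, msk = mask_patches(len(patches), mask_ratio=0.6)
# stand-in "decoder output": the masked targets plus noise
pred = patches[msk] + 0.1 * rng.normal(size=(len(msk), 32))
total = reconstruction_loss(pred, patches[msk]) + info_nce(patches[msk], pred)
```

In practice both losses would be summed (possibly with weights) and backpropagated jointly through the encoder, decoder, and regressor.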
Results: We validated the superiority of CMSwin over state-of-the-art SSL approaches using five-fold cross-validation on an OCT image dataset of 1,452 patients from a multi-center clinical study in China, plus two external validation sets from top-ranked Chinese hospitals: the Huaxi dataset from the West China Hospital of Sichuan University and the Xiangya dataset from the Second Xiangya Hospital of Central South University. A human-machine comparison on the Huaxi and Xiangya datasets for volume-level binary classification also indicates that CMSwin can match or exceed the average performance of four skilled medical experts, especially in identifying high-risk cervical lesions.
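Because each patient contributes multiple OCT volumes, cross-validation in this setting is typically split at the patient level so no patient appears in both the training and test folds. A minimal sketch of such a split, assuming hypothetical patient IDs and three volumes per patient:

```python
import numpy as np

def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Shuffle the unique patient IDs and split them into n_folds groups,
    so scans from one patient never span both train and test sets."""
    rng = np.random.default_rng(seed)
    unique = np.array(sorted(set(patient_ids)))
    rng.shuffle(unique)
    return np.array_split(unique, n_folds)

# Three hypothetical OCT volumes for each of ten hypothetical patients.
ids = [f"P{i:02d}" for i in range(10) for _ in range(3)]
folds = patient_level_folds(ids, n_folds=5)

for k, test_patients in enumerate(folds):
    held_out = set(test_patients)
    # train on volumes whose patient is NOT in the held-out fold
    train_ids = [pid for pid in ids if pid not in held_out]
```

The same grouping logic is available in scikit-learn as `GroupKFold` when that dependency is acceptable.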
Conclusion: Our work has great potential to assist gynecologists in intelligently interpreting cervical OCT images in clinical settings. Additionally, CMSwin's integrated GradCAM module enables visualization of cervical lesions, offering gynecologists interpretable evidence for diagnosing cervical diseases efficiently.
Journal overview:
The purpose of the journal Computerized Medical Imaging and Graphics is to act as a source for the exchange of research results concerning algorithmic advances, development, and application of digital imaging in disease detection, diagnosis, intervention, prevention, precision medicine, and population health. Included in the journal will be articles on novel computerized imaging or visualization techniques, including artificial intelligence and machine learning, augmented reality for surgical planning and guidance, big biomedical data visualization, computer-aided diagnosis, computerized-robotic surgery, image-guided therapy, imaging scanning and reconstruction, mobile and tele-imaging, radiomics, and imaging integration and modeling with other information relevant to digital health. The types of biomedical imaging include: magnetic resonance, computed tomography, ultrasound, nuclear medicine, X-ray, microwave, optical and multi-photon microscopy, video and sensory imaging, and the convergence of biomedical images with other non-imaging datasets.