SwinSAM: Fine-grained polyp segmentation in colonoscopy images via segment anything model integrated with a Swin Transformer decoder
Zhoushan Feng, Yuliang Zhang, Yanhong Chen, Yiyu Shi, Yu Liu, Wen Sun, Lili Du, Dunjin Chen
Biomedical Signal Processing and Control, volume 100, Article 107055. Published 2024-11-15. DOI: 10.1016/j.bspc.2024.107055
https://www.sciencedirect.com/science/article/pii/S1746809424011133
Journal impact factor: 4.9 (JCR Q1, Engineering, Biomedical)
Abstract
Polyp segmentation in colonoscopy images is a critical step in the early detection and preventive management of colorectal cancer. To support diagnosis, segmentation must be highly precise and must preserve fine-grained details, which may carry crucial information about the disease state. To meet the demand for more refined segmentation techniques, this study introduces SwinSAM, a framework that integrates a SAM (Segment Anything Model) encoder with a Swin Transformer decoder. SAM was trained on over a billion masks and has a strong capability for image comprehension. However, its training data consists primarily of natural images rather than medical ones, so we designed an adapter module to inject medical domain information into SAM. Furthermore, because polyps vary in size and shape and blend strongly into the background, the simple convolutional decoder in the original SAM model struggles to segment their intricate details accurately, which prompted us to use a Swin Transformer as the decoder. Additionally, given the significant shape variation of polyps, we employ a multi-scale perception fusion module to process the deep features extracted by SAM. By using convolutions with different receptive fields, it extracts information about polyps of various shapes. Finally, we optimize the network parameters through multi-level supervision. Comprehensive experiments on five commonly used polyp segmentation datasets show that the proposed method performs well across datasets with different polyp backgrounds.
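As a rough illustration of two ideas named in the abstract, convolutions with different receptive fields and multi-level supervision, the following Python sketch works on toy 1D signals. It is not the authors' implementation. The function names, the 3-tap averaging kernel, the dilation rates (1, 2, 4), fusion by averaging, and the Dice loss with per-level weights are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Number of input positions spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel size)."""
    half = dilation * (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_fuse(signal: Sequence[float],
                     dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Run parallel branches with different receptive fields and average them.

    Assumption: averaging stands in for whatever fusion the paper uses.
    """
    kernel = [1 / 3, 1 / 3, 1 / 3]  # hypothetical 3-tap mean filter
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


def dice_loss(pred: Sequence[float], target: Sequence[float],
              eps: float = 1e-6) -> float:
    """Soft Dice loss between a predicted and a ground-truth mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def multi_level_loss(preds: Sequence[Sequence[float]], target: Sequence[float],
                     weights: Sequence[float]) -> float:
    """Weighted sum of per-level losses against the same ground truth.

    Assumption: every level's prediction is already resized to the target length.
    """
    return sum(w * dice_loss(p, target) for w, p in zip(weights, preds))
```

For example, `effective_kernel(3, 4)` gives 9: a 3-tap kernel with dilation 4 spans nine input positions, so the three branches above cover 3, 5, and 9 positions with the same number of weights. Supervising several prediction levels against one mask, as in `multi_level_loss`, gives intermediate outputs a direct training signal instead of relying only on the final output.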
About the journal:
Biomedical Signal Processing and Control aims to provide a cross-disciplinary international forum for the interchange of information on research in the measurement and analysis of signals and images in clinical medicine and the biological sciences. Emphasis is placed on contributions dealing with practical, applications-led research on the use of methods and devices in clinical diagnosis, patient monitoring and management.
Biomedical Signal Processing and Control reflects the main areas in which these methods are being used and developed at the interface of engineering and clinical science. The scope of the journal includes relevant review papers, technical notes, short communications and letters. Tutorial papers and special issues will also be published.