Self-supervised Semantic Segmentation: Consistency over Transformation
Sanaz Karimijafarbigloo, Reza Azad, Amirhossein Kazerouni, Yury Velichko, Ulas Bagci, Dorit Merhof
IEEE International Conference on Computer Vision Workshops, vol. 2023, pp. 2646-2655. Published 2023-10-01.
DOI: 10.1109/ICCVW60793.2023.00280
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10829429/pdf/
Abstract
Accurate medical image segmentation is essential for automated clinical decision procedures. However, prevailing supervised deep learning approaches to medical image segmentation depend heavily on extensive labeled training data, which is a significant obstacle in practice. To tackle this issue, we propose a novel self-supervised algorithm, S³-Net, built on a robust framework of the proposed Inception Large Kernel Attention (I-LKA) modules. This architecture captures contextual information comprehensively while preserving local detail, enabling precise semantic segmentation. Furthermore, because lesions in medical images often exhibit deformations, we use deformable convolution as an integral component to capture and delineate lesion deformations and so define object boundaries more accurately. Additionally, our self-supervised strategy emphasizes learning invariance to affine transformations, which are commonly encountered in medical scenarios. This emphasis on robustness to geometric distortions improves the model's ability to handle them. To enforce spatial consistency and promote the grouping of spatially connected image pixels with similar feature representations, we introduce a spatial consistency loss term, which helps the network capture the relationships among neighboring pixels and improves overall segmentation quality. The S³-Net approach iteratively learns pixel-level feature representations for image content clustering in an end-to-end manner. Experimental results on skin lesion and lung organ segmentation tasks show that our method outperforms state-of-the-art (SOTA) approaches.
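The abstract names two training signals, consistency under affine transformations and a spatial consistency loss, without giving their formulas. The sketch below is one possible toy reading, not the paper's definitions: the function names, the squared-difference form of both losses, the 90-degree rotation standing in for a general affine map, and the two toy predictors are all assumptions made for illustration.

```python
# Toy sketch only: the loss forms below are assumptions, not the paper's definitions.

def rot90(grid):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise.

    Used here as a simple affine map that is exact on the pixel grid.
    """
    return [list(row) for row in zip(*grid[::-1])]


def spatial_consistency_loss(features):
    """Mean squared feature difference between each pixel and its
    right and bottom neighbors (assumed form).

    features: H x W grid whose entries are equal-length lists of floats.
    """
    h, w = len(features), len(features[0])
    total, pairs = 0.0, 0
    for i in range(h):
        for j in range(w):
            for di, dj in ((0, 1), (1, 0)):
                ni, nj = i + di, j + dj
                if ni < h and nj < w:
                    total += sum(
                        (a - b) ** 2
                        for a, b in zip(features[i][j], features[ni][nj])
                    )
                    pairs += 1
    return total / pairs if pairs else 0.0


def affine_consistency_loss(predict, image, transform):
    """Mean squared gap between predict(transform(image)) and
    transform(predict(image)) (assumed form).

    The gap is zero when prediction commutes with the transform.
    """
    a = predict(transform(image))
    b = transform(predict(image))
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def threshold_predict(image):
    """Hypothetical per-pixel predictor: depends only on intensity."""
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in image]


def top_row_predict(image):
    """Hypothetical position-dependent predictor: marks only the first row."""
    return [[1.0 if i == 0 else 0.0 for _ in row] for i, row in enumerate(image)]


image = [[0.9, 0.1], [0.8, 0.2]]
consistent = affine_consistency_loss(threshold_predict, image, rot90)
inconsistent = affine_consistency_loss(top_row_predict, image, rot90)
striped = spatial_consistency_loss([[[0.0], [1.0]], [[0.0], [1.0]]])
```

In this toy setup the per-pixel predictor incurs no consistency penalty, while the position-dependent one does, which is the behavior a transformation-consistency objective is meant to discourage. A real implementation would operate on network feature maps and interpolated affine warps rather than nested lists.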