DSC-Net: Enhancing Blind Road Semantic Segmentation with Visual Sensor Using a Dual-Branch Swin-CNN Architecture.

Sensors · IF 3.4 · JCR Q2 (Chemistry, Analytical) · CAS Tier 3 (multidisciplinary) · Pub Date: 2024-09-20 · DOI: 10.3390/s24186075
Ying Yuan, Yu Du, Yan Ma, Hejun Lv
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11435784/pdf/
Citations: 0

Abstract

In modern urban environments, visual sensors are crucial for enhancing the functionality of navigation systems, particularly for devices designed for visually impaired individuals. The high-resolution images captured by these sensors form the basis for understanding the surrounding environment and identifying key landmarks. However, the core challenge in the semantic segmentation of blind roads (tactile paving) lies in the effective extraction of global context and edge features. Most existing methods rely on Convolutional Neural Networks (CNNs), whose inherent inductive biases limit their ability to capture global context and to accurately detect discontinuous features such as gaps and obstructions in blind roads. To overcome these limitations, we introduce Dual-Branch Swin-CNN Net (DSC-Net), a new method that integrates the global modeling capabilities of the Swin Transformer with the CNN-based U-Net architecture. This combination allows for the hierarchical extraction of both fine and coarse features. First, the Spatial Blending Module (SBM) mitigates the blurring of target information caused by object occlusion, enhancing accuracy. The Hybrid Attention Module (HAM), embedded within the Inverted Residual Module (IRM), sharpens the detection of blind road boundaries, while the IRM improves processing speed. In tests on a specialized dataset designed for blind road semantic segmentation in real-world scenarios, our method achieved an mIoU of 97.72%. Additionally, it demonstrated strong performance on other public datasets.
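The headline number above is a mean Intersection over Union (mIoU), the standard semantic-segmentation metric: per-class IoU (overlap of predicted and ground-truth pixels divided by their union), averaged over classes. The snippet below is an illustrative pure-Python sketch of that metric, not the authors' evaluation code; the function name and toy labels are our own.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for flattened integer label maps.

    pred, target: equal-length sequences of class indices.
    Classes absent from both prediction and ground truth are skipped,
    so they do not drag the average down.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 2-class example: background (0) vs. blind road (1).
pred   = [0, 0, 1, 1, 1, 0]
target = [0, 1, 1, 1, 0, 0]
print(mean_iou(pred, target, 2))  # each class has IoU 2/4, so mIoU = 0.5
```

In a real pipeline the same computation is run over every pixel of every test image (typically via a confusion matrix), and a score such as 97.72% means the averaged per-class overlap is nearly perfect.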

Source Journal: Sensors (Engineering & Technology — Electrochemistry)
CiteScore: 7.30
Self-citation rate: 12.80%
Articles per year: 8430
Review time: 1.7 months
Journal introduction: Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews on complete sensor products), regular research papers, and short notes. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced.