Real-time Semantic Segmentation with Bilateral Patch Attention
Authors: Minseok Kang, Minhyeok Lee, Sangyoun Lee
Published in: 2024 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1-4
Publication date: 2024-01-28
DOI: 10.1109/ICEIC61013.2024.10457099 (https://doi.org/10.1109/ICEIC61013.2024.10457099)
Abstract
Semantic segmentation, a fundamental task in computer vision, has advanced significantly with deep learning, particularly fully convolutional networks (FCNs). For real-time semantic segmentation, demand has grown for models that are both efficient and accurate, especially on resource-constrained devices. Recent work fuses global context and local detail through bidirectional network structures, as in BiSeNet, STDCNet, and DDRNet. However, inconsistent predictions for pixels within the same object persist. Attention-based models such as SETR and SegFormer have shown promise in mitigating this issue by capturing intricate spatial dependencies. This paper introduces the concept of ‘Mask Disarrange’ and proposes a lightweight attention mechanism suitable for real-time semantic segmentation. Two methods, Cross Patch Attention (CPA) and Inter Patch Attention (IPA), address the fusion and mask disarrange challenges while maintaining computational efficiency. Experimental results on the Cityscapes dataset show that the proposed Bilateral Patch-Net (BPNet) achieves better segmentation performance and higher frames per second (FPS) than the state-of-the-art PIDNet. BPNet's contributions lie in its simplicity, efficiency, and applicability to diverse domains, offering potential for broader adoption in computer vision applications.
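The abstract does not specify how CPA and IPA are computed, so the following is only a generic illustration of why attention restricted to patches can be cheap enough for real-time use. It counts query-key pairs for global self-attention versus attention confined to non-overlapping square patches. The function name, the feature-map size, and the patch size are all hypothetical choices made for this sketch and are not taken from the paper.

```python
def attention_pair_counts(height, width, patch):
    """Count query-key pairs for global vs. patch-local self-attention.

    Generic illustration only; this is NOT the paper's CPA/IPA formulation.
    """
    if height % patch or width % patch:
        raise ValueError("patch size must divide both height and width")
    tokens = height * width
    # Global attention: every position attends to every position.
    global_pairs = tokens * tokens
    # Patch-local attention: positions attend only within their own patch.
    tokens_per_patch = patch * patch
    num_patches = tokens // tokens_per_patch
    local_pairs = num_patches * tokens_per_patch ** 2
    return global_pairs, local_pairs


# Hypothetical feature map of 64 x 128 positions with 8 x 8 patches.
global_pairs, local_pairs = attention_pair_counts(64, 128, 8)
print(global_pairs, local_pairs, global_pairs // local_pairs)
# prints: 67108864 524288 128
```

Under these assumed sizes, patch-local attention evaluates 128 times fewer pairs than global attention. In general the count drops from N^2 to N * p^2 for N positions and p x p patches, which is the kind of saving a lightweight attention design can draw on.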