Parallel segmentation network for real-time semantic segmentation

Engineering Applications of Artificial Intelligence. Published 2025-03-11. DOI: 10.1016/j.engappai.2025.110487
Impact factor: 8.0; CAS Zone 2 (Computer Science); JCR Q1 (Automation & Control Systems)

Guanke Chen, Haibin Li, Yaqian Li, Wenming Zhang, Tao Song

Engineering Applications of Artificial Intelligence, Volume 148, Article 110487. Full text: https://www.sciencedirect.com/science/article/pii/S0952197625004877

Citations: 0

Abstract

Real-time semantic segmentation has broad application prospects in autonomous driving and robot navigation. Recent real-time semantic segmentation networks mainly adopt either an encoder-decoder architecture or a multi-branch architecture, and each approach has its own strengths and limitations. Encoder-decoder models are generally better at extracting contextual information but can struggle to capture fine details and local spatial information. Multi-branch structures, on the other hand, excel at capturing boundary and spatial detail information, but they require an efficient and flexible feature fusion strategy to avoid information redundancy. To combine the strengths of both approaches, we propose the Parallel Segmentation Network (PaSeNet), which adopts an asymmetric encoder-decoder structure and offers a new direction for research and applications in real-time semantic segmentation. Specifically, in the encoding phase we design a main branch with a spatial information enhancement path, and we introduce a self-supervised masked autoencoder as an auxiliary branch that helps the main branch extract details and local spatial information. Additionally, we propose the Grouped Aggregation Pyramid Pooling Module to improve the extraction of contextual information. In the decoding phase, we introduce the Coordinate-Attention-Guided Decoder to effectively integrate the diverse information coming from the different branches. Extensive experiments on Cityscapes, the Cambridge-driving Labeled Video database (CamVid), NightCity, and the Instance Segmentation in Aerial Images Dataset (iSAID) demonstrate that our method achieves competitive results. In particular, PaSeNet-Base obtains 79.9% mean Intersection over Union (mIoU) at 55.6 Frames Per Second (FPS) on the Cityscapes test set and 80.2% mIoU at 96.8 FPS on the CamVid test set.
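
The accuracy figures above are reported as mIoU. As a rough sketch of how this metric is commonly computed (the standard definition, not code from the PaSeNet authors), each class's IoU is TP / (TP + FP + FN) read off a confusion matrix, and mIoU is the mean over classes. The helper names `confusion_matrix` and `mean_iou`, the flattened per-pixel label lists, and the `ignore_index=255` convention for unlabeled pixels are assumptions made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(preds: Sequence[int], labels: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (ground-truth, predicted) class pairs over flattened pixel labels.

    Rows index the ground-truth class, columns the predicted class.
    Pixels whose label equals ignore_index (assumed here to be 255) are skipped.
    """
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pred, label in zip(preds, labels):
        if label == ignore_index:
            continue
        cm[label][pred] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average the per-class IoU = TP / (TP + FP + FN).

    Classes absent from both ground truth and prediction (zero denominator)
    are left out of the mean in this sketch.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other-class pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: 3 classes, 6 pixels, the last one unlabeled (255).
    labels = [0, 0, 1, 1, 2, 255]
    preds = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(preds, labels, num_classes=3)
    print(cm)                      # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(round(mean_iou(cm), 4))  # 0.7222
```

Benchmark evaluations typically accumulate one confusion matrix over the entire test set before computing per-class IoU, rather than averaging per-image scores, and report the result as a percentage (for example, 0.799 corresponds to 79.9%).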

Source journal

Engineering Applications of Artificial Intelligence (Engineering & Technology; Engineering, Electrical & Electronic)

CiteScore: 9.60
Self-citation rate: 10.00%
Articles published: 505
Review time: 68 days

Journal description: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.