Parallel segmentation network for real-time semantic segmentation

IF 8 2区 计算机科学 Q1 AUTOMATION & CONTROL SYSTEMS Engineering Applications of Artificial Intelligence Pub Date : 2025-03-11 DOI:10.1016/j.engappai.2025.110487
Guanke Chen , Haibin Li , Yaqian Li , Wenming Zhang , Tao Song
{"title":"Parallel segmentation network for real-time semantic segmentation","authors":"Guanke Chen ,&nbsp;Haibin Li ,&nbsp;Yaqian Li ,&nbsp;Wenming Zhang ,&nbsp;Tao Song","doi":"10.1016/j.engappai.2025.110487","DOIUrl":null,"url":null,"abstract":"<div><div>Real-time semantic segmentation holds extensive application prospects in autonomous driving and robot navigation. Recently, real-time semantic segmentation networks mainly adopt encoder-decoder architecture and multi-branch architecture. However, both approaches have their own advantages and limitations. Encoder-decoder models are generally better at extracting contextual information, but may face challenges in capturing fine details and local spatial information. On the other hand, the multi-branch structure excels at capturing boundary and spatial detail information, but it requires an efficient and flexible feature fusion strategy to prevent information redundancy. To leverage the strengths of both approaches, we propose a Parallel Segmentation Network (PaSeNet) which adopts the unsymmetrical encoder-decoder structure to introduce novel ideas for research and applications in real-time semantic segmentation. Specifically, we design a main branch with a spatial information enhancement path during the encoding phase and introduce mask autoencoder based on self-supervised learning as an auxiliary branch to supplement the main branch in extracting details as well as local spatial information. Additionally, we propose the Grouped Aggregation Pyramid Pooling Module to optimize the extraction of contextual information. In the decoding phase, we introduce the Coordinate-Attention-Guided Decoder to effectively integrate diverse information from different branches. A large number of experiments on the Cityscapes, Cambridge-driving Labeled Video database (CamVid), NightCity and instance Segmentation in Aerial Images Dataset demonstrate that our method achieves competitive results. Specifically, PaSeNet-Base obtains 79.9% mean Intersection Over Union (mIOU) at 55.6 Frames Per Second (FPS) on Cityscapes test dataset and 80.2% mIOU at 96.8 FPS on CamVid test dataset.</div></div>","PeriodicalId":50523,"journal":{"name":"Engineering Applications of Artificial Intelligence","volume":"148 ","pages":"Article 110487"},"PeriodicalIF":8.0000,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering Applications of Artificial Intelligence","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0952197625004877","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Real-time semantic segmentation holds extensive application prospects in autonomous driving and robot navigation. Recently, real-time semantic segmentation networks mainly adopt encoder-decoder architecture and multi-branch architecture. However, both approaches have their own advantages and limitations. Encoder-decoder models are generally better at extracting contextual information, but may face challenges in capturing fine details and local spatial information. On the other hand, the multi-branch structure excels at capturing boundary and spatial detail information, but it requires an efficient and flexible feature fusion strategy to prevent information redundancy. To leverage the strengths of both approaches, we propose a Parallel Segmentation Network (PaSeNet) which adopts the unsymmetrical encoder-decoder structure to introduce novel ideas for research and applications in real-time semantic segmentation. Specifically, we design a main branch with a spatial information enhancement path during the encoding phase and introduce mask autoencoder based on self-supervised learning as an auxiliary branch to supplement the main branch in extracting details as well as local spatial information. Additionally, we propose the Grouped Aggregation Pyramid Pooling Module to optimize the extraction of contextual information. In the decoding phase, we introduce the Coordinate-Attention-Guided Decoder to effectively integrate diverse information from different branches. A large number of experiments on the Cityscapes, Cambridge-driving Labeled Video database (CamVid), NightCity and instance Segmentation in Aerial Images Dataset demonstrate that our method achieves competitive results. Specifically, PaSeNet-Base obtains 79.9% mean Intersection Over Union (mIOU) at 55.6 Frames Per Second (FPS) on Cityscapes test dataset and 80.2% mIOU at 96.8 FPS on CamVid test dataset.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
面向实时语义分割的并行分割网络
实时语义分割在自动驾驶和机器人导航领域具有广泛的应用前景。目前,实时语义分割网络主要采用编码器-解码器结构和多分支结构。然而,这两种方法都有各自的优点和局限性。编码器-解码器模型通常更擅长提取上下文信息,但在捕获精细细节和局部空间信息方面可能面临挑战。另一方面,多分支结构在捕获边界和空间细节信息方面具有优势,但需要一种高效灵活的特征融合策略来防止信息冗余。为了利用这两种方法的优势,我们提出了一种采用非对称编码器-解码器结构的并行分割网络(PaSeNet),为实时语义分割的研究和应用引入了新的思路。具体而言,我们在编码阶段设计了具有空间信息增强路径的主分支,并引入基于自监督学习的掩模自编码器作为辅助分支,对主分支进行细节提取和局部空间信息提取的补充。此外,我们提出了分组聚合金字塔池模块来优化上下文信息的提取。在解码阶段,我们引入了坐标-注意引导解码器,有效地整合了来自不同分支的多种信息。在城市景观、剑桥驾驶标记视频数据库(CamVid)、NightCity和航拍图像数据集的实例分割上进行的大量实验表明,我们的方法取得了较好的效果。具体来说,PaSeNet-Base在cityscape测试数据集上以55.6帧/秒(FPS)的速度获得79.9%的平均交汇率(mIOU),在CamVid测试数据集上以96.8帧/秒获得80.2%的平均交汇率(mIOU)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Engineering Applications of Artificial Intelligence
Engineering Applications of Artificial Intelligence 工程技术-工程:电子与电气
CiteScore
9.60
自引率
10.00%
发文量
505
审稿时长
68 days
期刊介绍: Artificial Intelligence (AI) is pivotal in driving the fourth industrial revolution, witnessing remarkable advancements across various machine learning methodologies. AI techniques have become indispensable tools for practicing engineers, enabling them to tackle previously insurmountable challenges. Engineering Applications of Artificial Intelligence serves as a global platform for the swift dissemination of research elucidating the practical application of AI methods across all engineering disciplines. Submitted papers are expected to present novel aspects of AI utilized in real-world engineering applications, validated using publicly available datasets to ensure the replicability of research outcomes. Join us in exploring the transformative potential of AI in engineering.
期刊最新文献
Multiphysics response and internal leakage prediction of seismic hydraulic systems considering structural clearance effects Machine learning-based prediction of ductility of strain-hardening fiber-reinforced cementitious composites Neighborhood constrained attention for lightweight image super-resolution A quantum group decision-making model for patient-capital project selection integrating cumulative prospect theory under linear Diophantine fuzzy uncertainty Forecast-enhanced bilevel real-time pricing for microgrids via hybrid-action reinforcement learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1