Enhanced multi-scale networks for semantic segmentation

IF 5 2区 计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Complex & Intelligent Systems Pub Date : 2023-12-04 DOI:10.1007/s40747-023-01279-x
Tianping Li, Zhaotong Cui, Yu Han, Guanxing Li, Meng Li, Dongmei Wei
{"title":"Enhanced multi-scale networks for semantic segmentation","authors":"Tianping Li, Zhaotong Cui, Yu Han, Guanxing Li, Meng Li, Dongmei Wei","doi":"10.1007/s40747-023-01279-x","DOIUrl":null,"url":null,"abstract":"<p>Multi-scale representation provides an effective answer to the scale variation of objects and entities in semantic segmentation. The ability to capture long-range pixel dependency facilitates semantic segmentation. In addition, semantic segmentation necessitates the effective use of pixel-to-pixel similarity in the channel direction to enhance pixel areas. By reviewing the characteristics of earlier successful segmentation models, we discover a number of crucial elements that enhance segmentation model performance, including a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The attention mechanism of the asymmetric non-local neural network (ANNet) is merged with multi-scale pyramidal modules to accelerate model segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels in the feature map channel direction, making the segmentation accuracy unsatisfactory. As a result, we propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of Integration of enhanced regional module (IERM) and Multi-scale convolution module (MSCM). The IERM module generates weights using four or five-stage feature maps, then fuses the input features with the weights and uses more computation. The similarity of the channel direction feature graphs is also calculated using ANNet’s auxiliary loss function. The MSCM module can more accurately describe the interactions between various channels, capture the interdependencies between feature pixels, and capture the multi-scale context. Experiments prove that we perform well in tests using the benchmark dataset. On Cityscapes test data, we get 82.2% segmentation accuracy. The mIoU in the ADE20k and Pascal VOC datasets are, respectively, 45.58% and 85.46%.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":" 26","pages":""},"PeriodicalIF":5.0000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-023-01279-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Multi-scale representation provides an effective answer to the scale variation of objects and entities in semantic segmentation. The ability to capture long-range pixel dependency facilitates semantic segmentation. In addition, semantic segmentation necessitates the effective use of pixel-to-pixel similarity in the channel direction to enhance pixel areas. By reviewing the characteristics of earlier successful segmentation models, we discover a number of crucial elements that enhance segmentation model performance, including a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The attention mechanism of the asymmetric non-local neural network (ANNet) is merged with multi-scale pyramidal modules to accelerate model segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels in the feature map channel direction, making the segmentation accuracy unsatisfactory. As a result, we propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of Integration of enhanced regional module (IERM) and Multi-scale convolution module (MSCM). The IERM module generates weights using four or five-stage feature maps, then fuses the input features with the weights and uses more computation. The similarity of the channel direction feature graphs is also calculated using ANNet’s auxiliary loss function. The MSCM module can more accurately describe the interactions between various channels, capture the interdependencies between feature pixels, and capture the multi-scale context. Experiments prove that we perform well in tests using the benchmark dataset. On Cityscapes test data, we get 82.2% segmentation accuracy. The mIoU in the ADE20k and Pascal VOC datasets are, respectively, 45.58% and 85.46%.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
增强的多尺度语义分割网络
多尺度表示为语义分割中对象和实体的尺度变化提供了有效的解决方案。捕获远程像素依赖性的能力有助于语义分割。此外,语义分割需要在信道方向上有效利用像素间的相似性来增强像素区域。通过回顾早期成功的分割模型的特征,我们发现了一些提高分割模型性能的关键因素,包括鲁棒编码器结构、多尺度交互、注意机制和鲁棒解码器结构。将非对称非局部神经网络(ANNet)的注意机制与多尺度锥体模块相结合,在保持较高分割精度的同时加快了模型分割速度。然而,ANNet没有考虑到特征映射通道方向上像素之间的相似性,使得分割精度不理想。因此,我们提出了EMSNet,一种用于语义分割的简单卷积网络架构,由增强区域模块(IERM)和多尺度卷积模块(MSCM)的集成组成。IERM模块使用四或五阶段特征映射生成权重,然后将输入特征与权重融合,并使用更多的计算量。利用ANNet的辅助损失函数计算信道方向特征图的相似度。MSCM模块可以更准确地描述各通道之间的相互作用,捕获特征像素之间的相互依赖关系,并捕获多尺度上下文。实验证明我们在使用基准数据集的测试中表现良好。在cityscape测试数据上,我们得到了82.2%的分割准确率。ADE20k和Pascal VOC数据集的mIoU分别为45.58%和85.46%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Complex & Intelligent Systems
Complex & Intelligent Systems COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
9.60
自引率
10.30%
发文量
297
期刊介绍: Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.
期刊最新文献
Large-scale multiobjective competitive swarm optimizer algorithm based on regional multidirectional search Towards fairness-aware multi-objective optimization Low-frequency spectral graph convolution networks with one-hop connections information for personalized tag recommendation A decentralized feedback-based consensus model considering the consistency maintenance and readability of probabilistic linguistic preference relations for large-scale group decision-making A dynamic preference recommendation model based on spatiotemporal knowledge graphs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1