Enhanced multi-scale networks for semantic segmentation

IF 4.6 2区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Complex & Intelligent Systems Pub Date : 2023-12-04 DOI:10.1007/s40747-023-01279-x

Tianping Li, Zhaotong Cui, Yu Han, Guanxing Li, Meng Li, Dongmei Wei

{"title":"Enhanced multi-scale networks for semantic segmentation","authors":"Tianping Li, Zhaotong Cui, Yu Han, Guanxing Li, Meng Li, Dongmei Wei","doi":"10.1007/s40747-023-01279-x","DOIUrl":null,"url":null,"abstract":"<p>Multi-scale representation provides an effective answer to the scale variation of objects and entities in semantic segmentation. The ability to capture long-range pixel dependency facilitates semantic segmentation. In addition, semantic segmentation necessitates the effective use of pixel-to-pixel similarity in the channel direction to enhance pixel areas. By reviewing the characteristics of earlier successful segmentation models, we discover a number of crucial elements that enhance segmentation model performance, including a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The attention mechanism of the asymmetric non-local neural network (ANNet) is merged with multi-scale pyramidal modules to accelerate model segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels in the feature map channel direction, making the segmentation accuracy unsatisfactory. As a result, we propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of Integration of enhanced regional module (IERM) and Multi-scale convolution module (MSCM). The IERM module generates weights using four or five-stage feature maps, then fuses the input features with the weights and uses more computation. The similarity of the channel direction feature graphs is also calculated using ANNet’s auxiliary loss function. The MSCM module can more accurately describe the interactions between various channels, capture the interdependencies between feature pixels, and capture the multi-scale context. Experiments prove that we perform well in tests using the benchmark dataset. On Cityscapes test data, we get 82.2% segmentation accuracy. The mIoU in the ADE20k and Pascal VOC datasets are, respectively, 45.58% and 85.46%.</p>","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":" 26","pages":""},"PeriodicalIF":4.6000,"publicationDate":"2023-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Complex & Intelligent Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s40747-023-01279-x","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Multi-scale representation provides an effective answer to the scale variation of objects and entities in semantic segmentation. The ability to capture long-range pixel dependency facilitates semantic segmentation. In addition, semantic segmentation necessitates the effective use of pixel-to-pixel similarity in the channel direction to enhance pixel areas. By reviewing the characteristics of earlier successful segmentation models, we discover a number of crucial elements that enhance segmentation model performance, including a robust encoder structure, multi-scale interactions, attention mechanisms, and a robust decoder structure. The attention mechanism of the asymmetric non-local neural network (ANNet) is merged with multi-scale pyramidal modules to accelerate model segmentation while maintaining high accuracy. However, ANNet does not account for the similarity between pixels in the feature map channel direction, making the segmentation accuracy unsatisfactory. As a result, we propose EMSNet, a straightforward convolutional network architecture for semantic segmentation that consists of Integration of enhanced regional module (IERM) and Multi-scale convolution module (MSCM). The IERM module generates weights using four or five-stage feature maps, then fuses the input features with the weights and uses more computation. The similarity of the channel direction feature graphs is also calculated using ANNet’s auxiliary loss function. The MSCM module can more accurately describe the interactions between various channels, capture the interdependencies between feature pixels, and capture the multi-scale context. Experiments prove that we perform well in tests using the benchmark dataset. On Cityscapes test data, we get 82.2% segmentation accuracy. The mIoU in the ADE20k and Pascal VOC datasets are, respectively, 45.58% and 85.46%.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

增强的多尺度语义分割网络

多尺度表示为语义分割中对象和实体的尺度变化提供了有效的解决方案。捕获远程像素依赖性的能力有助于语义分割。此外，语义分割需要在信道方向上有效利用像素间的相似性来增强像素区域。通过回顾早期成功的分割模型的特征，我们发现了一些提高分割模型性能的关键因素，包括鲁棒编码器结构、多尺度交互、注意机制和鲁棒解码器结构。将非对称非局部神经网络(ANNet)的注意机制与多尺度锥体模块相结合，在保持较高分割精度的同时加快了模型分割速度。然而，ANNet没有考虑到特征映射通道方向上像素之间的相似性，使得分割精度不理想。因此，我们提出了EMSNet，一种用于语义分割的简单卷积网络架构，由增强区域模块(IERM)和多尺度卷积模块(MSCM)的集成组成。IERM模块使用四或五阶段特征映射生成权重，然后将输入特征与权重融合，并使用更多的计算量。利用ANNet的辅助损失函数计算信道方向特征图的相似度。MSCM模块可以更准确地描述各通道之间的相互作用，捕获特征像素之间的相互依赖关系，并捕获多尺度上下文。实验证明我们在使用基准数据集的测试中表现良好。在cityscape测试数据上，我们得到了82.2%的分割准确率。ADE20k和Pascal VOC数据集的mIoU分别为45.58%和85.46%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Complex & Intelligent Systems COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-

CiteScore

9.60

自引率

10.30%

发文量

297

期刊介绍： Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.