LiteMSNet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes

The Visual Computer Pub Date : 2024-07-22 DOI:10.1007/s00371-024-03569-y

Lirong Li, Jiang Ding, Hao Cui, Zhiqiang Chen, Guisheng Liao

{"title":"LiteMSNet: a lightweight semantic segmentation network with multi-scale feature extraction for urban streetscape scenes","authors":"Lirong Li, Jiang Ding, Hao Cui, Zhiqiang Chen, Guisheng Liao","doi":"10.1007/s00371-024-03569-y","DOIUrl":null,"url":null,"abstract":"<p>Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computing to achieve high performance. To achieve a balance between accuracy and complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). In this model, we propose a novel Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. Furthermore, a Multi-scale Dilation Pyramid Module is developed to expand the receptive field and capture multi-scale feature information. Finally, the proposed lightweight model integrates two loss mechanisms, the Cross-Entropy and the Dice Loss functions, which effectively mitigate the issue of data imbalance and gradient saturation. Numerical experimental results on the CamVid dataset demonstrate a remarkable mIoU measurement of 70.85% with less than 5M parameters, accompanied by a real-time inference speed of 66.1 FPS, surpassing the existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"39 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03569-y","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Semantic segmentation plays a pivotal role in computer scene understanding, but it typically requires a large amount of computing to achieve high performance. To achieve a balance between accuracy and complexity, we propose a lightweight semantic segmentation model, termed LiteMSNet (a Lightweight Semantic Segmentation Network with Multi-Scale Feature Extraction for urban streetscape scenes). In this model, we propose a novel Improved Feature Pyramid Network, which embeds a shuffle attention mechanism followed by a stacked Depth-wise Asymmetric Gating Module. Furthermore, a Multi-scale Dilation Pyramid Module is developed to expand the receptive field and capture multi-scale feature information. Finally, the proposed lightweight model integrates two loss mechanisms, the Cross-Entropy and the Dice Loss functions, which effectively mitigate the issue of data imbalance and gradient saturation. Numerical experimental results on the CamVid dataset demonstrate a remarkable mIoU measurement of 70.85% with less than 5M parameters, accompanied by a real-time inference speed of 66.1 FPS, surpassing the existing methods documented in the literature. The code for this work will be made available at https://github.com/River-ding/LiteMSNet.

Abstract Image

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

LiteMSNet：针对城市街景场景的多尺度特征提取轻量级语义分割网络

语义分割在计算机场景理解中起着举足轻重的作用，但通常需要大量计算才能实现高性能。为了在准确性和复杂性之间取得平衡，我们提出了一种轻量级语义分割模型，称为 LiteMSNet（针对城市街景场景的多尺度特征提取轻量级语义分割网络）。在这一模型中，我们提出了一种新颖的改进型特征金字塔网络，其中嵌入了一种洗牌关注机制，然后是一个堆叠的深度非对称门控模块。此外，我们还开发了多尺度扩张金字塔模块，以扩大感受野和捕捉多尺度特征信息。最后，所提出的轻量级模型集成了两种损失机制，即交叉熵和骰子损失函数，从而有效地缓解了数据不平衡和梯度饱和的问题。在 CamVid 数据集上的数值实验结果表明，在参数小于 500 万的情况下，mIoU 测量值达到了 70.85%，同时实时推理速度达到了 66.1 FPS，超过了文献中记载的现有方法。这项工作的代码将公布在 https://github.com/River-ding/LiteMSNet 网站上。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

The Visual Computer

自引率

0.00%

发文量