Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes
Authors: Qian Liu, Zhensheng Li, Youwei Qi, Cunbao Wang
DOI: 10.1007/s00530-024-01429-2 (https://doi.org/10.1007/s00530-024-01429-2)
Published: 2024-07-23 (Journal Article)
Semantic segmentation of street scenes is important for vision-based autonomous driving applications. High-accuracy deep learning networks have recently been widely applied to semantic segmentation, but their inference is slow. To run faster, most popular real-time architectures downsample stepwise in the backbone to obtain features of different sizes. However, they ignore the misalignment between feature maps from different levels, and their simple feature aggregation by element-wise addition or channel-wise concatenation may bury useful information under a large amount of irrelevant information. To address these problems, we propose a gated feature aggregation and alignment network (GFAANet) for real-time semantic segmentation of street scenes. In GFAANet, a feature alignment aggregation module aligns and aggregates feature maps from different levels. We also present a gated feature aggregation module that uses gates to selectively aggregate and refine effective information from the multi-stage features of the backbone network. Furthermore, a depthwise separable pyramid pooling module, operating on low-resolution feature maps, serves as a context extractor that expands the effective receptive field and fuses multi-scale contexts. Experimental results on two challenging street scene benchmark datasets show that GFAANet achieves the highest accuracy among the compared state-of-the-art real-time semantic segmentation methods. We conclude that GFAANet can segment street scene images quickly and effectively, which may provide technical support for autonomous driving.
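To make the contrast between plain element-wise addition and gated aggregation concrete, the following is a minimal sketch in pure Python. It is not the GFAANet module: the abstract does not specify how the gates are computed, so the sigmoid gate, the scalar weights `w_high`, `w_low`, and `bias`, and the function names `upsample_nearest_2x`, `gated_fuse`, and `add_fuse` are all illustrative assumptions. Nearest-neighbour upsampling here only brings the two maps to the same size; it stands in for, and does not reproduce, the feature alignment described in the abstract.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest_2x(fmap):
    """Double the height and width of a 2D map by repeating each value.

    Assumption: a plain resize, not a learned alignment.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_fuse(high, low):
    """Baseline: element-wise addition of two equally sized 2D maps."""
    return [[h + l for h, l in zip(rh, rl)] for rh, rl in zip(high, low)]


def gated_fuse(high, low, w_high=1.0, w_low=1.0, bias=0.0):
    """Fuse two equally sized 2D maps with a per-pixel gate.

    gate = sigmoid(w_high * high + w_low * low + bias)
    out  = gate * high + (1 - gate) * low

    The scalar weights are hypothetical stand-ins for learned parameters.
    """
    if len(high) != len(low) or any(
        len(rh) != len(rl) for rh, rl in zip(high, low)
    ):
        raise ValueError("feature maps must have the same shape")
    fused = []
    for rh, rl in zip(high, low):
        fused_row = []
        for h, l in zip(rh, rl):
            g = sigmoid(w_high * h + w_low * l + bias)
            # The gate decides, per pixel, how much of each input to keep.
            fused_row.append(g * h + (1.0 - g) * l)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    high_res = [[2.0, 4.0, 0.0, 0.0], [2.0, 4.0, 0.0, 0.0]]
    low_res = [[1.0, 3.0]]  # half the resolution of high_res
    low_up = upsample_nearest_2x(low_res)
    print("addition:", add_fuse(high_res, low_up))
    print("gated:   ", gated_fuse(high_res, low_up))
```

Unlike addition, where both inputs always contribute in full, the gated output is a per-pixel convex combination of the two inputs, so one source can be suppressed wherever the gate saturates. In a trained network the gate parameters would be learned, typically with convolutions rather than scalars.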