DenseASPP for Semantic Segmentation in Street Scenes

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Pub Date : 2018-06-01 DOI:10.1109/CVPR.2018.00388

Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang

{"title":"DenseASPP for Semantic Segmentation in Street Scenes","authors":"Maoke Yang, Kun Yu, Chi Zhang, Zhiwei Li, Kuiyuan Yang","doi":"10.1109/CVPR.2018.00388","DOIUrl":null,"url":null,"abstract":"Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in a sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution[14]was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP)[2] was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes[4] and achieve state-of-the-art performance.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"12 1","pages":"3684-3692"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1036","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2018.00388","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1036

Abstract

Semantic image segmentation is a basic street scene understanding task in autonomous driving, where each pixel in a high resolution image is categorized into a set of semantic labels. Unlike other scenarios, objects in autonomous driving scene exhibit very large scale changes, which poses great challenges for high-level feature representation in a sense that multi-scale information must be correctly encoded. To remedy this problem, atrous convolution[14]was introduced to generate features with larger receptive fields without sacrificing spatial resolution. Built upon atrous convolution, Atrous Spatial Pyramid Pooling (ASPP)[2] was proposed to concatenate multiple atrous-convolved features using different dilation rates into a final feature representation. Although ASPP is able to generate multi-scale features, we argue the feature resolution in the scale-axis is not dense enough for the autonomous driving scenario. To this end, we propose Densely connected Atrous Spatial Pyramid Pooling (DenseASPP), which connects a set of atrous convolutional layers in a dense way, such that it generates multi-scale features that not only cover a larger scale range, but also cover that scale range densely, without significantly increasing the model size. We evaluate DenseASPP on the street scene benchmark Cityscapes[4] and achieve state-of-the-art performance.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

街景语义分割的DenseASPP

语义图像分割是自动驾驶中基本的街景理解任务，将高分辨率图像中的每个像素划分为一组语义标签。与其他场景不同，自动驾驶场景中的物体呈现非常大的尺度变化，这对高级特征表示提出了很大的挑战，因为必须对多尺度信息进行正确编码。为了解决这个问题，引入了亚光卷积[14]，在不牺牲空间分辨率的情况下生成具有更大接受域的特征。在亚特罗斯卷积的基础上，提出了亚特罗斯空间金字塔池(ASPP)[2]，以不同的扩张率将多个亚特罗斯卷积特征连接成最终的特征表示。尽管ASPP能够生成多尺度特征，但我们认为尺度轴上的特征分辨率对于自动驾驶场景来说不够密集。为此，我们提出了密集连接的亚特鲁斯空间金字塔池(DenseASPP)，它以密集的方式连接一组亚特鲁斯卷积层，从而生成的多尺度特征不仅覆盖更大的尺度范围，而且覆盖该尺度范围的密度也不显著增加模型的大小。我们在街景基准cityscape[4]上对DenseASPP进行了评估，并获得了最先进的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

自引率

0.00%

发文量

期刊最新文献

Multistage Adversarial Losses for Pose-Based Human Image Synthesis Document Enhancement Using Visibility Detection Demo2Vec: Reasoning Object Affordances from Online Videos Planar Shape Detection at Structural Scales Where and Why are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks