ResNet with Global and Local Image Features, Stacked Pooling Block, for Semantic Segmentation

Hui-Shi Song, Yun Zhou, Zhuqing Jiang, Xiaoqiang Guo, Zixuan Yang

2018 IEEE/CIC International Conference on Communications in China (ICCC), August 2018. DOI: https://doi.org/10.1109/ICCCHINA.2018.8641146

Citations: 2
Abstract
Recently, deep convolutional neural networks (CNNs) have achieved great success in semantic segmentation systems. In this paper, we show how to improve pixel-wise semantic segmentation by combining global context information with local image features. First, we implement a fusion layer that merges global features and local features in the encoder network. Second, in the decoder network, we introduce a stacked pooling block, which significantly expands the receptive fields of feature maps and is essential for contextualizing local semantic predictions. Furthermore, our approach is based on ResNet18, so our model has far fewer parameters than currently published models. The whole framework is trained end-to-end without any post-processing. We show that our method improves semantic image segmentation performance on two datasets, CamVid and Cityscapes, demonstrating its effectiveness.
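To make the two mechanisms concrete, below is a minimal PyTorch sketch of what a global/local fusion layer and a stacked pooling block could look like. The abstract does not specify the exact designs, so everything here is an illustrative assumption: the module names (FusionLayer, StackedPoolingBlock), the use of global average pooling as the "global feature", the 1x1 fusion convolution, and the pooling kernel sizes and stage count are all hypothetical choices, not the authors' implementation.

```python
# Hedged sketch of the two ideas named in the abstract. Module names,
# channel sizes, and pooling configuration are illustrative assumptions;
# the paper's actual design may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionLayer(nn.Module):
    """Merge a global context summary with a local feature map.

    Assumed design: global average pooling produces an image-level vector,
    which is broadcast over the spatial grid, concatenated with the local
    features, and fused by a 1x1 convolution.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, local_feat: torch.Tensor) -> torch.Tensor:
        # Global feature: spatially pooled summary of the whole image.
        global_feat = F.adaptive_avg_pool2d(local_feat, 1)
        global_feat = global_feat.expand_as(local_feat)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))


class StackedPoolingBlock(nn.Module):
    """Expand the receptive field by stacking stride-1 average poolings.

    Each stage pools the previous stage's output, so the effective window
    grows with depth; summing the stages back into the input keeps local
    detail while adding wider context (a residual-style accumulation).
    """

    def __init__(self, num_stages: int = 3, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2  # keep spatial size unchanged
        self.pools = nn.ModuleList(
            nn.AvgPool2d(kernel_size, stride=1, padding=pad)
            for _ in range(num_stages)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, feat = x, x
        for pool in self.pools:
            feat = pool(feat)  # each stage widens the receptive field
            out = out + feat   # accumulate multi-scale context
        return out


if __name__ == "__main__":
    feat = torch.randn(1, 64, 45, 60)  # e.g. an early ResNet18 stage output
    fused = FusionLayer(64)(feat)
    context = StackedPoolingBlock()(fused)
    print(fused.shape, context.shape)  # both torch.Size([1, 64, 45, 60])
```

Both modules preserve spatial resolution and add almost no parameters (the stacked pooling block has none at all), which is consistent with the abstract's emphasis on a lightweight ResNet18-based model.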