ResNet with Global and Local Image Features, Stacked Pooling Block, for Semantic Segmentation
Hui-Shi Song, Yun Zhou, Zhuqing Jiang, Xiaoqiang Guo, Zixuan Yang
2018 IEEE/CIC International Conference on Communications in China (ICCC), August 2018
DOI: 10.1109/ICCCHINA.2018.8641146
Recently, deep convolutional neural networks (CNNs) have achieved great success in semantic segmentation systems. In this paper, we show how to improve pixel-wise semantic segmentation by combining global context information with local image features. First, we implement a fusion layer that merges global features and local features in the encoder network. Second, in the decoder network, we introduce a stacked pooling block, which significantly expands the receptive fields of feature maps and is essential for contextualizing local semantic predictions. Furthermore, our approach is based on ResNet18, which gives our model far fewer parameters than currently published models. The whole framework is trained end-to-end without any post-processing. We show that our method improves semantic image segmentation performance on two datasets, CamVid and Cityscapes, demonstrating its effectiveness.
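The abstract names two components without giving implementation details: a fusion layer that merges global and local features in the encoder, and a stacked pooling block that enlarges receptive fields in the decoder. Below is a minimal PyTorch sketch of how such components are commonly built. The ParseNet-style global-average-pooling fusion and the cascaded average-pooling design are illustrative assumptions, not the authors' exact architecture; all module names and hyperparameters are hypothetical.

```python
# A minimal sketch of the two components described in the abstract.
# Assumptions: the fusion layer is approximated as global average pooling
# broadcast and concatenated with the local feature map (ParseNet-style),
# and the stacked pooling block as a cascade of average-pooling layers
# whose outputs are accumulated. Not the authors' exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalFusion(nn.Module):
    """Merge a global context vector with a local feature map (assumed design)."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv mixes the concatenated global + local features back
        # down to the original channel count.
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, local_feat: torch.Tensor) -> torch.Tensor:
        n, c, h, w = local_feat.shape
        # Global feature: average-pool to 1x1, then broadcast back to H x W.
        global_feat = F.adaptive_avg_pool2d(local_feat, 1).expand(n, c, h, w)
        return self.mix(torch.cat([local_feat, global_feat], dim=1))


class StackedPoolingBlock(nn.Module):
    """Cascade of average-pooling layers; each stage grows the receptive field."""

    def __init__(self, num_stages: int = 3, kernel_size: int = 3):
        super().__init__()
        # stride=1 with "same" padding keeps the spatial size unchanged.
        self.pools = nn.ModuleList(
            nn.AvgPool2d(kernel_size, stride=1, padding=kernel_size // 2)
            for _ in range(num_stages)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, feat = x, x
        # Feeding each stage the previous stage's output stacks the pooling
        # windows, so the effective receptive field grows with every stage.
        for pool in self.pools:
            feat = pool(feat)
            out = out + feat
        # Average the identity and all pooled stages.
        return out / (len(self.pools) + 1)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)  # dummy encoder feature map
    print(GlobalLocalFusion(64)(x).shape)   # torch.Size([1, 64, 32, 32])
    print(StackedPoolingBlock()(x).shape)   # torch.Size([1, 64, 32, 32])
```

Stacking small pooling windows is cheap: three stacked 3x3 stride-1 average pools cover roughly a 7x7 neighborhood per pixel while adding no learned parameters, which is consistent with the abstract's emphasis on a lightweight ResNet18-based model.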