Adaptive Multi-Level Saliency Network in 3D Generation
Author: Zhongliang Tang
2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), June 2018
DOI: 10.1109/ICIVC.2018.8492765
Citations: 0
Abstract
Many CNNs with an encoder-decoder structure are widely used in supervised 3D voxel generation. However, their convolutional encoders are usually too simple, which causes some local features to degrade during convolution, so it is difficult to extract a good feature representation from an input image with a simple encoder. Some CNNs apply skip-connection layers in the encoder to reduce this degradation, but general skip-connection layers such as residual layers are not effective enough, especially when the number of convolutional layers in the encoder is relatively small. In this paper, we propose a novel structure called the adaptive multi-level saliency network (AMSN) to reduce the degradation of local features. The major innovations of AMSN are multi-level saliency convolution kernels (MSCK) and a saliency fusion layer. Unlike the kernels used in general skip-connection layers, MSCK are adaptively learned from multi-level salient feature maps rather than initialized randomly. The salient feature maps are sampled from multiple layers in the encoder. Because MSCK can acquire multi-level features more easily, we use them to capture local features from low-level layers before degradation occurs. The captured local features are then fused back into the encoder through a saliency fusion layer to reduce the degradation. We evaluated our approach on the ShapeNet and ModelNet40 datasets. The results indicate that our AMSN performs better than related methods.
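The abstract does not give the exact construction of MSCK or the fusion layer, but the data flow it describes can be sketched conceptually: sample salient feature maps from several encoder levels, derive convolution kernels from those maps (instead of random initialization), convolve the low-level features with the derived kernels, and fuse the result back into the encoder stream. The sketch below is a minimal NumPy illustration of that flow; the pooling-based kernel derivation and the weighted-sum fusion are hypothetical simplifications, not the paper's actual learned operations.

```python
import numpy as np

def conv2d(x, k):
    """Valid 2D cross-correlation of a single-channel map x with kernel k."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def kernel_from_saliency(fmap, ksize=3):
    """Derive a kernel from a salient feature map by average-pooling it down
    to ksize x ksize and normalizing (a hypothetical stand-in for the
    adaptively learned MSCK)."""
    H, W = fmap.shape
    k = np.zeros((ksize, ksize))
    for i in range(ksize):
        for j in range(ksize):
            k[i, j] = fmap[i * H // ksize:(i + 1) * H // ksize,
                           j * W // ksize:(j + 1) * W // ksize].mean()
    s = k.sum()
    return k / s if s != 0 else k

# Toy encoder feature maps sampled at two levels (assumed shapes).
rng = np.random.default_rng(0)
low_level = rng.standard_normal((8, 8))   # early-layer features, pre-degradation
mid_level = rng.standard_normal((8, 8))   # deeper-layer features

# MSCK idea: kernels derived from multi-level salient maps, not random init.
k_low = kernel_from_saliency(np.abs(low_level))
k_mid = kernel_from_saliency(np.abs(mid_level))

# Capture local features with the saliency-derived kernels.
feat_low = conv2d(low_level, k_low)
feat_mid = conv2d(mid_level, k_mid)

# Saliency fusion layer, sketched here as a fixed weighted sum; in AMSN the
# fusion reinjects these features into the encoder to counter degradation.
fused = 0.5 * feat_low + 0.5 * feat_mid
print(fused.shape)  # (6, 6)
```

In the actual network both the kernel derivation and the fusion weights would be learned end-to-end; the point of the sketch is only the routing: low-level features are extracted with kernels conditioned on saliency before deeper convolutions can degrade them.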