Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301851
Yang Wu, A. Jiang, Yibin Tang, H. Kwan
In this paper, we develop a novel deep-network architecture for semantic segmentation. In contrast to previous work that widely uses dilated convolutions, we employ the original ResNet as the backbone and introduce a multi-scale feature fusion module (MFFM) to extract long-range contextual information and upsample feature maps. Then, a graph reasoning module (GRM) based on a graph convolutional network (GCN) is developed to aggregate semantic information. Our graph reasoning network (GRNet) extracts the global context of input features by modeling graph reasoning in a single framework. Experimental results demonstrate that our approach provides substantial benefits over a strong baseline and achieves superior segmentation performance on two benchmark datasets.
{"title":"GRNet: Deep Convolutional Neural Networks based on Graph Reasoning for Semantic Segmentation","authors":"Yang Wu, A. Jiang, Yibin Tang, H. Kwan","doi":"10.1109/VCIP49819.2020.9301851","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301851","url":null,"abstract":"In this paper, we develop a novel deep-network architecture for semantic segmentation. In contrast to previous work that widely uses dilated convolutions, we employ the original ResNet as the backbone, and a multi-scale feature fusion module (MFFM) is introduced to extract long-range contextual information and upsample feature maps. Then, a graph reasoning module (GRM) based on graph-convolutional network (GCN) is developed to aggregate semantic information. Our graph reasoning network (GRNet) extracts global contexts of input features by modeling graph reasoning in a single framework. Experimental results demonstrate that our approach provides substantial benefits over a strong baseline and achieves superior segmentation performance on two benchmark datasets.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114771623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301880
Alexandre Berthet, J. Dugelay
Access to technologies like mobile phones contributes to the significant increase in the volume of digital visual data (images and videos). In addition, photo editing software is becoming increasingly powerful and easy to use. In some cases, these tools can be utilized to produce forgeries with the objective of changing the semantic meaning of a photo or a video (e.g. fake news). Digital image forensics (DIF) includes two main objectives: the detection (and localization) of forgery and the identification of the origin of the acquisition (i.e. sensor identification). Since 2005, many classical methods for DIF have been designed, implemented and tested on several databases. Meanwhile, innovative approaches based on deep learning have emerged in other fields and have surpassed traditional techniques. In the context of DIF, deep learning methods mainly use convolutional neural networks (CNN) associated with significant preprocessing modules. This is an active domain, and two possible ways to operate preprocessing have been studied: prior to the network or incorporated into it. None of the existing studies on digital image forensics provides a comprehensive overview of the preprocessing techniques used with deep learning methods. Therefore, the core objective of this article is to review the preprocessing modules associated with CNN models.
{"title":"A review of data preprocessing modules in digital image forensics methods using deep learning","authors":"Alexandre Berthet, J. Dugelay","doi":"10.1109/VCIP49819.2020.9301880","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301880","url":null,"abstract":"Access to technologies like mobile phones contributes to the significant increase in the volume of digital visual data (images and videos). In addition, photo editing software is becoming increasingly powerful and easy to use. In some cases, these tools can be utilized to produce forgeries with the objective to change the semantic meaning of a photo or a video (e.g. fake news). Digital image forensics (DIF) includes two main objectives: the detection (and localization) of forgery and the identification of the origin of the acquisition (i.e. sensor identification). Since 2005, many classical methods for DIF have been designed, implemented and tested on several databases. Meantime, innovative approaches based on deep learning have emerged in other fields and have surpassed traditional techniques. In the context of DIF, deep learning methods mainly use convolutional neural networks (CNN) associated with significant preprocessing modules. This is an active domain and two possible ways to operate preprocessing have been studied: prior to the network or incorporated into it. None of the various studies on the digital image forensics provide a comprehensive overview of the preprocessing techniques used with deep learning methods. Therefore, the core objective of this article is to review the preprocessing modules associated with CNN models.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115363011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301788
Fengqiao Wang, Lu Liu, Cheolkon Jung
Although near infrared (NIR) images contain no color, they have abundant and clear textures. In this paper, we propose deep NIR colorization with semantic segmentation and transfer learning. NIR images are capable of capturing the invisible spectrum (700-1000 nm), which is quite different from visible-spectrum images. We employ convolutional layers to build the relationship between single NIR images and three-channel color images, instead of mapping to the Lab or YCbCr color space. Moreover, we use semantic segmentation as global prior information to refine the colorization of smooth object regions. We use a color divergence loss to further optimize NIR colorization results with good structures and edges. Since the training dataset is not large enough to capture rich color information, we adopt transfer learning to obtain color and semantic information. Experimental results verify that the proposed method produces a natural color image from a single NIR image and outperforms state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).
{"title":"Deep Near Infrared Colorization with Semantic Segmentation and Transfer Learning","authors":"Fengqiao Wang, Lu Liu, Cheolkon Jung","doi":"10.1109/VCIP49819.2020.9301788","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301788","url":null,"abstract":"Although near infrared (NIR) images contain no color, they have abundant and clear textures. In this paper, we propose deep NIR colorization with semantic segmentation and transfer learning. NIR images are capable of capturing invisible spectrum (700-1000 nm) that is quite different from visible spectrum images. We employ convolutional layers to build relationship between single NIR images and three-channel color images, instead of mapping to Lab or YCbCr color space. Moreover, we use semantic segmentation as global prior information to refine colorization of smooth regions for objects. We use color divergence loss to further optimize NIR colorization results with good structures and edges. Since the training dataset is not enough to capture rich color information, we adopt transfer learning to get color and semantic information. Experimental results verify that the proposed method produces a natural color image from single NIR image and outperforms state-of-the-art methods in terms of peak signal-to-noise ratio (PSNR) and structural similarity (SSIM).","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114570793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301782
Tang Tang, Menghan Hu, Guodong Li, Qingli Li, Jian Zhang, Xiaofeng Zhou, Guangtao Zhai
Indoor navigation is urgently needed by blind people in their everyday lives. In this paper, we design an assistive cane with visual odometry, based on the actual requirements of the blind, to help them attain safe indoor navigation. Compared to state-of-the-art indoor navigation systems, the proposed device is portable, compact, and adaptable. The main specifications of the system are: the perception range is from 0.10m to 2.10m in width and from 0.08m to 1.60m in length; the maximum weight is 2.1kg; the detection range is from 0.15m to 3.00m; the cruising time is about 8h; and objects whose heights are below 80cm can be detected. The demo video of the proposed navigation system is available at: https://doi.org/10.6084/m9.figshare.12399572.v1.
{"title":"Special Cane with Visual Odometry for Real-time Indoor Navigation of Blind People","authors":"Tang Tang, Menghan Hu, Guodong Li, Qingli Li, Jian Zhang, Xiaofeng Zhou, Guangtao Zhai","doi":"10.1109/VCIP49819.2020.9301782","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301782","url":null,"abstract":"Indoor navigation is urgently needed by blind people in their everyday lives. In this paper, we design an assistive cane with visual odometry based on actual requirements of the blind to aid them in attaining safe indoor navigation. Compared to the state-of-the-art indoor navigation systems, the proposed device is portable, compact, and adaptable. The main specifications of the system are: the perception range is respectively from 0.10m to 2.10m, and 0.08m to 1.60m for width and length dimensions; the maximum weight is 2.1kg; the detection range is from 0.15m and 3.00m; the cruising ability is about 8h; and the objects whose heights are below 80cm can be detected. The demo video of the proposed navigation system is available at: https://doi.org/10.6084/m9.figshare.12399572.v1.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116193491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301853
Fei Han, Jin Wang, Ruiqin Xiong, Qing Zhu, Baocai Yin
As one of the next-generation multimedia technologies, high dynamic range (HDR) imaging has been widely applied. Due to its wider color range, an HDR image imposes a greater compression and storage burden than a traditional LDR image. To solve this problem, this paper proposes a two-layer HDR image compression framework based on convolutional neural networks. The framework is composed of a base layer, which provides backward compatibility with the standard JPEG, and an extension layer based on a convolutional variational autoencoder and a post-processing module. The autoencoder mainly includes a nonlinear transform encoder, a binarized quantizer and a nonlinear transform decoder. Compared with traditional codecs, the proposed CNN autoencoder is more flexible and retains more image semantic information, which improves the quality of the decoded HDR image. Moreover, to reduce the compression artifacts and noise of the reconstructed HDR image, a post-processing method based on group convolutional neural networks is designed. Experimental results show that our method outperforms JPEG XT profiles A, B and C and other methods in terms of the HDR-VDP-2 evaluation metric. Meanwhile, our scheme also provides backward compatibility with the standard JPEG.
{"title":"HDR Image Compression with Convolutional Autoencoder","authors":"Fei Han, Jin Wang, Ruiqin Xiong, Qing Zhu, Baocai Yin","doi":"10.1109/VCIP49819.2020.9301853","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301853","url":null,"abstract":"As one of the next-generation multimedia technology, high dynamic range (HDR) imaging technology has been widely applied. Due to its wider color range, HDR image brings greater compression and storage burden compared with traditional LDR image. To solve this problem, in this paper, a two-layer HDR image compression framework based on convolutional neural networks is proposed. The framework is composed of a base layer which provides backward compatibility with the standard JPEG, and an extension layer based on a convolutional variational autoencoder neural networks and a post-processing module. The autoencoder mainly includes a nonlinear transform encoder, a binarized quantizer and a nonlinear transform decoder. Compared with traditional codecs, the proposed CNN autoencoder is more flexible and can retain more image semantic information, which will improve the quality of decoded HDR image. Moreover, to reduce the compression artifacts and noise of reconstructed HDR image, a post-processing method based on group convolutional neural networks is designed. Experimental results show that our method outperforms JPEG XT profile A, B, C and other methods in terms of HDR-VDP-2 evaluation metric. Meanwhile, our scheme also provides backward compatibility with the standard JPEG.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116808239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301830
Nelson Chong Ngee Bow, Vu-Hoang Tran, Punchok Kerdsiri, Y. P. Loh, Ching-Chun Huang
Though learning-based low-light enhancement methods have achieved significant success, existing methods are still sensitive to noise and produce unnatural appearance. These problems may come from a lack of structural awareness and confusion between noise and texture. Thus, we present a low-light image enhancement method that consists of an image disentanglement network and an illumination boosting network. The disentanglement network is first used to decompose the input image into image details and image illumination. The extracted illumination part then goes through a multi-branch enhancement network designed to improve the dynamic range of the image. The multi-branch network extracts multi-level image features and enhances them via numerous subnets. These enhanced features are then fused to generate the enhanced illumination part. Finally, the denoised image details and the enhanced illumination are entangled to produce the normal-light image. Experimental results show that our method produces visually pleasing images on many public datasets.
{"title":"DEN: Disentanglement and Enhancement Networks for Low Illumination Images","authors":"Nelson Chong Ngee Bow, Vu-Hoang Tran, Punchok Kerdsiri, Y. P. Loh, Ching-Chun Huang","doi":"10.1109/VCIP49819.2020.9301830","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301830","url":null,"abstract":"Though learning-based low-light enhancement methods have achieved significant success, existing methods are still sensitive to noise and unnatural appearance. The problems may come from the lack of structural awareness and the confusion between noise and texture. Thus, we present a lowlight image enhancement method that consists of an image disentanglement network and an illumination boosting network. The disentanglement network is first used to decompose the input image into image details and image illumination. The extracted illumination part then goes through a multi-branch enhancement network designed to improve the dynamic range of the image. The multi-branch network extracts multi-level image features and enhances them via numerous subnets. These enhanced features are then fused to generate the enhanced illumination part. Finally, the denoised image details and the enhanced illumination are entangled to produce the normallight image. Experimental results show that our method can produce visually pleasing images in many public datasets.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126428146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301849
Yanxia Zhang, Xin Jin, Jingyi Zhang, Yongheng Zhao
We compare the performance of Support Vector Machine, XGBoost, LightGBM, k-Nearest Neighbors, Random Forests and Extra-Trees on the photometric redshift estimation of quasars based on the SDSS_WISE sample. For this sample, LightGBM shows its superiority in speed, while k-Nearest Neighbors, Random Forests and Extra-Trees show better performance. Then k-Nearest Neighbors, Random Forests and Extra-Trees are applied to the SDSS, SDSS_WISE, SDSS_UKIDSS, WISE_UKIDSS and SDSS_WISE_UKIDSS samples. The results show that the performance of an algorithm depends on sample selection, sample size, input pattern and information from different bands; for the same sample, the more information is used, the better the performance obtained, but different algorithms show different accuracy; and no single algorithm shows superiority on every sample.
{"title":"Machine Learning for Photometric Redshift Estimation of Quasars with Different Samples","authors":"Yanxia Zhang, Xin Jin, Jingyi Zhang, Yongheng Zhao","doi":"10.1109/VCIP49819.2020.9301849","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301849","url":null,"abstract":"We compare the performance of Support Vector Machine, XGBoost, LightGBM, k-Nearest Neighbors, Random forests and Extra-Trees on the photometric redshift estimation of quasars based on the SDSS_WISE sample. For this sample, LightGBM shows its superiority in speed while k-Nearest Neighbors, Random forests and Extra-Trees show better performance. Then k-Nearest Neighbors, Random forests and Extra-Trees are applied on the SDSS, SDSS_WISE, SDSS_UKIDSS, WISE_UKIDSS and SDSS_WISE_UKIDSS samples. The results show that the performance of an algorithm depends on the sample selection, sample size, input pattern and information from different bands; for the same sample, the more information the better performance is obtained, but different algorithms shows different accuracy; no single algorithm shows its superiority on every sample.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126570149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301774
Hui-jun Tang, R. T. Hsung, W. Y. Lam, Leo Y. Y. Cheng, E. Pow
3D digital smile design (DSD) has gained great interest in dentistry because it enables esthetic design of teeth and gum. However, the color texture of teeth and gum is often lost or distorted in the digitization process. Recently, the image-to-geometry registration shade mapping (IGRSM) method was proposed for registering color texture from 2D photography to a 3D mesh model. It allows better control of illumination and color calibration for automatic tooth shade matching. In this paper, we investigate automated techniques to find the correspondences between a 3D tooth model and color intraoral photographs to accurately perform the IGRSM. We propose to use the tooth cusp tips as the correspondence points for the IGR because they can be reliably detected both in 2D photography and in 3D surface scans. A modified gradient descent method with directional priority (GDDP) and region growing are developed to find the 3D correspondence points. For the 2D image, the tooth tip contour lines are extracted based on luminosity and chromaticity, and the contour peaks are then detected as the correspondence points. Experimental results show that the proposed method achieves excellent accuracy in detecting the correspondence points between 2D photography and the 3D tooth model. The average registration error is less than 15 pixels for a 4752×3168 intraoral image.
{"title":"On 2D-3D Image Feature Detections for Image-To-Geometry Registration in Virtual Dental Model","authors":"Hui-jun Tang, R. T. Hsung, W. Y. Lam, Leo Y. Y. Cheng, E. Pow","doi":"10.1109/VCIP49819.2020.9301774","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301774","url":null,"abstract":"3D digital smile design (DSD) gains great interest in dentistry because it enables esthetic design of teeth and gum. However, the color texture of teeth and gum is often lost/distorted in the digitization process. Recently, the image-to-geometry registration shade mapping (IGRSM) method was proposed for registering color texture from 2D photography to 3D mesh model. It allows better control of illumination and color calibration for automatic teeth shade matching. In this paper, we investigate automated techniques to find the correspondences between 3D tooth model and color intraoral photographs for accurately perform the IGRSM. We propose to use the tooth cusp tips as the correspondence points for the IGR because they can be reliably detected both in 2D photography and 3D surface scan. A modified gradient descent method with directional priority (GDDP) and region growing are developed to find the 3D correspondence points. For the 2D image, the tooth tips contour lines are extracted based on luminosity and chromaticity, the contour peaks are then detected as the correspondence points. From the experimental results, the proposed method shows excellent accuracy in detecting the correspondence points between 2D photography and 3D tooth model. The average registration error is less than 15 pixels for 4752×3168 size intraoral image.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121720311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301813
Bo Peng, Zengrui Yu, Jianjun Lei, Jiahui Song
With the dramatic growth of 3D shape data, 3D shape recognition has become a hot research topic in the field of computer vision. How to effectively utilize the multimodal characteristics of 3D shapes has been one of the key problems in boosting the performance of 3D shape recognition. In this paper, we propose a novel attention-guided fusion network of point cloud and multiple views for 3D shape recognition. Specifically, in order to obtain a more discriminative descriptor for 3D shape data, an inter-modality attention enhancement module and a view-context attention fusion module are proposed to gradually refine and fuse the features of the point cloud and multiple views. In the inter-modality attention enhancement module, an inter-modality attention mask based on the joint feature representation is computed, so that the features of each modality are enhanced by fusing the correlative information between the two modalities. After that, the view-context attention fusion module is proposed to explore the context information of multiple views and fuse the enhanced features to obtain a more discriminative descriptor for 3D shape data. Experimental results on the ModelNet40 dataset demonstrate that the proposed method achieves promising performance compared with state-of-the-art methods.
{"title":"Attention-Guided Fusion Network of Point Cloud and Multiple Views for 3D Shape Recognition","authors":"Bo Peng, Zengrui Yu, Jianjun Lei, Jiahui Song","doi":"10.1109/VCIP49819.2020.9301813","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301813","url":null,"abstract":"With the dramatic growth of 3D shape data, 3D shape recognition has become a hot research topic in the field of computer vision. How to effectively utilize the multimodal characteristics of 3D shape has been one of the key problems to boost the performance of 3D shape recognition. In this paper, we propose a novel attention-guided fusion network of point cloud and multiple views for 3D shape recognition. Specifically, in order to obtain more discriminative descriptor for 3D shape data, the inter-modality attention enhancement module and view-context attention fusion module are proposed to gradually refine and fuse the features of the point cloud and multiple views. In the inter-modality attention enhancement module, the inter-modality attention mask based on the joint feature representation is computed, so that the features of each modality are enhanced by fusing the correlative information between two modalities. After that, the view-context attention fusion module is proposed to explore the context information of multiple views, and fuse the enhanced features to obtain more discriminative descriptor for 3D shape data. Experimental results on the ModelNet40 dataset demonstrate that the proposed method achieves promising performance compared with state-of-the-art methods.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"60 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121262717","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301759
Xingtao Wang, Xiaopeng Fan, Debin Zhao
The airborne laser scanning (ALS) point cloud has drawn increasing attention thanks to its capability to quickly acquire large-scale and high-precision ground information. Due to the complexity of observed scenes and the irregularity of point distribution, the semantic labeling of ALS point clouds is extremely challenging. In this paper, we introduce an efficient discretization-based framework according to the geometric character of ALS point clouds, and propose an original intraclass weighted cross entropy loss function to address data imbalance. We evaluate our framework on the ISPRS (International Society for Photogrammetry and Remote Sensing) 3D Semantic Labeling dataset. The experimental results show that the proposed method achieves a new state-of-the-art performance in terms of overall accuracy (85.3%) and average F1 score (74.1%).
{"title":"A semantic labeling framework for ALS point clouds based on discretization and CNN","authors":"Xingtao Wang, Xiaopeng Fan, Debin Zhao","doi":"10.1109/VCIP49819.2020.9301759","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301759","url":null,"abstract":"The airborne laser scanning (ALS) point cloud has drawn increasing attention thanks to its capability to quickly acquire large-scale and high-precision ground information. Due to the complexity of observed scenes and the irregularity of point distribution, the semantic labeling of ALS point clouds is extremely challenging. In this paper, we introduce an efficient discretization based framework according to the geometric character of ALS point clouds, and propose an original intraclass weighted cross entropy loss function to solve the problem of data imbalance. We evaluate our framework on the ISPRS (International Society for Photogrammetry and Remote Sensing) 3D Semantic Labeling dataset. The experimental results show that the proposed method has achieved a new state-of-the-art performance in terms of overall accuracy (85.3%) and average F1 score (74.1%).","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127038386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}