{"title":"Design of a 3D Scene Reconstruction Network Robust to High-Frequency Areas Based on 2.5D Sketches and Encoders","authors":"Chan-Ho Lee, Jaeseok Yoo, K. Park","doi":"10.1109/ICOIN56518.2023.10048963","DOIUrl":null,"url":null,"abstract":"In this paper, we propose a new 3D scene reconstruction network that is robust to high-frequency areas by extracting multiple 3D feature volumes with accurate and various 3D information. Previous voxel representation-based methods did not perform well in high-frequency areas such as angled drawer parts and desk corners. In addition, the performance is poor even in the low-frequency areas with few feature points such as walls and floors. To solve this problem, we propose various backbone networks by extracting edge surface normal images from RGB images and constructing new branches. Edge images can provide information in the high-frequency areas, and surface normal images can compensate for the lack of information in edge images. As a result, not only 3D information but also the values of the high-frequency areas may be added. Using this as input for a new branch, various backbone networks such as ConvNeXt and Swin Transformer extract 2D image features that retain accurate 3D information. We designed a network that can represent detailed scenes from the entire scene using the hierarchical structure and unprojection of the backbone network to achieve robust performance in the high-frequency areas. 
We show that the proposed method outperforms the previous methods in quantitative and stereotyped 3D reconstruction results on the ScanNet dataset.","PeriodicalId":285763,"journal":{"name":"2023 International Conference on Information Networking (ICOIN)","volume":"456 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Information Networking (ICOIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICOIN56518.2023.10048963","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
In this paper, we propose a new 3D scene reconstruction network that is robust in high-frequency areas by extracting multiple 3D feature volumes carrying accurate and diverse 3D information. Previous voxel-representation-based methods did not perform well in high-frequency areas such as angled drawer parts and desk corners. Their performance is also poor in low-frequency areas with few feature points, such as walls and floors. To solve this problem, we extract edge and surface normal images from RGB images and construct new branches for various backbone networks. Edge images provide information in high-frequency areas, and surface normal images compensate for the information that edge images lack. As a result, both 3D information and the values of the high-frequency areas can be added. Using these images as input to the new branches, backbone networks such as ConvNeXt and Swin Transformer extract 2D image features that retain accurate 3D information. We designed a network that can represent detailed scenes within the entire scene, using the hierarchical structure of the backbone network and unprojection to achieve robust performance in high-frequency areas. We show that the proposed method outperforms previous methods in quantitative and qualitative 3D reconstruction results on the ScanNet dataset.
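The pipeline described above has two distinctive steps: deriving an edge image from the RGB input to feed an auxiliary branch, and unprojecting 2D backbone features into a 3D voxel volume. The paper does not publish code, so the following is only a minimal NumPy sketch of those two ideas under simplifying assumptions (a Sobel filter standing in for the edge extractor, nearest-neighbor sampling for unprojection, and hypothetical function names):

```python
import numpy as np

def sobel_edges(gray):
    """Illustrative edge-image extraction via Sobel gradient magnitude.

    gray: (H, W) float array. Returns an (H, W) edge-strength map.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(gray.astype(np.float32), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w), dtype=np.float32)
    gy = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)

def unproject_features(feat2d, K, cam_from_world, voxel_centers):
    """Lift a 2D feature map into per-voxel features (nearest-neighbor).

    Each voxel center is projected into the image with the pinhole model;
    voxels behind the camera or outside the image receive zeros.
    feat2d: (C, H, W) features; K: (3, 3) intrinsics;
    cam_from_world: (4, 4) extrinsics; voxel_centers: (N, 3) world coords.
    Returns an (N, C) array of sampled features.
    """
    C, H, W = feat2d.shape
    n = voxel_centers.shape[0]
    homo = np.concatenate([voxel_centers, np.ones((n, 1))], axis=1)  # (N, 4)
    cam = (cam_from_world @ homo.T).T[:, :3]                         # (N, 3)
    z = cam[:, 2]
    pix = (K @ cam.T).T
    u = pix[:, 0] / np.maximum(z, 1e-6)
    v = pix[:, 1] / np.maximum(z, 1e-6)
    out = np.zeros((n, C), dtype=feat2d.dtype)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    ui = u[valid].astype(int)
    vi = v[valid].astype(int)
    out[valid] = feat2d[:, vi, ui].T
    return out
```

In the actual method the 2D features would come from a learned backbone (e.g. ConvNeXt or Swin Transformer) at multiple hierarchy levels, and the per-view voxel features would be fused across frames; this sketch only shows the geometric core of a single view's unprojection.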