Reconstructing 3D Shapes as an Union of Boxes from Multi-View Images
Authors: Zihan Yang, Minglun Gong
Published in: Proceedings of the 2023 5th International Conference on Pattern Recognition and Intelligent Systems
Publication date: 2023-07-28
DOI: 10.1145/3609703.3609705 (https://doi.org/10.1145/3609703.3609705)
Citations: 0
Abstract
Reconstructing object shapes from input images has become increasingly important in fields such as computer vision, robotics, augmented reality, video games, and autonomous vehicles. While approaches for reconstructing shapes at varying levels of detail have been proposed, balancing representation accuracy against model complexity remains a challenge. To address this challenge, we propose an end-to-end approach for reconstructing object shapes from multiple images as a union of box primitives. Our approach offers a simpler and more efficient 3D representation of objects and needs no intermediate products such as voxels, resulting in faster inference. Additionally, we introduce an auxiliary task that aids in learning to extract and transform spatial features from images without requiring camera calibration. Extensive experiments demonstrate that our method, using only 2D RGB images as input, produces results comparable to those of approaches that require 3D voxelized input. Furthermore, our method significantly outperforms those approaches in inference time.
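
To make the "union of box primitives" idea concrete, the following is a minimal sketch of such a representation, not the authors' implementation. It assumes axis-aligned boxes described by a center and half-extents; the abstract does not state the paper's actual box parameterization (it may, for example, include rotation), and the names `Box`, `union_contains`, and `parameter_count` are hypothetical. The sketch shows that a point belongs to the shape if it lies in at least one box, and that a handful of boxes needs far fewer numbers than a dense voxel grid.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (an assumed parameterization, for illustration only)."""
    center: Vec3
    half_extent: Vec3

    def contains(self, p: Vec3) -> bool:
        # A point is inside when it is within the half-extent on every axis.
        return all(abs(pc - c) <= h
                   for pc, c, h in zip(p, self.center, self.half_extent))


def union_contains(boxes: Sequence[Box], p: Vec3) -> bool:
    """The shape is the union of its boxes: inside any box means inside the shape."""
    return any(b.contains(p) for b in boxes)


def parameter_count(boxes: Sequence[Box]) -> int:
    """Each box here stores 3 center values and 3 half-extent values."""
    return 6 * len(boxes)


# A crude "table": one slab for the top and one box for a central leg.
table = [
    Box(center=(0.0, 0.0, 0.9), half_extent=(0.5, 0.5, 0.1)),
    Box(center=(0.0, 0.0, 0.4), half_extent=(0.1, 0.1, 0.4)),
]

# Number of cells in a dense 32 x 32 x 32 occupancy grid, for comparison.
voxel_cells = 32 ** 3
```

With these two boxes, `union_contains(table, (0.4, 0.4, 0.95))` is true because the point lies in the table top, while `parameter_count(table)` is 12 against 32768 cells for the hypothetical 32-cubed voxel grid.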