{"title":"A High-Performance Learning-Based Framework for Monocular 3-D Point Cloud Reconstruction","authors":"AmirHossein Zamani;Kamran Ghaffari;Amir G. Aghdam","doi":"10.1109/JRFID.2024.3435875","DOIUrl":null,"url":null,"abstract":"An essential yet challenging step in the 3D reconstruction problem is to train a machine or a robot to model 3D objects. Many 3D reconstruction applications depend on real-time data processing, so computational efficiency is a fundamental requirement in such systems. Despite considerable progress in 3D reconstruction techniques in recent years, developing efficient algorithms for real-time implementation remains an open problem. The present study addresses current issues in the high-precision reconstruction of objects displayed in a single-view image with sufficiently high accuracy and computational efficiency. To this end, we propose two neural frameworks: a CNN-based autoencoder architecture called Fast-Image2Point (FI2P) and a transformer-based network called TransCNN3D. These frameworks consist of two stages: perception and construction. The perception stage addresses the understanding and extraction process of the underlying contexts and features of the image. The construction stage, on the other hand, is responsible for recovering the 3D geometry of an object by using the knowledge and contexts extracted in the perception stage. The FI2P is a simple yet powerful architecture to reconstruct 3D objects from images faster (in real-time) without losing accuracy. Then, the TransCNN3D framework provides a more accurate 3D reconstruction without losing computational efficiency. The output of the reconstruction framework is represented in the point cloud format. The ShapeNet dataset is utilized to compare the proposed method with the existing ones in terms of computation time and accuracy. Simulations demonstrate the superior performance of the proposed strategy. Our dataset and code are available on IEEE DataPort website and first author’s GitHub repository respectively.","PeriodicalId":73291,"journal":{"name":"IEEE journal of radio frequency identification","volume":null,"pages":null},"PeriodicalIF":2.3000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE journal of radio frequency identification","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10614399/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
An essential yet challenging step in the 3D reconstruction problem is to train a machine or a robot to model 3D objects. Many 3D reconstruction applications depend on real-time data processing, so computational efficiency is a fundamental requirement in such systems. Despite considerable progress in 3D reconstruction techniques in recent years, developing efficient algorithms for real-time implementation remains an open problem. The present study addresses the reconstruction of objects from a single-view image with both high accuracy and computational efficiency. To this end, we propose two neural frameworks: a CNN-based autoencoder architecture called Fast-Image2Point (FI2P) and a transformer-based network called TransCNN3D. Both frameworks consist of two stages: perception and construction. The perception stage extracts the underlying contexts and features of the image. The construction stage, in turn, recovers the 3D geometry of an object from the knowledge and contexts extracted in the perception stage. FI2P is a simple yet powerful architecture that reconstructs 3D objects from images in real time without sacrificing accuracy; TransCNN3D then provides more accurate 3D reconstruction without sacrificing computational efficiency. The output of both frameworks is represented in point cloud format. The ShapeNet dataset is used to compare the proposed methods with existing ones in terms of computation time and accuracy. Simulations demonstrate the superior performance of the proposed strategy. Our dataset and code are available on the IEEE DataPort website and in the first author's GitHub repository, respectively.
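For concreteness, the perception/construction split described above can be illustrated with a minimal encoder-decoder sketch. The PyTorch code below is an assumption-laden illustration, not the paper's FI2P or TransCNN3D implementation: the module names (PerceptionEncoder, ConstructionDecoder, Image2PointCloud), the layer sizes, the 128x128 input resolution, and the 1024-point output are all hypothetical choices made only to show the single-view-image-to-point-cloud data flow.

```python
# Minimal sketch of a two-stage image-to-point-cloud model, assuming a
# CNN "perception" encoder and an MLP "construction" decoder. All
# architectural details here are illustrative, not the paper's.
import torch
import torch.nn as nn


class PerceptionEncoder(nn.Module):
    """Perception stage: maps a 3x128x128 RGB image to a latent vector."""

    def __init__(self, latent_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),    # -> 64x64
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),   # -> 32x32
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),  # -> 16x16
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                                 # -> 128x1x1
        )
        self.fc = nn.Linear(128, latent_dim)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.fc(self.features(img).flatten(1))


class ConstructionDecoder(nn.Module):
    """Construction stage: maps the latent vector to an (N, 3) point cloud."""

    def __init__(self, latent_dim: int = 512, num_points: int = 1024):
        super().__init__()
        self.num_points = num_points
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_points * 3),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.mlp(z).view(-1, self.num_points, 3)


class Image2PointCloud(nn.Module):
    """End-to-end model: single-view image in, point cloud out."""

    def __init__(self):
        super().__init__()
        self.perception = PerceptionEncoder()
        self.construction = ConstructionDecoder()

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.construction(self.perception(img))


if __name__ == "__main__":
    model = Image2PointCloud()
    batch = torch.randn(2, 3, 128, 128)  # two dummy RGB images
    points = model(batch)
    print(points.shape)  # torch.Size([2, 1024, 3])
```

In a setup like this, the decoder's (N, 3) output would typically be trained against ground-truth ShapeNet point clouds with a permutation-invariant loss such as Chamfer distance; the abstract does not state which loss the authors actually use.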