A Comparison Study of Depth Map Estimation in Indoor Environments Using pix2pix and CycleGAN
Authors: Ricardo Salvino Casado; Emerson Carlos Pedrino
Journal: IEEE Latin America Transactions
Published: 2024-02-09
DOI: 10.1109/TLA.2024.10431422
URL: https://ieeexplore.ieee.org/document/10431422/
This article presents a deep-learning-based comparison of methods for automatic depth map estimation in indoor environments, with the aim of using the resulting maps in navigation aid systems for visually impaired individuals. Depth map estimation is a laborious process, as most high-precision systems rely on complex stereo vision setups. The methodology uses Generative Adversarial Networks (GANs) to generate depth maps from single RGB images, and the study introduces methods for doing so with pix2pix and CycleGAN. The major challenges remain the need for large datasets and the long training times that come with them. For comparison and validation of the presented method, the L1 loss was also compared against variations of the MonoDepth2 and DenseDepth systems, using ResNet50 and ResNet18 as encoders. The results show that CycleGAN generates more reliable maps than pix2pix and DepthNetResNet50, with an L1 loss approximately 2.5 times smaller than that of pix2pix, approximately 2.4 times smaller than that of DepthNetResNet50, and approximately 14 times smaller than that of DepthNetResNet18.
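The comparison above rests on the L1 loss between a predicted depth map and its reference. The sketch below shows one common way to compute it, as the mean absolute per-pixel difference, and how an "N times smaller" ratio between two models follows from it. It is an illustration only: the abstract does not state the exact normalization the authors used, the function name `l1_loss` is hypothetical, and the depth values are toy numbers, not results from the paper.

```python
from typing import Sequence

DepthMap = Sequence[Sequence[float]]


def l1_loss(pred: DepthMap, target: DepthMap) -> float:
    """Mean absolute per-pixel difference between two equally sized depth maps.

    Assumption: the loss is averaged over all pixels; the abstract does not
    say whether the authors average or sum.
    """
    if len(pred) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, target)
    ):
        raise ValueError("depth maps must have the same shape")

    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1

    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count


# Toy 2x2 depth maps (hypothetical values, not data from the paper).
ground_truth = [[1.0, 2.0], [3.0, 4.0]]
model_a = [[1.5, 2.5], [3.5, 4.5]]      # every pixel off by 0.5
model_b = [[1.25, 2.25], [3.25, 4.25]]  # every pixel off by 0.25

loss_a = l1_loss(model_a, ground_truth)
loss_b = l1_loss(model_b, ground_truth)

# How many times smaller model_b's loss is than model_a's.
ratio = loss_a / loss_b
print(f"loss_a={loss_a}, loss_b={loss_b}, ratio={ratio}")
```

In this toy case the second model's loss is half the first's, so its L1 loss would be described as 2 times smaller; the ratios quoted in the abstract are read the same way.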
About the journal:
IEEE Latin America Transactions (IEEE LATAM) is an interdisciplinary journal focused on the dissemination of original, high-quality research papers and review articles in Spanish and Portuguese on emerging topics in three main areas: Computing, Electric Energy and Electronics. Its sub-areas include, but are not limited to: automatic control, communications, instrumentation, artificial intelligence, power and industrial electronics, fault diagnosis and detection, transportation electrification, internet of things, electrical machines, circuits and systems, biomedicine and biomedical / haptic applications, secure communications, robotics, sensors and actuators, computer networks, and smart grids.