A Comparison Study of Depth Map Estimation in Indoor Environments Using pix2pix and CycleGAN

IEEE Latin America Transactions, vol. 22, no. 3, pp. 213-221. Pub Date: 2024-02-09. DOI: 10.1109/TLA.2024.10431422. URL: https://ieeexplore.ieee.org/document/10431422/. IF 1.3; Zone 4 (Engineering and Technology); JCR Q3 (Computer Science, Information Systems).

Ricardo Salvino Casado; Emerson Carlos Pedrino

Citations: 0

Abstract

This article presents a Deep Learning-based approach for comparing automatic depth map estimation methods in indoor environments, with the aim of using the resulting maps in navigation aid systems for visually impaired individuals. Depth map estimation is a laborious process, as most high-precision systems rely on complex stereo vision setups. The methodology uses Generative Adversarial Network (GAN) techniques to generate depth maps from single RGB images. The study introduces methods for generating depth maps using pix2pix and CycleGAN. The major challenges still lie in the need for large datasets and the long training times that come with them. Additionally, the L1 Loss was compared against variations of the MonoDepth2 and DenseDepth systems, using ResNet50 and ResNet18 as encoders, to compare and validate the presented method. The results demonstrate that CycleGAN is capable of generating more reliable maps than pix2pix and DepthNetResNet50, with an L1 Loss approximately 2.5 times smaller than pix2pix, approximately 2.4 times smaller than DepthNetResNet50, and approximately 14 times smaller than DepthNetResNet18.
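The abstract compares models by their L1 Loss and reports the result as "times smaller" ratios. As a minimal sketch of how such a comparison can be computed, assuming the L1 Loss is the mean absolute error between a predicted and a ground-truth depth map, the snippet below uses plain Python lists. The function names `l1_loss` and `times_smaller` and the 2x2 toy maps are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence

DepthMap = Sequence[Sequence[float]]


def l1_loss(pred: DepthMap, target: DepthMap) -> float:
    """Mean absolute error over all pixels of two equally shaped depth maps."""
    if len(pred) != len(target) or any(
        len(row_p) != len(row_t) for row_p, row_t in zip(pred, target)
    ):
        raise ValueError("depth maps must have the same shape")
    total = 0.0
    count = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            total += abs(p - t)
            count += 1
    if count == 0:
        raise ValueError("depth maps must not be empty")
    return total / count


def times_smaller(loss_a: float, loss_b: float) -> float:
    """Return how many times smaller loss_a is than loss_b (loss_b / loss_a)."""
    if loss_a <= 0:
        raise ValueError("loss_a must be positive")
    return loss_b / loss_a


# Toy values made up for illustration only (not results from the paper).
target = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical ground-truth depths
pred_a = [[1.5, 2.0], [3.0, 4.0]]   # one pixel off by 0.5
pred_b = [[2.0, 3.0], [3.0, 4.0]]   # two pixels off by 1.0

loss_a = l1_loss(pred_a, target)
loss_b = l1_loss(pred_b, target)
ratio = times_smaller(loss_a, loss_b)
print(loss_a, loss_b, ratio)
```

Under this reading, a statement such as "approximately 2.5 times smaller than pix2pix" corresponds to `times_smaller(cyclegan_loss, pix2pix_loss)` being about 2.5; the abstract does not give the absolute loss values.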

Source journal

IEEE Latin America Transactions
Categories: Computer Science, Information Systems; Engineering, Electrical & Electronic
CiteScore: 3.50
Self-citation rate: 7.70%
Articles published: 192
Review time: 3-8 weeks

About the journal: IEEE Latin America Transactions (IEEE LATAM) is an interdisciplinary journal focused on the dissemination of original, high-quality research papers and review articles in Spanish and Portuguese on emerging topics in three main areas: Computing, Electric Energy and Electronics. Sub-areas of the journal include, but are not limited to: automatic control, communications, instrumentation, artificial intelligence, power and industrial electronics, fault diagnosis and detection, transportation electrification, internet of things, electrical machines, circuits and systems, biomedicine and biomedical / haptic applications, secure communications, robotics, sensors and actuators, computer networks, and smart grids.