SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-09-26. DOI: 10.1109/TIP.2024.3465034

Runmin Cong;Chunlei Wu;Xibin Song;Wei Zhang;Sam Kwong;Hongdong Li;Pan Ji

Volume 33, pages 5538-5550. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10696933/

Citation count: 0

Abstract

Deep CNNs have achieved impressive improvements in night-time self-supervised depth estimation from a monocular image. However, performance degrades considerably compared to day-time depth estimation because of significant domain gaps, low visibility, and varying illumination between day and night images. To address these challenges, we propose a novel night-time self-supervised monocular depth estimation framework with structure regularization, SRNSD, which incorporates three kinds of constraints: feature and depth domain adaptation, an image perspective constraint, and a cropped multi-scale consistency loss. Specifically, we adapt both the feature space and the depth output space for better night-time feature extraction and depth map prediction, and apply high- and low-frequency decoupling operations for better recovery of depth structure and texture. Meanwhile, we employ an image perspective constraint to enhance smoothness and obtain better depth maps in areas with abrupt luminosity changes. Furthermore, we introduce a simple yet effective cropped multi-scale consistency loss that exploits consistency among depth outputs at different scales for further optimization, refining the detailed textures and structures of the predicted depth. Experimental results on benchmarks with depth ranges of 40 m and 60 m, including the Oxford RobotCar, nuScenes, and CARLA-EPE datasets, demonstrate the superiority of our approach over state-of-the-art night-time self-supervised depth estimation approaches across multiple metrics, confirming its effectiveness.
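The abstract does not give the formula for the cropped multi-scale consistency loss, so the following is only an illustrative sketch of the general idea: depth maps predicted at several scales should agree inside a shared crop. Everything in it is an assumption rather than the authors' formulation, including the function names `upsample_nearest` and `cropped_multiscale_consistency`, the L1 penalty, the nearest-neighbour upsampling, and the convention that each scale halves the resolution of the previous one.

```python
import numpy as np


def upsample_nearest(depth, factor):
    """Nearest-neighbour upsampling by an integer factor (assumed stand-in
    for whatever interpolation a real implementation would use)."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def cropped_multiscale_consistency(depths, crop, rng=None):
    """Mean L1 disagreement between the finest depth map and each coarser
    map, measured on one randomly placed crop at full resolution.

    depths: list of 2-D arrays, finest first; each later map is assumed to
            have half the resolution of the one before it.
    crop:   (height, width) of the crop window at full resolution.
    """
    if len(depths) < 2:
        raise ValueError("need at least two scales to compare")
    rng = np.random.default_rng() if rng is None else rng

    full = depths[0]
    h, w = full.shape
    ch, cw = crop
    # Pick the top-left corner so the crop stays inside the image.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    reference = full[top:top + ch, left:left + cw]

    losses = []
    for level, coarse in enumerate(depths[1:], start=1):
        # Bring the coarse prediction back to full resolution, then crop.
        upsampled = upsample_nearest(coarse, 2 ** level)
        patch = upsampled[top:top + ch, left:left + cw]
        losses.append(np.abs(patch - reference).mean())
    return float(np.mean(losses))
```

In this sketch the loss is zero when all scales predict the same depth and grows with the average disagreement inside the crop. A training implementation would compute the same quantity on differentiable tensors instead of NumPy arrays.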
