SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes

Runmin Cong;Chunlei Wu;Xibin Song;Wei Zhang;Sam Kwong;Hongdong Li;Pan Ji
{"title":"SRNSD: Structure-Regularized Night-Time Self-Supervised Monocular Depth Estimation for Outdoor Scenes","authors":"Runmin Cong;Chunlei Wu;Xibin Song;Wei Zhang;Sam Kwong;Hongdong Li;Pan Ji","doi":"10.1109/TIP.2024.3465034","DOIUrl":null,"url":null,"abstract":"Deep CNNs have achieved impressive improvements for night-time self-supervised depth estimation form a monocular image. However, the performance degrades considerably compared to day-time depth estimation due to significant domain gaps, low visibility, and varying illuminations between day and night images. To address these challenges, we propose a novel night-time self-supervised monocular depth estimation framework with structure regularization, i.e., SRNSD, which incorporates three aspects of constraints for better performance, including feature and depth domain adaptation, image perspective constraint, and cropped multi-scale consistency loss. Specifically, we utilize adaptations of both feature and depth output spaces for better night-time feature extraction and depth map prediction, along with high- and low-frequency decoupling operations for better depth structure and texture recovery. Meanwhile, we employ an image perspective constraint to enhance the smoothness and obtain better depth maps in areas where the luminosity jumps change. Furthermore, we introduce a simple yet effective cropped multi-scale consistency loss that utilizes consistency among different scales of depth outputs for further optimization, refining the detailed textures and structures of predicted depth. Experimental results on different benchmarks with depth ranges of 40m and 60m, including Oxford RobotCar dataset, nuScenes dataset and CARLA-EPE dataset, demonstrate the superiority of our approach over state-of-the-art night-time self-supervised depth estimation approaches across multiple metrics, proving our effectiveness.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10696933/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Deep CNNs have achieved impressive improvements for night-time self-supervised depth estimation form a monocular image. However, the performance degrades considerably compared to day-time depth estimation due to significant domain gaps, low visibility, and varying illuminations between day and night images. To address these challenges, we propose a novel night-time self-supervised monocular depth estimation framework with structure regularization, i.e., SRNSD, which incorporates three aspects of constraints for better performance, including feature and depth domain adaptation, image perspective constraint, and cropped multi-scale consistency loss. Specifically, we utilize adaptations of both feature and depth output spaces for better night-time feature extraction and depth map prediction, along with high- and low-frequency decoupling operations for better depth structure and texture recovery. Meanwhile, we employ an image perspective constraint to enhance the smoothness and obtain better depth maps in areas where the luminosity jumps change. Furthermore, we introduce a simple yet effective cropped multi-scale consistency loss that utilizes consistency among different scales of depth outputs for further optimization, refining the detailed textures and structures of predicted depth. Experimental results on different benchmarks with depth ranges of 40m and 60m, including Oxford RobotCar dataset, nuScenes dataset and CARLA-EPE dataset, demonstrate the superiority of our approach over state-of-the-art night-time self-supervised depth estimation approaches across multiple metrics, proving our effectiveness.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SRNSD:针对室外场景的结构规则化夜间自监督单目深度估计
深度 CNN 在单目图像的夜间自监督深度估计方面取得了令人瞩目的进步。然而,与白天的深度估计相比,由于日夜图像之间存在明显的域差距、低能见度和不同的光照度,其性能大大降低。为了应对这些挑战,我们提出了一种具有结构正则化的新型夜间自监督单目深度估算框架,即 SRNSD,它结合了三个方面的约束条件以获得更好的性能,包括特征和深度域适应、图像透视约束和裁剪多尺度一致性损失。具体来说,我们利用特征和深度输出空间的适应性来实现更好的夜间特征提取和深度图预测,同时利用高频和低频解耦操作来实现更好的深度结构和纹理恢复。同时,我们采用图像透视约束来增强光滑度,并在亮度跃变区域获得更好的深度图。此外,我们还引入了一种简单而有效的裁剪多尺度一致性损失,利用不同尺度深度输出之间的一致性进行进一步优化,完善预测深度的细节纹理和结构。在深度范围为 40 米和 60 米的不同基准(包括牛津 RobotCar 数据集、nuScenes 数据集和 CARLA-EPE 数据集)上的实验结果表明,我们的方法在多个指标上都优于最先进的夜间自监督深度估计方法,证明了我们的有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
MLFA: Toward Realistic Test Time Adaptive Object Detection by Multi-Level Feature Alignment Towards Real-World Super Resolution with Adaptive Self-Similarity Mining. Error Model and Concise Temporal Network for Indirect Illumination in 3D Reconstruction Multi-Scale Spatio-Temporal Memory Network for Lightweight Video Denoising A Virtual-Sensor Construction Network Based on Physical Imaging for Image Super-Resolution
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1