Recurrent Multiscale Feature Modulation for Geometry Consistent Depth Learning

Zhongkai Zhou, Xinnan Fan, Pengfei Shi, Yuanxue Xin, Dongliang Duan, Liuqing Yang
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 9551-9566
DOI: 10.1109/TPAMI.2024.3420165
Published: 2024-06-27
URL: https://ieeexplore.ieee.org/document/10574331/

Abstract

The U-Net-like coarse-to-fine network design is currently the dominant choice for dense prediction tasks. Although this design often achieves competitive performance, it suffers from inherent limitations, such as the propagation of training error from low to high resolution and a dependence on deeper, heavier backbones. To design an effective network that performs better, we instead propose Recurrent Multiscale Feature Modulation (R-MSFM), a new lightweight network design for self-supervised monocular depth estimation. R-MSFM extracts per-pixel features, builds a multiscale feature modulation module, and performs recurrent depth refinement through a parameter-shared decoder at a fixed resolution. This design keeps R-MSFM lightweight and fundamentally avoids the error propagation caused by the coarse-to-fine design. Furthermore, we introduce a mask geometry consistency loss to enable geometry-consistent depth learning in R-MSFM. This loss penalizes inconsistency between the depths estimated for adjacent views within non-occluded and non-stationary regions. Experimental results demonstrate the superiority of our proposed R-MSFM in both model size and inference speed, and show state-of-the-art results on two datasets: KITTI and Make3D.
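The abstract does not give the exact formulation of the mask geometry consistency loss; the sketch below is only an illustration of the general idea, assuming a normalized depth-difference term (a common choice in self-supervised depth work) and a precomputed binary mask over non-occluded, non-stationary pixels. The function name and the warping step are hypothetical, not taken from the paper.

```python
import numpy as np

def masked_geometry_consistency_loss(depth_t, depth_t_warped, mask):
    """Illustrative sketch of a masked geometry consistency loss.

    depth_t        : (H, W) depth predicted for the current view
    depth_t_warped : (H, W) depth of an adjacent view, assumed already
                     warped into the current view's frame
    mask           : (H, W) binary mask, 1 on non-occluded and
                     non-stationary pixels, 0 elsewhere
    """
    # Normalized absolute depth difference: |d1 - d2| / (d1 + d2),
    # which lies in [0, 1) and is scale-balanced.
    diff = np.abs(depth_t - depth_t_warped) / (depth_t + depth_t_warped)
    # Average the penalty only over the pixels the mask keeps, so
    # occluded and stationary regions contribute no gradient.
    return (diff * mask).sum() / np.maximum(mask.sum(), 1.0)
```

Masking out occluded and stationary regions matters because depths there legitimately disagree between views (occlusion) or violate the static-scene assumption (moving objects), so penalizing them would corrupt training.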