Multi-feature fusion enhanced monocular depth estimation with boundary awareness

Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang

The Visual Computer (2024). DOI: 10.1007/s00371-024-03498-w. Published online 2024-06-22.
{"title":"多特征融合增强型单目深度估计与边界感知","authors":"Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang","doi":"10.1007/s00371-024-03498-w","DOIUrl":null,"url":null,"abstract":"<p>Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional methods for single-image depth estimation encounter limitations in photometric loss due to a lack of geometric constraints, reliance on pixel-level intensity or color differences, and the assumption of perfect photometric consistency, leading to errors in challenging conditions and resulting in overly smooth depth maps with insufficient capture of object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are then fed into our decoder and enhanced using attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. To mitigate the weakening of semantic features during convolution processes, we introduce a feature-enhanced combination strategy. We also integrate the DeconvUp module to improve the restoration of depth map boundaries. We introduce a boundary loss that enforces constraints between object boundaries. We propose an extended evaluation method that utilizes Laplacian pyramid residuals to evaluate boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate the superior performance of MFFENet compared to state-of-the-art models in monocular depth estimation.\n</p>","PeriodicalId":501186,"journal":{"name":"The Visual Computer","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-feature fusion enhanced monocular depth estimation with boundary awareness\",\"authors\":\"Chao Song, Qingjie Chen, Frederick W. B. Li, Zhaoyi Jiang, Dong Zheng, Yuliang Shen, Bailin Yang\",\"doi\":\"10.1007/s00371-024-03498-w\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional methods for single-image depth estimation encounter limitations in photometric loss due to a lack of geometric constraints, reliance on pixel-level intensity or color differences, and the assumption of perfect photometric consistency, leading to errors in challenging conditions and resulting in overly smooth depth maps with insufficient capture of object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are then fed into our decoder and enhanced using attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. 
To mitigate the weakening of semantic features during convolution processes, we introduce a feature-enhanced combination strategy. We also integrate the DeconvUp module to improve the restoration of depth map boundaries. We introduce a boundary loss that enforces constraints between object boundaries. We propose an extended evaluation method that utilizes Laplacian pyramid residuals to evaluate boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate the superior performance of MFFENet compared to state-of-the-art models in monocular depth estimation.\\n</p>\",\"PeriodicalId\":501186,\"journal\":{\"name\":\"The Visual Computer\",\"volume\":\"26 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-06-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Visual Computer\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s00371-024-03498-w\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Visual Computer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s00371-024-03498-w","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-feature fusion enhanced monocular depth estimation with boundary awareness
Abstract. Self-supervised monocular depth estimation has opened up exciting possibilities for practical applications, including scene understanding, object detection, and autonomous driving, without the need for expensive depth annotations. However, traditional single-image depth estimation methods are limited by the photometric loss: it lacks geometric constraints, relies on pixel-level intensity or color differences, and assumes perfect photometric consistency. These weaknesses cause errors under challenging conditions and yield overly smooth depth maps that fail to capture object boundaries and depth transitions. To tackle these challenges, we propose MFFENet, which leverages multi-level semantic and boundary-aware features to improve depth estimation accuracy. MFFENet extracts multi-level semantic features using our modified HRFormer approach. These features are fed into our decoder and enhanced with attention mechanisms to enrich the boundary information generated by Laplacian pyramid residuals. To mitigate the weakening of semantic features during convolution, we introduce a feature-enhanced combination strategy. We also integrate a DeconvUp module to improve the restoration of depth map boundaries, and we introduce a boundary loss that enforces depth constraints at object boundaries. Finally, we propose an extended evaluation method that uses Laplacian pyramid residuals to assess boundary depth. Extensive evaluations on the KITTI, Cityscapes, and Make3D datasets demonstrate that MFFENet outperforms state-of-the-art monocular depth estimation models.
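For context on the photometric loss whose limitations the abstract criticizes: the abstract does not give the paper's exact formulation, but self-supervised depth methods commonly instantiate it as a weighted SSIM + L1 reconstruction error between a target frame and a view synthesized from the predicted depth (as popularized by Monodepth-style training). The sketch below is a minimal illustration of that standard form; the function names `ssim_map` and `photometric_loss` and the 3x3 average-pooling SSIM are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ssim_map(x: torch.Tensor, y: torch.Tensor,
             c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Local structural similarity, approximated with 3x3 average pooling."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return (num / den).clamp(0, 1)

def photometric_loss(target: torch.Tensor, recon: torch.Tensor,
                     alpha: float = 0.85) -> torch.Tensor:
    """Per-pixel SSIM + L1 photometric error between (B, C, H, W) frames.

    Note how the loss sees only intensity/color differences: it carries no
    geometric constraint, which is exactly the limitation the paper targets.
    """
    l1 = (target - recon).abs().mean(1, keepdim=True)
    ssim = ssim_map(target, recon).mean(1, keepdim=True)
    return alpha * (1 - ssim) / 2 + (1 - alpha) * l1
```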
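The Laplacian pyramid residuals used both for boundary enrichment and for the extended boundary evaluation follow a standard construction: each residual is the difference between a pyramid level and an upsampled copy of the next coarser level, which isolates high-frequency detail such as object boundaries. A minimal sketch of that construction is given below, assuming the common downsample/upsample definition; the abstract does not specify the paper's exact pyramid operators, and `laplacian_pyramid_residuals` is an illustrative name.

```python
import torch
import torch.nn.functional as F

def laplacian_pyramid_residuals(x: torch.Tensor, levels: int = 4):
    """Compute Laplacian pyramid residuals of an image or depth map.

    x: (B, C, H, W) tensor; H and W should be divisible by 2**levels.
    Returns the per-level residuals (which concentrate edge/boundary detail)
    plus the coarsest low-frequency approximation.
    """
    residuals = []
    current = x
    for _ in range(levels):
        down = F.avg_pool2d(current, kernel_size=2)           # next coarser level
        up = F.interpolate(down, scale_factor=2,
                           mode="bilinear", align_corners=False)
        residuals.append(current - up)                        # high-frequency residual
        current = down
    return residuals, current
```

Applied to a predicted depth map, the residuals are near zero in smooth regions and large at depth discontinuities, which is what makes them a natural signal both for sharpening boundaries in the decoder and for evaluating boundary depth quality.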