MonoBooster: Semi-Dense Skip Connection With Cross-Level Attention for Boosting Self-Supervised Monocular Depth Estimation
Authors: Changhao Wang; Guanwen Zhang; Zhengyun Cheng; Wei Zhou
Journal: IEEE Signal Processing Letters, vol. 31, pp. 3069-3073 (published 2024-10-30)
DOI: 10.1109/LSP.2024.3488499
URL: https://ieeexplore.ieee.org/document/10738278/
Impact factor: 3.2; JCR: Q2 (Engineering, Electrical & Electronic)
Accurate depth estimation is crucial for various applications that require precise 3D information about the surrounding environment. In this paper, we propose MonoBooster, a feature aggregation architecture to enhance the performance of self-supervised monocular depth estimation. Specifically, we introduce a semi-dense skip connection scheme to aggregate multi-level features extracted from the backbone network. Additionally, we present a novel Cross-Level Attention (CLA) module to fuse the connected features. The CLA module captures spatial correlation using pyramid depth-wise convolution and adaptively extracts channel information from both low-level and high-level features, facilitating the translation from input RGB image to estimated depth map. Experimental results on the KITTI and Make3D datasets validate the effectiveness of the proposed MonoBooster. Notably, the MonoBooster architecture is flexible and can be seamlessly integrated into popular backbones, resulting in enhanced depth estimation performance.
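The abstract describes the CLA module only at a high level, so the following is a minimal, standard-library Python sketch of the general idea rather than the authors' implementation. It assumes the low-level and high-level features already share one resolution, replaces learned depth-wise kernels with fixed averaging kernels, and derives channel weights from global average pooling alone. All function names, the kernel sizes (1, 3, 5), and the way the two attention terms are combined are hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def depthwise_box_conv(channel, k):
    """Filter a single channel with a k x k averaging kernel (zero padding).

    Assumption: a learned depth-wise kernel is replaced by a fixed box kernel.
    """
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += channel[y][x]
            out[i][j] = total / (k * k)
    return out


def pyramid_spatial_attention(feat, kernel_sizes=(1, 3, 5)):
    """Return an H x W map in (0, 1) from multi-scale depth-wise responses."""
    h, w = len(feat[0]), len(feat[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for channel in feat:
        for k in kernel_sizes:
            resp = depthwise_box_conv(channel, k)
            for i in range(h):
                for j in range(w):
                    acc[i][j] += resp[i][j]
    scale = len(feat) * len(kernel_sizes)
    return [[sigmoid(acc[i][j] / scale) for j in range(w)] for i in range(h)]


def channel_attention(feat):
    """Return one weight in (0, 1) per channel via global average pooling."""
    weights = []
    for channel in feat:
        h, w = len(channel), len(channel[0])
        mean = sum(sum(row) for row in channel) / (h * w)
        weights.append(sigmoid(mean))
    return weights


def cross_level_fuse(low, high):
    """Concatenate low- and high-level channels, then re-weight them."""
    feat = low + high  # channel-wise concatenation of the two levels
    spatial = pyramid_spatial_attention(feat)
    chan = channel_attention(feat)
    return [
        [[v * spatial[i][j] * chan[c] for j, v in enumerate(row)]
         for i, row in enumerate(channel)]
        for c, channel in enumerate(feat)
    ]


# Toy features: 1 low-level channel and 2 high-level channels, each 2 x 2.
low = [[[1.0, 2.0], [3.0, 4.0]]]
high = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, 0.0], [0.0, 1.0]]]
fused = cross_level_fuse(low, high)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

Because both attention terms lie strictly between 0 and 1, the fused features keep the input shape and are only attenuated, never amplified. A trainable version would learn the kernels and the channel weighting instead of fixing them as done here.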
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance at signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also at several workshops organized by the Signal Processing Society.