TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning

IF 4.6 | Zone 2, Computer Science | Q2 Robotics | IEEE Robotics and Automation Letters | Pub Date: 2022-08-05 | DOI: 10.1109/LRA.2022.3196781
Daechan Han;Jeongmin Shin;Namil Kim;Soonmin Hwang;Yukyung Choi
URL: https://ieeexplore.ieee.org/document/9851497/
Citations: 7

Abstract

Transformers have recently been widely adopted for computer vision tasks and show promising results because they effectively encode long-range spatial dependencies in an image. However, very few studies have applied transformers to self-supervised depth estimation. When the CNN architecture is replaced with a transformer in self-supervised learning of depth, several problems arise: the multi-scale photometric loss function becomes problematic when used with transformers, and the model has insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while keeping the global context from transformers. In addition, we propose combining a self-distillation loss with a single-scale photometric loss to alleviate the instability of transformer training by using correct training signals. We demonstrate that the proposed model makes accurate predictions on large objects and thin structures that require global context and local details. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
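The abstract does not spell out how the two loss terms are combined, so the following is only a minimal sketch under stated assumptions: the photometric term is a plain L1 error computed at the final (full) resolution only, the self-distillation term is an L1 error between upsampled intermediate depth maps and the final depth map used as a fixed pseudo-label, and the function names and `distill_weight` are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: the function names, the L1 forms, and distill_weight
# are illustrative assumptions, not the authors' implementation.

def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized 2D grids."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def total_loss(target, reconstructed, final_depth, intermediate_depths,
               distill_weight=0.1):
    """Single-scale photometric term plus a self-distillation term."""
    # Photometric error is computed once, at the final resolution only
    # (single-scale), instead of at every decoder scale.
    photometric = mean_abs_diff(target, reconstructed)

    # Intermediate predictions are pulled toward the final prediction,
    # which acts as a pseudo-label. In a real training framework the
    # pseudo-label would be excluded from gradient computation.
    distill = 0.0
    for depth in intermediate_depths:
        factor = len(final_depth) // len(depth)
        distill += mean_abs_diff(upsample_nearest(depth, factor), final_depth)
    if intermediate_depths:
        distill /= len(intermediate_depths)

    return photometric + distill_weight * distill
```

In this sketch, only the final output is supervised by image reconstruction, while lower-resolution outputs receive their signal from the final depth map. This is one plausible reading of "self-distillation loss with single-scale photometric loss"; the paper itself should be consulted for the actual formulation.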
Source journal
IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.