TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning

IEEE Robotics and Automation Letters · IF 4.6 · Zone 2 (Computer Science) · JCR Q2 (Robotics) · Pub Date: 2022-08-05 · DOI: 10.1109/LRA.2022.3196781

Daechan Han;Jeongmin Shin;Namil Kim;Soonmin Hwang;Yukyung Choi

Full text: https://ieeexplore.ieee.org/document/9851497/

Citations: 7

Abstract

Recently, transformers have been widely adopted for various computer vision tasks and show promising results because they effectively encode long-range spatial dependencies in an image. However, very few studies have applied transformers to self-supervised depth estimation. When replacing the CNN architecture with a transformer in self-supervised learning of depth, we encounter several problems, such as a multi-scale photometric loss function that becomes problematic when used with transformers and an insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while keeping the global context from transformers. In addition, we propose combining a self-distillation loss with a single-scale photometric loss to alleviate the instability of transformer training by using correct training signals. We demonstrate that the proposed model makes accurate predictions on large objects and thin structures, which require global context and local details, respectively. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
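The abstract names the loss design only at a high level. The following is a minimal sketch of how a single-scale photometric term and a self-distillation term might be combined, under stated assumptions: the photometric term is simplified to a plain L1 difference, the full-resolution depth prediction is treated as a fixed pseudo-label for coarser predictions, and all function names, the nearest-neighbour upsampling, and the `distill_weight` value are hypothetical and not taken from the paper.

```python
# Illustrative sketch only. Function names, loss forms, and weights are
# assumptions for illustration; they are not the paper's implementation.

def l1_mean(a, b):
    """Mean absolute difference between two same-sized 2D lists."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def self_distillation(full_depth, coarse_depths):
    """Average L1 gap between the full-resolution prediction and each
    coarser prediction after upsampling. In a real training framework the
    full-resolution prediction would be detached from the gradient graph
    so it acts as a fixed pseudo-label (assumption)."""
    total = 0.0
    for coarse, factor in coarse_depths:
        total += l1_mean(full_depth, upsample_nearest(coarse, factor))
    return total / len(coarse_depths)


def total_loss(target, warped, full_depth, coarse_depths, distill_weight=0.1):
    """Photometric error at a single (full) scale plus a weighted
    self-distillation term over the coarser depth predictions."""
    photometric = l1_mean(target, warped)  # computed at one scale only
    distill = self_distillation(full_depth, coarse_depths)
    return photometric + distill_weight * distill


if __name__ == "__main__":
    target = [[1.0, 2.0], [3.0, 4.0]]      # target frame intensities
    warped = [[1.0, 2.5], [3.0, 3.5]]      # source frame warped into target view
    full_depth = [[1.0, 1.0], [1.0, 3.0]]  # full-resolution depth prediction
    coarse_depths = [([[1.0]], 2)]         # (half-resolution prediction, upsample factor)
    print(total_loss(target, warped, full_depth, coarse_depths))
```

The point of the design, as the abstract describes it, is that the photometric error is evaluated at one scale only, while the intermediate predictions still receive a training signal through the distillation term.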




Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Articles published: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.
