A2dele: Adaptive and Attentive Depth Distiller for Efficient RGB-D Salient Object Detection

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Pub Date: 2020-06-01. DOI: 10.1109/CVPR42600.2020.00908

Yongri Piao, Zhengkun Rong, Miao Zhang, Weisong Ren, Huchuan Lu

Pages: 9057-9066

Citations: 155

Abstract

Existing state-of-the-art RGB-D salient object detection methods rely on a two-stream architecture, in which an independent subnetwork is required to process depth data. This inevitably incurs extra computational cost and memory consumption, and the need for depth data at test time may hinder practical applications of RGB-D saliency detection. To tackle these two problems, we propose a depth distiller (A2dele) that uses network prediction and attention as two bridges to transfer depth knowledge from the depth stream to the RGB stream. First, by adaptively minimizing the differences between the predictions generated by the depth stream and the RGB stream, we control the pixel-wise depth knowledge transferred to the RGB stream. Second, to transfer localization knowledge to the RGB features, we encourage consistency between the dilated prediction of the depth stream and the attention map from the RGB stream. As a result, embedding A2dele yields a lightweight architecture that does not use depth data at test time. Our extensive experimental evaluation on five benchmarks demonstrates that our RGB stream achieves state-of-the-art performance while reducing model size by 76% and running 12 times faster than the best-performing method. Furthermore, A2dele can be applied to existing RGB-D networks to significantly improve their efficiency while maintaining performance (boosting FPS nearly 2 times for DMRA and 3 times for CPFP).
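The two bridges described in the abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss formulas, so the binary cross-entropy form, the exponential weighting in `adaptive_weight`, the 3x3 dilation window, the `alpha` parameter, and all function names are hypothetical choices made here for clarity.

```python
import math


def bce(p, y, eps=1e-7):
    """Mean binary cross-entropy between a probability map p and a target map y.

    Both maps are lists of rows; y may hold hard (0/1) or soft targets.
    """
    total, n = 0.0, 0
    for prow, yrow in zip(p, y):
        for pi, yi in zip(prow, yrow):
            pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
            n += 1
    return total / n


def dilate(mask, k=3):
    """Dilation: each pixel takes the max over a k x k window (stride-1 max pooling)."""
    h, w, r = len(mask), len(mask[0]), k // 2
    return [
        [
            max(
                mask[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def adaptive_weight(depth_pred, gt, alpha=1.0):
    """Assumed adaptive factor: close to 1 when the depth stream agrees with
    the ground truth, and smaller when the depth prediction is unreliable."""
    return math.exp(-alpha * bce(depth_pred, gt))


def distill_losses(rgb_pred, rgb_attn, depth_pred, gt, alpha=1.0, k=3):
    """Return (prediction-bridge loss, attention-bridge loss, adaptive weight)."""
    w = adaptive_weight(depth_pred, gt, alpha)
    # Bridge 1: pull the RGB prediction toward the depth prediction,
    # scaled by how trustworthy the depth prediction is.
    l_pred = w * bce(rgb_pred, depth_pred)
    # Bridge 2: pull the RGB attention map toward the dilated depth prediction.
    l_attn = bce(rgb_attn, dilate(depth_pred, k))
    return l_pred, l_attn, w


# Toy 3x3 example with a single salient pixel in the centre.
gt = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
depth_pred = [[0.1, 0.1, 0.1], [0.1, 0.9, 0.1], [0.1, 0.1, 0.1]]
rgb_pred = [[0.2, 0.2, 0.2], [0.2, 0.7, 0.2], [0.2, 0.2, 0.2]]
rgb_attn = [[0.6, 0.6, 0.6], [0.6, 0.8, 0.6], [0.6, 0.6, 0.6]]
print(distill_losses(rgb_pred, rgb_attn, depth_pred, gt))
```

In this sketch only the RGB-side maps would receive gradients during training, and the depth stream serves purely as a teacher, which is consistent with the abstract's claim that no depth data is needed at test time.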



