Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction with Transformer-Based Attention

2022 International Conference on Robotics and Automation (ICRA)
Pub Date: 2022-05-23
DOI: 10.1109/icra46639.2022.9811544

Pin-Jie Huang, Chi-An Lu, Kuan-Wen Chen

Citations: 3

Abstract

In this paper, we aim to apply deep saliency prediction to automatic drone exploration, which must consider not just a single image but multiple images taken from different view angles or locations in order to determine the exploration direction. However, little attention has been paid to saliency prediction over multiple discontinuous images, and none of the existing methods take temporal information into account, so the currently predicted saliency map may be inconsistent with previously predicted results. To address this, we propose the Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction Network (TA-MSNet). It uses a transformer-based attention module to correlate relative saliency information among multiple discontinuous images and, in addition, applies a ConvLSTM module to capture temporal information. Experiments show that the proposed TA-MSNet produces better and more temporally consistent results than previous works on time-series data.
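The abstract names two components: attention that correlates saliency across several discontinuous views, and a ConvLSTM that carries state across time. The NumPy sketch below only illustrates that data flow under stated assumptions; it is not the authors' implementation. Assumed (not from the paper): each view arrives as a `(D, H, W)` feature map, attention runs jointly over the spatial tokens of all views with a residual connection, one ConvLSTM (without peephole terms) with shared weights keeps a separate state per view slot, and a 1x1 readout with a sigmoid yields each saliency map. All function and parameter names (`cross_view_attention`, `convlstm_step`, `predict_sequence`, `wq`, `w_lstm`, ...) are hypothetical, and the random weights make the output meaningless beyond its shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_view_attention(feats, wq, wk, wv):
    """Hypothetical: self-attention over the spatial tokens of ALL views jointly.

    feats: (N, D, H, W) feature maps of N discontinuous views. Each location of
    each view attends to every location of every view, so its feature is
    re-weighted relative to the other views. A residual connection is assumed.
    """
    n, d, hgt, wid = feats.shape
    tokens = feats.transpose(0, 2, 3, 1).reshape(n * hgt * wid, d)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (N*H*W, N*H*W) affinities
    out = tokens + attn @ v
    return out.reshape(n, hgt, wid, d).transpose(0, 3, 1, 2)

def conv2d_same(x, w):
    """Cross-correlation with zero 'same' padding. x: (C_in, H, W), w: (C_out, C_in, k, k), k odd."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    hgt, wid = x.shape[1:]
    out = np.zeros((w.shape[0], hgt, wid))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + hgt, j:j + wid])
    return out

def convlstm_step(x, h, c, w, b):
    """One ConvLSTM step (no peephole terms). w: (4*C_h, C_in + C_h, k, k), b: (4*C_h,)."""
    z = conv2d_same(np.concatenate([x, h], axis=0), w) + b[:, None, None]
    i, f, o, g = np.split(z, 4, axis=0)  # input, forget, output gates and candidate
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def predict_sequence(seq, p):
    """seq: list over time of (N, D, H, W) arrays; returns a list of (N, H, W) saliency maps."""
    n, _, hgt, wid = seq[0].shape
    ch = p["w_lstm"].shape[0] // 4
    h = np.zeros((n, ch, hgt, wid))
    c = np.zeros_like(h)
    maps = []
    for feats in seq:
        fused = cross_view_attention(feats, p["wq"], p["wk"], p["wv"])
        for v in range(n):  # shared ConvLSTM weights, separate state per view slot
            h[v], c[v] = convlstm_step(fused[v], h[v], c[v], p["w_lstm"], p["b_lstm"])
        maps.append(sigmoid(np.einsum("c,nchw->nhw", p["w_out"], h)))  # 1x1 readout
    return maps

D, CH = 8, 4
params = {
    "wq": rng.normal(0, 0.1, (D, D)),
    "wk": rng.normal(0, 0.1, (D, D)),
    "wv": rng.normal(0, 0.1, (D, D)),
    "w_lstm": rng.normal(0, 0.1, (4 * CH, D + CH, 3, 3)),
    "b_lstm": np.zeros(4 * CH),
    "w_out": rng.normal(0, 0.1, CH),
}
seq = [rng.normal(size=(3, D, 6, 6)) for _ in range(5)]  # 5 time steps, 3 views each
maps = predict_sequence(seq, params)
print(len(maps), maps[0].shape)  # 5 (3, 6, 6)
```

The design choice worth noting is the split of responsibilities the abstract describes: attention mixes information across views within one time step, while the recurrent state mixes information across time steps, which is what would let a prediction stay consistent with earlier ones.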
