Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction with Transformer-Based Attention
Pin-Jie Huang, Chi-An Lu, Kuan-Wen Chen
2022 International Conference on Robotics and Automation (ICRA), published 2022-05-23
DOI: 10.1109/icra46639.2022.9811544
Citations: 3
Abstract
In this paper, we aim to apply deep saliency prediction to automatic drone exploration. Choosing an exploration direction requires considering not just a single image but multiple images captured from different viewing angles or locations. However, little attention has been paid to saliency prediction over multiple discontinuous images, and none of the existing methods take temporal information into account, so the currently predicted saliency map may be inconsistent with previous predictions. To address this, we propose the Temporally-Aggregating Multiple-Discontinuous-Image Saliency Prediction Network (TA-MSNet). It uses a transformer-based attention module to correlate relative saliency information among multiple discontinuous images and a ConvLSTM module to capture temporal information. Experiments show that TA-MSNet produces better and more temporally consistent results on time-series data than previous methods.
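The abstract names two components: attention that relates the N discontinuous views available at one time step, and a ConvLSTM that carries state across time steps. The NumPy sketch below illustrates only that data flow, under stated assumptions; it is not the authors' TA-MSNet. Everything specific here is hypothetical: per-view feature maps of shape (N, C, H, W) from some unspecified backbone, single-head attention over all spatial tokens of all views with a residual connection, a ConvLSTM cell without peephole terms, a 1x1 sigmoid saliency head, random untrained weights, and the function names `cross_image_attention`, `conv2d_same`, `ConvLSTMCell` and `predict_sequence`.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_image_attention(feats, wq, wk, wv):
    """Hypothetical single-head self-attention over the tokens of ALL N views.

    feats: (N, C, H, W). Each spatial location can attend to locations in the
    other views, which is one way to correlate saliency across images.
    """
    n, c, h, w = feats.shape
    tokens = feats.transpose(0, 2, 3, 1).reshape(n * h * w, c)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=1)
    out = tokens + attn @ v  # residual connection (assumes projection dim == C)
    return out.reshape(n, h, w, c).transpose(0, 3, 1, 2)

def conv2d_same(x, weight, bias):
    """'Same'-padded 2-D cross-correlation. x: (B, Cin, H, W), weight: (Cout, Cin, k, k)."""
    cout, _, k, _ = weight.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))
    _, _, h, w = x.shape
    out = np.zeros((x.shape[0], cout, h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("oc,bchw->bohw", weight[:, :, i, j], xp[:, :, i:i + h, j:j + w])
    return out + bias[None, :, None, None]

class ConvLSTMCell:
    """ConvLSTM cell without peephole terms: all four gates come from one conv over [x, h]."""

    def __init__(self, cin, chid, k=3, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        self.weight = rng.normal(0.0, 0.1, (4 * chid, cin + chid, k, k))
        self.bias = np.zeros(4 * chid)

    def step(self, x, state):
        h, c = state
        gates = conv2d_same(np.concatenate([x, h], axis=1), self.weight, self.bias)
        i, f, o, g = np.split(gates, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # forget old memory, write new
        h = sigmoid(o) * np.tanh(c)
        return h, c

def predict_sequence(feature_seq, chid=8, seed=0):
    """feature_seq: list over time of (N, C, H, W) arrays -> list of (N, H, W) maps in (0, 1)."""
    rng = np.random.default_rng(seed)
    n, c, h, w = feature_seq[0].shape
    wq, wk, wv = (rng.normal(0.0, c ** -0.5, (c, c)) for _ in range(3))
    cell = ConvLSTMCell(c, chid, rng=rng)
    w_head = rng.normal(0.0, 0.1, chid)  # hypothetical 1x1 saliency head
    state = (np.zeros((n, chid, h, w)), np.zeros((n, chid, h, w)))
    maps = []
    for feats in feature_seq:
        fused = cross_image_attention(feats, wq, wk, wv)  # relate the N views
        state = cell.step(fused, state)                   # carry memory across time
        maps.append(sigmoid(np.einsum("c,bchw->bhw", w_head, state[0])))
    return maps

rng = np.random.default_rng(1)
seq = [rng.normal(size=(3, 16, 6, 6)) for _ in range(4)]  # 4 time steps, 3 views each
maps = predict_sequence(seq)
print(len(maps), maps[0].shape)  # 4 (3, 6, 6)
```

Keeping one recurrent state per view slot, with weights shared across slots, is also an assumption. The sketch only shows that each map at step t depends both on the other views (through attention) and on earlier steps (through the ConvLSTM state), which is the consistency property the abstract targets.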