Deep Cross-Modal Retrieval for Remote Sensing Image and Audio

Gou Mao, Yuan Yuan, Lu Xiaoqiang
{"title":"Deep Cross-Modal Retrieval for Remote Sensing Image and Audio","authors":"Gou Mao, Yuan Yuan, Lu Xiaoqiang","doi":"10.1109/PRRS.2018.8486338","DOIUrl":null,"url":null,"abstract":"Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detecting. However, the existing research on image retrieval, mainly including to two directions, text based and content based, cannot meet the rapid and convenient needs of some special applications and emergency scenes. Based on text, the retrieval is limited by keyboard inputting because of its lower efficiency for some urgent situations and based on content, it needs an example image as reference, which usually does not exist. Yet speech, as a direct, natural and efficient human-machine interactive way, can make up these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing image and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with plenty of manual annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence of image and audio. And this model integrates feature extracting and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and prove that it is feasible for speech-to-image retrieval.","PeriodicalId":197319,"journal":{"name":"2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing (PRRS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PRRS.2018.8486338","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

Abstract

Remote sensing image retrieval has many important applications in civilian and military fields, such as disaster monitoring and target detecting. However, the existing research on image retrieval, mainly including to two directions, text based and content based, cannot meet the rapid and convenient needs of some special applications and emergency scenes. Based on text, the retrieval is limited by keyboard inputting because of its lower efficiency for some urgent situations and based on content, it needs an example image as reference, which usually does not exist. Yet speech, as a direct, natural and efficient human-machine interactive way, can make up these shortcomings. Hence, a novel cross-modal retrieval method for remote sensing image and spoken audio is proposed in this paper. We first build a large-scale remote sensing image dataset with plenty of manual annotated spoken audio captions for the cross-modal retrieval task. Then a Deep Visual-Audio Network is designed to directly learn the correspondence of image and audio. And this model integrates feature extracting and multi-modal learning into the same network. Experiments on the proposed dataset verify the effectiveness of our approach and prove that it is feasible for speech-to-image retrieval.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
遥感图像和音频的深度跨模态检索
遥感图像检索在灾害监测、目标探测等民用和军事领域有着重要的应用。然而,现有的图像检索研究主要包括基于文本和基于内容两个方向,不能满足一些特殊应用和应急场景快速便捷的需求。基于文本的检索由于在某些紧急情况下效率较低而受到键盘输入的限制,而基于内容的检索则需要一个通常不存在的示例图像作为参考。而语音作为一种直接、自然、高效的人机交互方式,可以弥补这些不足。为此,本文提出了一种新的遥感图像和语音的跨模态检索方法。我们首先建立了一个大规模的遥感图像数据集,其中包含大量手动注释的语音字幕,用于跨模式检索任务。然后设计了一个深度视音频网络,直接学习图像和音频的对应关系。该模型将特征提取和多模态学习集成到同一个网络中。在该数据集上的实验验证了该方法的有效性,并证明了该方法对语音到图像的检索是可行的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
The UAV Image Classification Method Based on the Grey-Sigmoid Kernel Function Support Vector Machine Fine Registration of Mobile and Airborne LiDAR Data Based on Common Ground Points Instance Segmentation of Trees in Urban Areas from MLS Point Clouds Using Supervoxel Contexts and Graph-Based Optimization An Improved Simplex Maximum Distance Algorithm for Endmember Extraction in Hyperspectral Image End-to-End Road Centerline Extraction via Learning a Confidence Map
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1