CrossLocate: Cross-modal Large-scale Visual Geo-Localization in Natural Environments using Rendered Modalities

Jan Tomešek, Martin Čadík, J. Brejcha
{"title":"CrossLocate: Cross-modal Large-scale Visual Geo-Localization in Natural Environments using Rendered Modalities","authors":"Jan Tomešek, Martin Čadík, J. Brejcha","doi":"10.1109/WACV51458.2022.00225","DOIUrl":null,"url":null,"abstract":"We propose a novel approach to visual geo-localization in natural environments. This is a challenging problem due to vast localization areas, the variable appearance of outdoor environments and the scarcity of available data. In order to make the research of new approaches possible, we first create two databases containing \"synthetic\" images of various modalities. These image modalities are rendered from a 3D terrain model and include semantic segmentations, silhouette maps and depth maps. By combining the rendered database views with existing datasets of photographs (used as \"‘queries\" to be localized), we create a unique benchmark for visual geo-localization in natural environments, which contains correspondences between query photographs and rendered database imagery. The distinct ability to match photographs to synthetically rendered databases defines our task as \"cross-modal\". On top of this benchmark, we provide thorough ablation studies analysing the localization potential of the database image modalities. We reveal the depth information as the best choice for outdoor localization. Finally, based on our observations, we carefully develop a fully-automatic method for large-scale cross-modal localization using image retrieval. We demonstrate its localization performance outdoors in the entire state of Switzerland. Our method reveals a large gap between operating within a single image domain (e.g. photographs) and working across domains (e.g. photographs matched to rendered images), as gained knowledge is not transferable between the two. Moreover, we show that modern localization methods fail when applied to such a cross- modal task and that our method achieves significantly better results than state-of-the-art approaches. The datasets, code and trained models are available on the project website: http://cphoto.fit.vutbr.cz/crosslocate/.","PeriodicalId":297092,"journal":{"name":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WACV51458.2022.00225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

We propose a novel approach to visual geo-localization in natural environments. This is a challenging problem due to vast localization areas, the variable appearance of outdoor environments and the scarcity of available data. To make research into new approaches possible, we first create two databases containing "synthetic" images of various modalities. These image modalities are rendered from a 3D terrain model and include semantic segmentations, silhouette maps and depth maps. By combining the rendered database views with existing datasets of photographs (used as "queries" to be localized), we create a unique benchmark for visual geo-localization in natural environments, which contains correspondences between query photographs and rendered database imagery. The distinct ability to match photographs to synthetically rendered databases defines our task as "cross-modal". On top of this benchmark, we provide thorough ablation studies analysing the localization potential of the database image modalities. We reveal depth information as the best choice for outdoor localization. Finally, based on our observations, we carefully develop a fully automatic method for large-scale cross-modal localization using image retrieval. We demonstrate its localization performance outdoors across the entire territory of Switzerland. Our method reveals a large gap between operating within a single image domain (e.g. photographs) and working across domains (e.g. photographs matched to rendered images), as gained knowledge is not transferable between the two. Moreover, we show that modern localization methods fail when applied to such a cross-modal task and that our method achieves significantly better results than state-of-the-art approaches. The datasets, code and trained models are available on the project website: http://cphoto.fit.vutbr.cz/crosslocate/.
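At its core, the described method performs image retrieval across modalities: query photographs and rendered database views (depth maps proved best in the paper's ablations) are mapped into a shared descriptor space, and the known geo-pose of the most similar database view becomes the localization hypothesis. The Python sketch below illustrates only this retrieval step under stated assumptions; the embed function, the random placeholder images and the pose array are hypothetical stand-ins, not the authors' released code (see the project website for that).

import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    # Stand-in for a learned descriptor network. In the paper's setting this
    # would be a CNN trained so that a photograph and a rendered view (e.g. a
    # depth map) of the same place map to nearby descriptors; here we use a
    # trivial global statistic just to keep the sketch self-contained.
    hist, _ = np.histogram(image, bins=14, range=(0.0, 1.0))
    v = np.concatenate(([image.mean(), image.std()], hist)).astype(np.float32)
    return v / (np.linalg.norm(v) + 1e-8)  # L2-normalize for cosine similarity

rng = np.random.default_rng(0)

# Database side: descriptors of rendered views with known geo-poses.
# Random arrays stand in for renders; lat/lon spans roughly Switzerland.
db_views = [rng.random((64, 64)) for _ in range(1000)]
db_desc = np.stack([embed(view) for view in db_views])              # (N, D)
db_poses = rng.uniform([45.8, 5.9], [47.8, 10.5], size=(1000, 2))   # (N, 2)

# Query side: a photograph to be localized (placeholder array here).
query_photo = rng.random((64, 64))
q = embed(query_photo)

# Retrieval: nearest neighbour in descriptor space; the pose of the best
# matching rendered view is the localization hypothesis.
similarity = db_desc @ q                     # cosine similarity (unit vectors)
best = int(np.argmax(similarity))
lat, lon = db_poses[best]
print(f"Estimated location: lat={lat:.4f}, lon={lon:.4f}")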