Worldwide image geolocalization aims to predict the geographic location where a given image was captured. Because of the vast scale of the Earth and the uneven distribution of geographic features, this task remains highly challenging, and traditional methods show clear limitations on global-scale data. To address these challenges, we propose GEOMR, an effective and adaptive framework that integrates image geographic features with human reasoning knowledge to improve global geolocalization accuracy. GEOMR consists of two modules. The first module extracts geographic features from images by jointly learning multimodal features. The second module trains a multimodal large language model in two phases to strengthen its geolocalization reasoning. The first phase learns human geolocalization reasoning knowledge, enabling the model to exploit the geographic cues present in images effectively. The second phase teaches the model to use reference information to infer the correct geographic coordinates. Extensive experiments on the IM2GPS3K, YFCC4K, and YFCC26K datasets demonstrate that GEOMR significantly outperforms state-of-the-art methods.
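Since the framework ultimately outputs geographic coordinates, "accuracy" on benchmarks such as IM2GPS3K and YFCC4K is commonly reported as the fraction of predictions falling within great-circle distance thresholds of the ground truth. The sketch below assumes that convention, using the widely used 1, 25, 200, 750 and 2500 km thresholds (roughly street, city, region, country and continent) and the haversine formula with a mean Earth radius of 6371 km. It is an illustrative evaluation helper, not part of GEOMR itself; the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# Assumed evaluation thresholds (km): street, city, region, country, continent.
THRESHOLDS_KM = (1, 25, 200, 750, 2500)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_thresholds(preds, gts, thresholds=THRESHOLDS_KM):
    """Fraction of predicted (lat, lon) pairs within each distance threshold of the ground truth."""
    dists = [haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(preds, gts)]
    return {t: sum(d <= t for d in dists) / len(dists) for t in thresholds}


if __name__ == "__main__":
    paris, london = (48.8566, 2.3522), (51.5074, -0.1278)
    print(round(haversine_km(*paris, *london)))  # distance Paris-London, roughly 340 km
    print(accuracy_at_thresholds([paris, paris], [paris, london]))
```

Reporting accuracy at several thresholds, rather than a single mean error, shows how a method performs at each geographic granularity, from exact street-level matches to continent-level guesses.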
