Jiaqiang Yang; Danyang Qin; Huapeng Tang; Sili Tao; Haoze Bie; Lin Ma

IEEE Robotics and Automation Letters, vol. 10, no. 2, pp. 2080-2087. DOI: 10.1109/LRA.2025.3527762. Published 2025-01-09. https://ieeexplore.ieee.org/document/10835173/
DINOv2-Based UAV Visual Self-Localization in Low-Altitude Urban Environments
Visual self-localization technology is essential for unmanned aerial vehicles (UAVs) to achieve autonomous navigation and mission execution in environments where global navigation satellite system (GNSS) signals are unavailable. This technology estimates the UAV's geographic location by performing cross-view matching between UAV and satellite images. However, significant viewpoint differences between UAV and satellite images result in poor accuracy for existing cross-view matching methods. To address this, we integrate the DINOv2 model with UAV visual localization tasks and propose a DINOv2-based UAV visual self-localization method. Considering the inherent differences between pre-trained models and cross-view matching tasks, we propose a global-local feature adaptive enhancement method (GLFA). This method leverages Transformer and multi-scale convolutions to capture long-range dependencies and local spatial information in visual images, improving the model's ability to recognize key discriminative landmarks. In addition, we propose a cross-enhancement method based on a spatial pyramid (CESP), which constructs a multi-scale spatial pyramid to cross-enhance features, effectively improving the ability of the features to perceive multi-scale spatial information. Experimental results demonstrate that the proposed method achieves impressive scores of 86.27% in R@1 and 88.87% in SDM@1 on the DenseUAV public benchmark dataset, providing a novel solution for UAV visual self-localization.
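To make the two proposed components more concrete, the sketch below illustrates the general idea behind them in plain NumPy: a global branch (single-head self-attention over patch tokens, capturing long-range dependencies), a local branch (multi-scale neighborhood aggregation standing in for multi-scale convolutions), and a multi-scale spatial pyramid that pools the fused feature map into a descriptor. This is an illustrative toy with untrained identity projections and simple averaging, not the paper's GLFA/CESP implementation; all function names, fusion weights, and pyramid levels here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_branch(tokens):
    """Single-head self-attention over patch tokens (long-range dependencies).

    Untrained sketch: query/key/value projections are the identity.
    """
    d = tokens.shape[-1]
    attn = softmax(tokens @ tokens.T / np.sqrt(d))
    return attn @ tokens

def local_branch(feat_map, kernel_sizes=(1, 3, 5)):
    """Multi-scale local averaging, a stand-in for multi-scale convolutions."""
    h, w, _ = feat_map.shape
    outs = []
    for k in kernel_sizes:
        pad = k // 2
        padded = np.pad(feat_map, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
        out = np.empty_like(feat_map)
        for i in range(h):
            for j in range(w):
                out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
        outs.append(out)
    return np.mean(outs, axis=0)

def spatial_pyramid(feat_map, levels=(1, 2, 4)):
    """Pool the map onto 1x1, 2x2, 4x4 grids and concatenate the cell means."""
    h, w, _ = feat_map.shape
    vecs = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = feat_map[i * h // n:(i + 1) * h // n,
                                j * w // n:(j + 1) * w // n]
                vecs.append(cell.mean(axis=(0, 1)))
    return np.concatenate(vecs)

# Toy 8x8 feature map with 16 channels, e.g. patch features from a backbone.
H, W, C = 8, 8, 16
feat = rng.standard_normal((H, W, C))
tokens = feat.reshape(H * W, C)

g = global_branch(tokens).reshape(H, W, C)   # global context
l = local_branch(feat)                       # local spatial detail
enhanced = 0.5 * g + 0.5 * l                 # fixed equal-weight fusion (illustrative)
desc = spatial_pyramid(enhanced)             # (1 + 4 + 16) * C = 336-dim descriptor
print(desc.shape)
```

In a cross-view matching pipeline, a descriptor like `desc` would be computed for both the UAV image and each satellite tile, with retrieval done by nearest-neighbor search over the descriptors; the paper's actual modules learn the fusion and enhancement rather than using fixed averages.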
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.