Geodesic Based Image Matching Network for the Multi-scale Ground to Aerial Geo-localization
A. A. Rasna, C. Mohan
2023 IEEE Aerospace Conference, published 2023-03-04
DOI: 10.1109/AERO55745.2023.10115935 (https://doi.org/10.1109/AERO55745.2023.10115935)
Citations: 0
Abstract
Airport surveillance using remote sensing images is challenging because object variations strongly affect geo-localization and object detection/segmentation. Scale variations make localization harder still. Traditionally, image-based geo-referencing is accomplished by superimposing a Global Positioning System (GPS) location on the query image. Moreover, the query and the geo-tagged reference images are typically taken from the same ground view or, in the case of remote sensing images, from the same aerial height. In this work, we revisit the effect of scale on object variability by introducing geodesic representations alongside image-matching networks. The architecture pipeline introduces a data processing layer in which objects are geo-referenced to generate metadata. This metadata consists of three-dimensional data, including the orientation of the object. A regression task that leverages the metadata is added to training. We use gradient-weighted class activation mapping (Grad-CAM) to generate activation maps and select pixels whose activation values exceed a high threshold. The orientations and locations are then calculated using the geodesic representations. The baseline architecture for local feature extraction is a simple Siamese network with a ResNet backbone. A NetVLAD layer generates the global features. We also introduce a Geospatial Attention Network (GsAN) to improve the localization of objects. The experiments use CVUSA and our custom dataset, which provides airport runway views at different scales and arbitrary orientations. The evaluation focuses on recall as the retrieval metric and compares various loss functions. The performance metrics indicate a higher accuracy rate.
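The abstract names recall as the retrieval metric but does not say how it is computed. The following is a minimal sketch of one common formulation, recall@K, under stated assumptions: each ground-view query `i` has exactly one true aerial match at reference index `i`, and candidates are ranked by cosine similarity between embeddings. The function names `cosine_similarity` and `recall_at_k` and the toy two-dimensional embeddings are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(query_embeddings, reference_embeddings, k):
    """Fraction of queries whose true match ranks within the top k references.

    Assumption (not from the paper): query i matches reference i.
    """
    hits = 0
    for i, query in enumerate(query_embeddings):
        sims = [cosine_similarity(query, ref) for ref in reference_embeddings]
        # Rank reference indices from most to least similar.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(query_embeddings)


# Toy embeddings, purely illustrative (hypothetical values).
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
references = [[0.9, 0.1], [0.8, 0.6], [0.1, 1.0]]

for k in (1, 2, 3):
    print(f"recall@{k} = {recall_at_k(queries, references, k):.3f}")
```

In this toy setup only the first query retrieves its true match at rank 1, the second at rank 2, and the third at rank 3, so recall rises as K grows. In a real pipeline the embeddings would be the global descriptors produced by the network rather than hand-written vectors.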