Pub Date: 2024-10-21 DOI: 10.1109/LSP.2024.3484330
Tingyu Wang;Zihao Yang;Quan Chen;Yaoqi Sun;Chenggang Yan
Vision-based aerial-view geo-localization aims to match drone and satellite views of the same geographical location. Several feature partition strategies divide spatial features to mine contextual information. However, the compression from fine-grained features to visual descriptors is ill-considered: classical pooling destroys discriminative features while increasing the sensitivity of networks to contextual information. To clarify this, we first review existing pooling layers and analyze their pros and cons when applied to feature compression. Inspired by the appearance of aerial views, we then summarize an ideal feature compression operation, i.e., precisely highlighting the central target while maximizing the use of environmental information in a feature-smoothing manner. To achieve the above process, we propose a distance-dependent parameter initialization strategy and form a novel pooling called $D^{2}$
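
To make the idea of distance-dependent initialization concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the form of the initialization, so the Gaussian falloff, the `sigma` parameter, and the names `distance_init`, `weighted_pool`, and `average_pool` are all assumptions introduced for illustration. It shows pooling weights that peak at the center of a feature map and decay smoothly toward the border, so the central target is emphasized while every surrounding location still contributes.

```python
import math


def distance_init(height, width, sigma=1.0):
    """Hypothetical initializer: weights decay with distance from the map center.

    The Gaussian form and `sigma` are assumptions, not taken from the paper.
    Weights are normalized to sum to 1.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    raw = [
        [
            math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def weighted_pool(feature_map, weights):
    """Compress one H x W channel to a scalar as a weighted sum."""
    return sum(
        f * w
        for f_row, w_row in zip(feature_map, weights)
        for f, w in zip(f_row, w_row)
    )


def average_pool(feature_map):
    """Classical average pooling over one H x W channel, for comparison."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


# Toy 5 x 5 channel with a single activation at the center.
fmap = [[0.0] * 5 for _ in range(5)]
fmap[2][2] = 1.0

weights = distance_init(5, 5, sigma=1.0)
pooled = weighted_pool(fmap, weights)
avg = average_pool(fmap)
```

In this toy case the center-weighted result is larger than the plain average, because the single central activation is not diluted equally across all 25 locations. In a trainable layer these weights would presumably serve only as starting values for learnable parameters, which is also an assumption here.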