Remote sensing (RS) image-text retrieval is a practical and challenging task that has received considerable attention. Currently, most approaches rely on either convolutional neural networks or Transformers, which cannot effectively extract global and fine-grained features simultaneously. Furthermore, the high intramodal similarity typical of the RS domain poses a challenge for feature learning. In addition, the distinct characteristics of model training at different stages are neglected in most studies. To tackle these problems, we propose a fine-grained information supplementation (FGIS) and value-guided learning model that leverages prior knowledge in the RS domain for feature supplementation and employs a value-guided training approach to learn fine-grained, expressive, and robust feature representations. Specifically, we introduce the FGIS module to supplement fine-grained visual features, thereby enhancing the perception of both global and local features. Furthermore, we mitigate the problem of high intramodal similarity with two proposed loss functions: a weighted contrastive loss and a scene-adaptive fine-grained perceptual loss. Finally, we design a value-guided learning framework that focuses on the most valuable information at each stage of training. Extensive experiments on the remote sensing image captioning dataset (RSICD) and the remote sensing image-text match dataset (RSITMD) verify the effectiveness and superiority of our model.
{"title":"Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval","authors":"Zihui Zhou;Yong Feng;Agen Qiu;Guofan Duan;Mingliang Zhou","doi":"10.1109/JSTARS.2024.3480014","DOIUrl":"https://doi.org/10.1109/JSTARS.2024.3480014","url":null,"abstract":"Remote sensing (RS) image-text retrieval is a practical and challenging task that has received considerable attention. Currently, most approaches rely on either convolutional neural networks or Transformers, which cannot effectively extract both global and fine-grained features simultaneously. Furthermore, the problem of high intramodal similarity in the RS domain poses a challenge for feature learning. In addition, the characteristics of model training at different stages seem to be neglected in most studies. In order to tackle these problems, we propose a fine-grained information supplementation (FGIS) and value-guided learning model that leverages prior knowledge in the RS domain for feature supplementation and employs a value-guided training approach to learn fine-grained, expressive, and robust feature representations. Specifically, we introduce the FGIS module to facilitate the supplementation of fine-grained visual features, thereby enhancing perceptual abilities for both global and local features. Furthermore, we mitigate the problem of high intra-modal similarity by proposing two loss functions: the weighted contrastive loss and the scene-adaptive fine-grained perceptual loss. Finally, we design a value-guided learning framework that focuses on the most important information at each stage of training. 
Extensive experiments on the remote sensing image captioning dataset (RSICD) and remote sensing image text match dataset (RSITMD) datasets verify the effectiveness and superiority of our model.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"17 ","pages":"19194-19210"},"PeriodicalIF":4.7,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10716520","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142550542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
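The first abstract proposes a weighted contrastive loss to counter high intramodal similarity, but does not give its formulation. The sketch below is a hedged illustration of one common recipe such a loss could follow: an InfoNCE-style objective in which harder (more similar) negatives receive larger weights. The function name, the `beta` hardness-weighting scheme, and the temperature value are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def weighted_contrastive_loss(img_emb, txt_emb, temperature=0.07, beta=0.5):
    """Illustrative hard-negative-weighted contrastive loss (NOT the paper's
    exact formulation). Positives sit on the diagonal of the image-text
    similarity matrix; each negative is re-weighted by how similar it is
    to the anchor, so confusable pairs dominate the denominator."""
    # L2-normalise so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    sim = img @ txt.T / temperature            # (N, N) scaled similarities
    n = sim.shape[0]
    eye = np.eye(n, dtype=bool)                # diagonal = matched pairs

    # Hypothetical hardness weights: negatives above the row mean weigh more,
    # very easy negatives are clipped to zero weight.
    w = np.clip(1.0 + beta * (sim - sim.mean(axis=1, keepdims=True)), 0.0, None)

    exp_sim = np.exp(sim - sim.max(axis=1, keepdims=True))  # numerical stability
    pos = exp_sim[eye]                                       # (N,) positives
    neg = np.where(eye, 0.0, w * exp_sim).sum(axis=1)        # weighted negatives
    return float(-np.log(pos / (pos + neg)).mean())
```

With perfectly aligned embeddings the weighted denominator collapses and the loss approaches zero; mismatched pairings drive it up, which is the behavior any contrastive objective of this family should exhibit.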
The conversion of agricultural lands, termed “nonagriculturalization,” poses profound threats to food security and ecological stability. Remote sensing image change detection offers an invaluable tool for monitoring this phenomenon. However, most change detection techniques prioritize image comparison over exploiting accumulated vector datasets. In addition, many current methods are not readily applicable in practice because of inadequate model generalization and a scarcity of samples, resulting in a continued reliance on manual intervention for nonagriculturalization detection. In response, this article introduces a novel change detection approach for nonagriculturalization based on vector data and contrastive learning. Initially, the boundary-constrained simple noniterative clustering (SNIC) algorithm is applied to segment two-phase images under vector data guidance. Samples are then generated using an adaptive cropping method. For early-phase image samples, a collaborative validation-based annotation framework is employed to optimize and annotate the samples, with the purified high-quality samples serving as the training set for subsequent classification. For later-phase image samples, only those within the cropland vector polygons are retained for prediction. Building on this, a semi-supervised cross-domain contrastive learning framework is proposed for remote sensing scene classification. Ultimately, by integrating nonagriculturalization rules and postprocessing techniques, areas undergoing nonagriculturalization are detected. Validating our methodology on the Wuxi and Yangzhou datasets yielded precision rates of 91.57% and 89.21%, with recall rates of 93.68% and 90.51%, respectively. These outcomes affirm the effectiveness of our method in nonagriculturalization detection, offering robust technical support for research in this domain.
{"title":"Nonagriculturalization Detection Based on Vector Polygons and Contrastive Learning With High-Resolution Remote Sensing Images","authors":"Hui Zhang;Wei Liu;Changming Zhu;Hao Niu;Pengcheng Yin;Shiling Dong;Jialin Wu;Erzhu Li;Lianpeng Zhang","doi":"10.1109/JSTARS.2024.3476131","DOIUrl":"https://doi.org/10.1109/JSTARS.2024.3476131","url":null,"abstract":"The conversion of agricultural lands, termed “nonagriculturalization,” poses profound threats to food security and ecological stability. Remote sensing image change detection offers an invaluable tool for monitoring this phenomenon. However, most change detection techniques prioritize image comparison over exploiting accumulated vector datasets. Additionally, many current methods are not readily applicable in practical scenarios due to inadequate model generalization capabilities and a scarcity of samples, resulting in a continued reliance on manual intervention for nonagriculturalization detection. In response, this article introduces a novel change detection approach for nonagriculturalization based on the vector data and contrastive learning. Initially, the boundary-constrained simple noniterative clustering algorithm is applied to segment two-phase images under vector data guidance. Samples are then generated using an adaptive cropping method. For early phase image samples, a collaborative validation-based sample annotation framework is employed to optimize and annotate the samples, with the purified high-quality samples serving as the training set for subsequent classification. For later-phase image samples, only those within the cropland vector polygons are retained for prediction. Building on this, a semi-supervised cross-domain contrastive learning framework is proposed for remote sensing scene classification. Ultimately, by integrating nonagriculturalization rules and postprocessing techniques, areas undergoing nonagriculturalization are further detected. 
Validating our methodology on Wuxi and Yangzhou datasets yielded precision rates of 91.57% and 89.21%, with recall rates of 93.68% and 90.51%, respectively. These outcomes affirm the effectiveness of our method in nonagriculturalization detection, offering robust technical support for research in this domain.","PeriodicalId":13116,"journal":{"name":"IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing","volume":"17 ","pages":"18474-18488"},"PeriodicalIF":4.7,"publicationDate":"2024-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10716558","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142517955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
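The second abstract reports precision and recall on the Wuxi and Yangzhou datasets. Assuming the standard definitions of these detection metrics, a minimal helper shows how such figures are computed from confusion-matrix counts; the counts in the usage line are hypothetical, not the paper's actual confusion matrix.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true-positive, false-positive,
    and false-negative counts."""
    precision = tp / (tp + fp)          # fraction of detections that are correct
    recall = tp / (tp + fn)             # fraction of true changes that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts for illustration only (not taken from the paper):
p, r, f1 = precision_recall_f1(tp=870, fp=80, fn=60)
```

Precision near 91% with recall near 93%, as reported for Wuxi, would correspond to a detector whose false-positive and false-negative counts are both small relative to its true positives.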