Qingliang Li , Qiyun Xiao , Cheng Zhang , Jinlong Zhu , Xiao Chen , Yuguang Yan , Pingping Liu , Wei Shangguan , Zhongwang Wei , Lu Li , Wenzong Dong , Yongjiu Dai
{"title":"通过聚类平均采样策略改进全球土壤水分预测","authors":"Qingliang Li , Qiyun Xiao , Cheng Zhang , Jinlong Zhu , Xiao Chen , Yuguang Yan , Pingping Liu , Wei Shangguan , Zhongwang Wei , Lu Li , Wenzong Dong , Yongjiu Dai","doi":"10.1016/j.geoderma.2024.116999","DOIUrl":null,"url":null,"abstract":"<div><p>Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning methods (DL) have shown strong performance in SM prediction, imbalances in training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of batch training samples during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy utilizing unsupervised learning techniques. This approach involves training the model with evenly sampled data from different clusters, ensuring both sample diversity and numerical consistency within each cluster. This approach prevents the model from overemphasizing specific sample characteristics, leading to more balanced feature learning. Experiments using the LandBench1.0 dataset with five different seeds for 1-day lead-time global predictions reveal that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R<sup>2</sup>) improved by 2.36 % to 4.31 %, while Kling-Gupta Efficiency (KGE) improved by 1.95 % to 3.16 %. In high-latitude areas, R<sup>2</sup> improvements exceeded 40 % in specific regions. To further validate CAS under realistic conditions, we tested it using the Soil Moisture Active and Passive Level 3 (SMAP-L3) satellite data for 1 to 3-day lead-time global predictions, confirming its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.</p></div>","PeriodicalId":12511,"journal":{"name":"Geoderma","volume":null,"pages":null},"PeriodicalIF":5.6000,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0016706124002283/pdfft?md5=7b7a5fc5b0181bfd9cd70f884cf867ba&pid=1-s2.0-S0016706124002283-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Improving global soil moisture prediction through cluster-averaged sampling strategy\",\"authors\":\"Qingliang Li , Qiyun Xiao , Cheng Zhang , Jinlong Zhu , Xiao Chen , Yuguang Yan , Pingping Liu , Wei Shangguan , Zhongwang Wei , Lu Li , Wenzong Dong , Yongjiu Dai\",\"doi\":\"10.1016/j.geoderma.2024.116999\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning methods (DL) have shown strong performance in SM prediction, imbalances in training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of batch training samples during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy utilizing unsupervised learning techniques. This approach involves training the model with evenly sampled data from different clusters, ensuring both sample diversity and numerical consistency within each cluster. This approach prevents the model from overemphasizing specific sample characteristics, leading to more balanced feature learning. Experiments using the LandBench1.0 dataset with five different seeds for 1-day lead-time global predictions reveal that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R<sup>2</sup>) improved by 2.36 % to 4.31 %, while Kling-Gupta Efficiency (KGE) improved by 1.95 % to 3.16 %. In high-latitude areas, R<sup>2</sup> improvements exceeded 40 % in specific regions. To further validate CAS under realistic conditions, we tested it using the Soil Moisture Active and Passive Level 3 (SMAP-L3) satellite data for 1 to 3-day lead-time global predictions, confirming its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.</p></div>\",\"PeriodicalId\":12511,\"journal\":{\"name\":\"Geoderma\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2024-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0016706124002283/pdfft?md5=7b7a5fc5b0181bfd9cd70f884cf867ba&pid=1-s2.0-S0016706124002283-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Geoderma\",\"FirstCategoryId\":\"97\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0016706124002283\",\"RegionNum\":1,\"RegionCategory\":\"农林科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"SOIL SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Geoderma","FirstCategoryId":"97","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0016706124002283","RegionNum":1,"RegionCategory":"农林科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"SOIL SCIENCE","Score":null,"Total":0}
Improving global soil moisture prediction through cluster-averaged sampling strategy
Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning methods (DL) have shown strong performance in SM prediction, imbalances in training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of batch training samples during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy utilizing unsupervised learning techniques. This approach involves training the model with evenly sampled data from different clusters, ensuring both sample diversity and numerical consistency within each cluster. This approach prevents the model from overemphasizing specific sample characteristics, leading to more balanced feature learning. Experiments using the LandBench1.0 dataset with five different seeds for 1-day lead-time global predictions reveal that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R2) improved by 2.36 % to 4.31 %, while Kling-Gupta Efficiency (KGE) improved by 1.95 % to 3.16 %. In high-latitude areas, R2 improvements exceeded 40 % in specific regions. To further validate CAS under realistic conditions, we tested it using the Soil Moisture Active and Passive Level 3 (SMAP-L3) satellite data for 1 to 3-day lead-time global predictions, confirming its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.
期刊介绍:
Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.