Improving global soil moisture prediction through cluster-averaged sampling strategy

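Impact factor 5.6 | Chinese Academy of Sciences ranking: Tier 1 (Agricultural and Forestry Sciences) | JCR Q1 (Soil Science) | Geoderma | Published: 2024-08-13 | DOI: 10.1016/j.geoderma.2024.116999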

Qingliang Li, Qiyun Xiao, Cheng Zhang, Jinlong Zhu, Xiao Chen, Yuguang Yan, Pingping Liu, Wei Shangguan, Zhongwang Wei, Lu Li, Wenzong Dong, Yongjiu Dai



Abstract

Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning (DL) methods have shown strong performance in SM prediction, imbalances among training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of the samples in each training batch during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy based on unsupervised learning techniques. CAS trains the model on data sampled evenly from different clusters, ensuring both sample diversity and an equal number of samples from each cluster. This prevents the model from overemphasizing specific sample characteristics and leads to more balanced feature learning. Experiments on the LandBench1.0 dataset with five different random seeds for 1-day lead-time global predictions show that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R²) improved by 2.36 % to 4.31 %, and Kling-Gupta Efficiency (KGE) improved by 1.95 % to 3.16 %. In some high-latitude regions, R² improved by more than 40 %. To further validate CAS under realistic conditions, we tested it on Soil Moisture Active Passive Level 3 (SMAP-L3) satellite data for 1- to 3-day lead-time global predictions, confirming its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.
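The abstract describes CAS only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that clusters are found with plain k-means on per-sample feature vectors (the abstract says only "unsupervised learning techniques") and that every batch draws the same number of samples from each cluster. The function names `kmeans` and `cluster_averaged_batches` are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Sequence


def kmeans(points: Sequence[Sequence[float]], k: int,
           n_iter: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means (assumed clustering step); returns one label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_averaged_batches(labels: Sequence[int], batch_size: int,
                             n_batches: int, seed: int = 0) -> Iterator[List[int]]:
    """Yield batches of sample indices with the same count from every cluster (hypothetical helper)."""
    rng = random.Random(seed)
    clusters: Dict[int, List[int]] = {}
    for idx, c in enumerate(labels):
        clusters.setdefault(c, []).append(idx)
    if batch_size % len(clusters):
        raise ValueError("batch_size must be divisible by the number of clusters")
    per_cluster = batch_size // len(clusters)
    for _ in range(n_batches):
        batch: List[int] = []
        for members in clusters.values():
            if len(members) >= per_cluster:
                batch.extend(rng.sample(members, per_cluster))  # without replacement
            else:
                # Small cluster: sample with replacement so it still fills its share.
                batch.extend(rng.choices(members, k=per_cluster))
        rng.shuffle(batch)  # mix clusters within the batch
        yield batch


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical imbalanced toy data: 90 "common" and 10 "rare" samples.
    data = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(90)]
    data += [[gen.gauss(10, 1), gen.gauss(10, 1)] for _ in range(10)]
    labels = kmeans(data, k=2)
    for batch in cluster_averaged_batches(labels, batch_size=8, n_batches=2):
        # Each batch holds 4 indices from each of the 2 clusters.
        print(sorted(labels[i] for i in batch))
```

With uniform random batching, a cluster holding 10 % of the samples contributes roughly 10 % of every batch; in this sketch every cluster gets the same share of each gradient step, so small clusters are effectively oversampled.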
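The two evaluation metrics named in the abstract are typically computed per grid cell and then summarized by their median. The sketch below assumes R² is defined as 1 - SS_res / SS_tot (some studies instead use the squared Pearson correlation) and uses the original KGE formulation, 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²), where r is the Pearson correlation, α the ratio of standard deviations and β the ratio of means. The per-cell series are hypothetical values for illustration only.

```python
import math
import statistics
from typing import Sequence


def r2_score(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta Efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    mu_o, mu_s = statistics.fmean(obs), statistics.fmean(sim)
    sd_o, sd_s = statistics.pstdev(obs), statistics.pstdev(sim)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (sd_o * sd_s)  # Pearson correlation
    alpha = sd_s / sd_o      # variability ratio
    beta = mu_s / mu_o       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Hypothetical (observed, predicted) SM series for three grid cells.
    cells = {
        "cell_a": ([0.20, 0.25, 0.30, 0.35], [0.21, 0.24, 0.31, 0.33]),
        "cell_b": ([0.10, 0.12, 0.15, 0.11], [0.11, 0.12, 0.14, 0.12]),
        "cell_c": ([0.30, 0.28, 0.26, 0.31], [0.29, 0.28, 0.27, 0.30]),
    }
    r2s = [r2_score(o, s) for o, s in cells.values()]
    kges = [kge(o, s) for o, s in cells.values()]
    print("median R2:", statistics.median(r2s))
    print("median KGE:", statistics.median(kges))
```

Using the median rather than the mean keeps a few poorly predicted cells from dominating the global summary.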

Source journal

Geoderma (Agricultural and Forestry Sciences: Soil Science)

CiteScore: 11.80

Self-citation rate: 6.60%

Articles published: 597

Review time: 58 days

Journal description: Geoderma, the global journal of soil science, welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.