Improving global soil moisture prediction through cluster-averaged sampling strategy

IF 5.6 | Region 1, Agricultural and Forestry Sciences | Q1 Soil Science | Geoderma | Pub Date: 2024-08-13 | DOI: 10.1016/j.geoderma.2024.116999
Qingliang Li, Qiyun Xiao, Cheng Zhang, Jinlong Zhu, Xiao Chen, Yuguang Yan, Pingping Liu, Wei Shangguan, Zhongwang Wei, Lu Li, Wenzong Dong, Yongjiu Dai
Citations: 0

Abstract

Improving global soil moisture prediction through cluster-averaged sampling strategy

Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning (DL) methods have shown strong performance in SM prediction, imbalances among training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of batch training samples during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy utilizing unsupervised learning techniques. This approach trains the model with data sampled evenly from different clusters, ensuring both sample diversity and numerical consistency within each cluster. It prevents the model from overemphasizing specific sample characteristics, leading to more balanced feature learning. Experiments using the LandBench1.0 dataset with five different seeds for 1-day lead-time global predictions reveal that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R²) improved by 2.36% to 4.31%, while the Kling-Gupta Efficiency (KGE) improved by 1.95% to 3.16%. In high-latitude areas, R² improvements exceeded 40% in specific regions. To further validate CAS under realistic conditions, we tested it using the Soil Moisture Active Passive Level 3 (SMAP-L3) satellite data for 1- to 3-day lead-time global predictions, confirming its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.
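The idea of drawing each training batch evenly from different clusters can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the batch construction details, so the function name `cluster_averaged_batches`, its parameters, and the choice to sample with replacement are all assumptions made for illustration. Cluster labels are taken as given, produced beforehand by any unsupervised method.

```python
import random
from collections import defaultdict


def cluster_averaged_batches(labels, batch_size, n_batches, seed=0):
    """Yield batches of sample indices with an equal count per cluster.

    labels[i] is the cluster id of sample i, assumed to come from some
    unsupervised clustering step (hypothetical; not specified in the abstract).
    """
    rng = random.Random(seed)

    # Group sample indices by their cluster id.
    members = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    clusters = sorted(members)
    per_cluster = batch_size // len(clusters)
    if per_cluster == 0:
        raise ValueError("batch_size must be at least the number of clusters")

    for _ in range(n_batches):
        batch = []
        for c in clusters:
            # Assumption: sample with replacement so that small clusters
            # can still fill their share of the batch.
            batch.extend(rng.choices(members[c], k=per_cluster))
        rng.shuffle(batch)  # avoid ordering the batch by cluster
        yield batch


if __name__ == "__main__":
    # Toy imbalanced data: 90 samples in cluster 0, 8 in cluster 1, 2 in cluster 2.
    labels = [0] * 90 + [1] * 8 + [2] * 2
    for batch in cluster_averaged_batches(labels, batch_size=12, n_batches=2):
        counts = {c: sum(labels[i] == c for i in batch) for c in (0, 1, 2)}
        print(counts)  # each cluster contributes 4 samples per batch
```

With plain uniform sampling, a batch of 12 from this toy data would be dominated by cluster 0. Here every cluster contributes the same number of samples to each gradient step, which is the balancing effect the abstract describes.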

Source journal
Geoderma (Agricultural and Forestry Sciences, Soil Science)
CiteScore: 11.80
Self-citation rate: 6.60%
Articles published: 597
Review time: 58 days
Journal description: Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.