Soil Science-Informed Machine Learning

Budiman Minasny, Toshiyuki Bandai, Teamrat A. Ghezzehei, Yin-Chung Huang, Yuxin Ma, Alex B. McBratney, Wartini Ng, Sarem Norouzi, Jose Padarian, Rudiyanto, Amin Sharififar, Quentin Styc, Marliana Widyastuti

Geoderma, Volume 452, Article 117094 (published 2024-11-14)
DOI: 10.1016/j.geoderma.2024.117094
https://www.sciencedirect.com/science/article/pii/S0016706124003239
Abstract
Machine learning (ML) applications in soil science have increased significantly over the past two decades, reflecting a growing trend towards data-driven research addressing soil security. These applications have mainly focused on improving predictions of soil properties, particularly soil organic carbon, and the accuracy of digital soil mapping (DSM). Despite these advances, the application of ML in soil science faces challenges related to data scarcity and the interpretability of ML models. There is a need to shift towards Soil Science-Informed ML (SoilML) models, which harness the power of ML while also incorporating soil science knowledge into the training process to make predictions more reliable and generalisable. This paper proposes methods for embedding soil science knowledge in ML models to overcome these limitations. Incorporating soil science knowledge into ML models involves using observational priors to enhance training datasets, designing model structures that reflect soil science principles, and supervising model training with soil science-informed loss functions. The informed loss functions include observational constraints, coherency rules such as regularisation to avoid overfitting, and prior or soil-knowledge constraints that incorporate existing information about the parameters or outputs. By way of illustration, we present examples from four fields: digital soil mapping, soil spectroscopy, pedotransfer functions, and dynamic soil property models. We discuss the potential to integrate process-based models for improved prediction, the use of physics-informed neural networks, limitations, and the issue of overparametrisation. These approaches improve the relevance of ML predictions in soil science and enhance the models' ability to generalise across different scenarios while maintaining soil science principles, transparency and reliability.
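The abstract describes informed loss functions built from three kinds of terms: an observational constraint, a coherency rule such as regularisation, and a soil-knowledge constraint. The sketch below is a minimal illustration of that composition in plain Python, not the authors' implementation. It assumes a pedotransfer-style model that predicts volumetric water content at increasing matric suctions, and it uses two well-known physical rules as the hypothetical soil-knowledge constraint: water content should not increase as suction increases, and it should stay between 0 and total porosity. The function names and the weights `lam_reg` and `lam_soil` are illustrative assumptions.

```python
"""Minimal sketch of a soil-science-informed loss.

Hypothetical names and weights for illustration only; not the paper's code.
"""

from typing import Sequence


def data_loss(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Observational constraint: mean squared error against measured values."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and the same length")
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def l2_penalty(weights: Sequence[float]) -> float:
    """Coherency rule: L2 (ridge) regularisation to discourage overfitting."""
    return sum(w * w for w in weights)


def retention_penalty(theta: Sequence[float], porosity: float) -> float:
    """Assumed soil-knowledge constraint for a water retention curve.

    `theta` holds predicted volumetric water contents ordered by increasing
    matric suction. Each violation is penalised quadratically.
    """
    penalty = 0.0
    for wetter, drier in zip(theta, theta[1:]):
        # Monotonicity: water content at higher suction must not exceed
        # water content at lower suction.
        penalty += max(0.0, drier - wetter) ** 2
    for t in theta:
        # Physical bounds: no negative water content, none above porosity.
        penalty += max(0.0, -t) ** 2 + max(0.0, t - porosity) ** 2
    return penalty


def informed_loss(
    pred: Sequence[float],
    obs: Sequence[float],
    weights: Sequence[float],
    porosity: float,
    lam_reg: float = 1e-3,  # hypothetical regularisation weight
    lam_soil: float = 10.0,  # hypothetical soil-knowledge weight
) -> float:
    """Total loss = data misfit + lam_reg * L2 + lam_soil * soil penalty."""
    return (
        data_loss(pred, obs)
        + lam_reg * l2_penalty(weights)
        + lam_soil * retention_penalty(pred, porosity)
    )


if __name__ == "__main__":
    obs = [0.45, 0.38, 0.30, 0.18]
    plausible = [0.44, 0.37, 0.31, 0.19]  # decreasing, within bounds
    implausible = [0.44, 0.39, 0.41, 0.19]  # rises with suction
    w = [0.2, -0.1, 0.05]
    print(informed_loss(plausible, obs, w, porosity=0.5))
    print(informed_loss(implausible, obs, w, porosity=0.5))
```

In a trained model these terms would typically be computed inside an automatic-differentiation framework and minimised jointly over the model parameters; the relative weights control how strongly soil knowledge overrides a pure fit to sparse observations.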
About the journal:
Geoderma - the global journal of soil science - welcomes authors, readers and soil research from all parts of the world, encourages worldwide soil studies, and embraces all aspects of soil science and its associated pedagogy. The journal particularly welcomes interdisciplinary work focusing on dynamic soil processes and functions across space and time.