Mapping soil thickness by accounting for right-censored data with survival probabilities and machine learning
Stephan van der Westhuizen, Gerard B. M. Heuvelink, David P. Hofmeyr, Laura Poggio, Madlene Nussbaum, Colby Brungard
European Journal of Soil Science, 75(5), published 2024-10-07
DOI: 10.1111/ejss.13589
Abstract
In digital soil mapping, modelling soil thickness is challenging because the data are often right-censored: for many profiles the true soil thickness exceeds the sampling depth, so only a lower bound is recorded. Neglecting the censored nature of the data can lead to poor model performance and underestimation of the true soil thickness. Survival analysis is a well-established domain of statistical modelling that can deal with censored data. The random survival forest is a notable example of a survival-related machine learning approach that has been used to handle right-censored soil property data in digital soil mapping. Previous studies that employed this model either mapped the probability of soil thickness exceeding certain depths, rather than soil thickness itself, or dismissed it because of perceived poor performance. In this study, we propose an alternative survival model for mapping soil thickness that is based on inverse probability of censoring weighting. In this approach, calibration data are weighted by the inverse of the probability that soil thickness exceeds a certain depth, that is, a survival probability. These weights can then be used with most machine learning models. We used the weights with a standard random forest and compared this weighted random forest with a random survival forest and other strategies for handling right-censored data, in a comprehensive synthetic simulation study and two real-world case studies. The results suggest that the weighted random forest produces competitive predictions, making it a viable option for mapping right-censored soil property data.
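The abstract does not spell out the weighting formula. As a minimal sketch, assuming the textbook inverse probability of censoring weighting (IPCW) construction from survival analysis, each profile where bedrock was reached gets weight 1/Ĝ(t−), where Ĝ is the Kaplan-Meier estimate of the probability that the sampling (censoring) depth exceeds t, and censored profiles get weight 0. The function name `ipcw_weights`, the tie convention and the example depths are hypothetical illustrations; the paper's exact construction may differ.

```python
from typing import List, Sequence, Tuple


def ipcw_weights(depths: Sequence[float], observed: Sequence[bool]) -> List[float]:
    """Hypothetical sketch of textbook IPCW weights.

    depths[i]   -- recorded depth: true thickness if observed[i], else sampling depth
    observed[i] -- True if the base of the soil (e.g. bedrock) was reached,
                   False if the profile is right-censored at depths[i]
    """
    # Kaplan-Meier for the censoring distribution G(t) = P(C > t):
    # roles are swapped, so a censored profile counts as the "event".
    cens_times = sorted({d for d, o in zip(depths, observed) if not o})
    factors: List[Tuple[float, float]] = []
    for t in cens_times:
        n_at_risk = sum(1 for d in depths if d >= t)
        n_cens = sum(1 for d, o in zip(depths, observed) if d == t and not o)
        factors.append((t, 1.0 - n_cens / n_at_risk))

    def g_left(t: float) -> float:
        # Left limit G(t-): only censoring strictly before t counts, so an
        # observed thickness tied with a censoring depth is treated as
        # occurring first (a common convention, assumed here).
        g = 1.0
        for tj, f in factors:
            if tj < t:
                g *= f
        return g

    weights: List[float] = []
    for d, o in zip(depths, observed):
        if not o:
            weights.append(0.0)  # censored profiles receive zero weight
        else:
            g = g_left(d)
            weights.append(1.0 / g if g > 0 else 0.0)
    return weights


if __name__ == "__main__":
    # Hypothetical depths in cm; False marks profiles censored at the auger limit.
    depths = [40, 60, 60, 90, 100, 120]
    observed = [True, True, False, True, False, True]
    print(ipcw_weights(depths, observed))  # [1.0, 1.0, 0.0, 1.25, 0.0, 2.5]
```

Deeper observed profiles are up-weighted to stand in for the censored profiles that might have reached those depths. The resulting weights could then be passed to any learner that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `RandomForestRegressor.fit`, trained on the observed thicknesses only.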
About the journal:
The EJSS is an international journal that publishes outstanding papers in soil science that advance the theoretical and mechanistic understanding of physical, chemical and biological processes and their interactions in soils acting from molecular to continental scales in natural and managed environments.