Determining minimum sample size for the conditioned Latin hypercube sampling algorithm
Daniel D. SAURETTE, Asim BISWAS, Richard J. HECK, Adam W. GILLESPIE, Aaron A. BERG
Pedosphere, Volume 34, Issue 3, Pages 530-539. Published 2024-06-01. DOI: 10.1016/j.pedsph.2022.09.001
Journal impact factor: 5.2 (JCR Q1, Soil Science)
https://www.sciencedirect.com/science/article/pii/S1002016022000868
Citations: 0
Abstract
In digital soil mapping (DSM), a fundamental assumption is that the spatial variability of the target variable can be explained by the predictors, or environmental covariates. Strategies to adequately sample the predictors have been well documented, with the conditioned Latin hypercube sampling (cLHS) algorithm receiving the most attention in the DSM community. Despite advances in sampling design, a critical gap remains in determining the number of samples required for DSM projects. We propose a simple workflow and a function coded in R to determine the minimum sample size for the cLHS algorithm from histograms of the predictor variables, with the bin width set by the Freedman-Diaconis rule. Data preprocessing was included to correct for multimodal and non-normally distributed data, as these can affect sample size determination from the histogram. Based on a user-selected quantile range (QR) for the sample plan, the densities of the histogram bins at the upper and lower bounds of the QR were used as a scaling factor to determine the minimum sample size. This technique was applied to a field-scale set of environmental covariates for a well-sampled agricultural study site near Guelph, Ontario, Canada, and tested across a range of QRs. The minimum sample size increased as the selected QR increased: from 44 to 83 when the QR increased from 50% to 95%, and then exponentially to 194 for the 99% QR. This technique provides an estimate of minimum sample size that can be used as an input to the cLHS algorithm.
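The histogram step described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' function. The Freedman-Diaconis rule sets the bin width to h = 2 * IQR / n^(1/3). Everything beyond that rule is an assumption made for illustration: the helper names are hypothetical, the QR is read as a central range (a 95% QR spans the 2.5% and 97.5% quantiles), and "density" is taken as the relative frequency of a bin. The abstract does not state how these densities are scaled into a minimum sample size, so the sketch stops at reporting them.

```python
import math
import random
import statistics


def freedman_diaconis_width(values):
    """Bin width h = 2 * IQR / n^(1/3) (Freedman-Diaconis rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    width = 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)
    if width <= 0:
        raise ValueError("IQR is zero; the rule gives no usable bin width")
    return width


def histogram_counts(values, width):
    """Count values in equal-width bins that start at min(values)."""
    lo, hi = min(values), max(values)
    n_bins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * n_bins
    for v in values:
        # Clamp so the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, counts


def quantile(values, p):
    """Linear-interpolation quantile for 0 <= p <= 1."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    i = int(math.floor(pos))
    j = min(i + 1, len(s) - 1)
    return s[i] + (pos - i) * (s[j] - s[i])


def densities_at_qr_bounds(values, qr=0.95):
    """Relative frequency of the bins holding the lower and upper QR bounds.

    Assumption: the QR is central, so qr=0.95 uses the 2.5% and 97.5% quantiles.
    """
    width = freedman_diaconis_width(values)
    lo, counts = histogram_counts(values, width)
    tail = (1.0 - qr) / 2.0
    bounds = (quantile(values, tail), quantile(values, 1.0 - tail))
    densities = []
    for b in bounds:
        idx = min(int((b - lo) / width), len(counts) - 1)
        densities.append(counts[idx] / len(values))
    return width, len(counts), tuple(densities)


if __name__ == "__main__":
    # Synthetic covariate values, for demonstration only.
    random.seed(0)
    covariate = [random.gauss(0.0, 1.0) for _ in range(1000)]
    for qr in (0.50, 0.95, 0.99):
        width, n_bins, (d_low, d_high) = densities_at_qr_bounds(covariate, qr)
        print(f"QR={qr:.2f} bins={n_bins} lower={d_low:.4f} upper={d_high:.4f}")
```

For a roughly bell-shaped covariate, the bins at the QR bounds hold a smaller share of the data as the QR widens. This is consistent with the abstract's finding that the minimum sample size grows with the QR, although the exact scaling used by the authors is not reproduced here.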
About the journal:
PEDOSPHERE, a peer-reviewed international journal published bimonthly in English, welcomes submissions from scientists around the world on a broad range of topics. It seeks timely, high-quality original research findings, especially up-to-date achievements and advances across the entire field of soil science, including studies dealing with environmental science, ecology, agriculture, bioscience, geoscience, forestry, etc. It publishes mainly original research articles as well as some reviews, mini reviews, short communications and special issues.