Determining minimum sample size for the conditioned Latin hypercube sampling algorithm

Pedosphere (IF 5.2; Zone 2, Agricultural and Forestry Sciences; JCR Q1, Soil Science) Pub Date: 2024-06-01 DOI: 10.1016/j.pedsph.2022.09.001
Daniel D. SAURETTE, Asim BISWAS, Richard J. HECK, Adam W. GILLESPIE, Aaron A. BERG
Pedosphere, Volume 34, Issue 3, Pages 530-539. https://www.sciencedirect.com/science/article/pii/S1002016022000868

Abstract

In digital soil mapping (DSM), a fundamental assumption is that the spatial variability of the target variable can be explained by the predictors or environmental covariates. Strategies to adequately sample the predictors have been well documented, with the conditioned Latin hypercube sampling (cLHS) algorithm receiving the most attention in the DSM community. Despite advances in sampling design, a critical gap remains in determining the number of samples required for DSM projects. We propose a simple workflow and a function coded in the R language to determine the minimum sample size for the cLHS algorithm based on histograms of the predictor variables, using the Freedman-Diaconis rule to determine the optimal bin width. Data preprocessing was included to correct for multimodal and non-normally distributed data, as these can affect sample size determination from the histogram. Based on a user-selected quantile range (QR) for the sample plan, the densities of the histogram bins at the upper and lower bounds of the QR were used as a scaling factor to determine minimum sample size. This technique was applied to a field-scale set of environmental covariates for a well-sampled agricultural study site near Guelph, Ontario, Canada, and tested across a range of QRs. The results showed that minimum sample size increased with the QR selected. Minimum sample size increased from 44 to 83 when the QR increased from 50% to 95% and then increased exponentially to 194 for the 99% QR. This technique provides an estimate of minimum sample size that can be used as an input to the cLHS algorithm.
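
The histogram steps named in the abstract can be sketched roughly as follows. This is an illustrative Python sketch, not the authors' R function. The Freedman-Diaconis bin width, 2 * IQR / n^(1/3), is the standard rule. Everything else is an assumption made for illustration: the QR is taken to be centred on the median (a 95% QR spans the 2.5% and 97.5% quantiles), all function names are hypothetical, and because the abstract does not state the scaling formula, `illustrative_min_n` uses a stand-in (enough samples that the sparser of the two bound bins is expected to receive at least one). The preprocessing for multimodal and non-normal data is omitted.

```python
import math


def quantile(sorted_x, p):
    """Linear-interpolation quantile of pre-sorted data (R's default type 7)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    pos = p * (len(sorted_x) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (pos - lo) * (sorted_x[hi] - sorted_x[lo])


def fd_bin_width(x):
    """Freedman-Diaconis bin width: 2 * IQR / n^(1/3)."""
    s = sorted(x)
    iqr = quantile(s, 0.75) - quantile(s, 0.25)
    return 2.0 * iqr / len(s) ** (1.0 / 3.0)


def histogram(x, width):
    """Count values in equal-width bins starting at min(x)."""
    lo, hi = min(x), max(x)
    n_bins = max(1, math.ceil((hi - lo) / width))
    counts = [0] * n_bins
    for v in x:
        # The maximum value is clamped into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return lo, counts


def bin_proportions_at_qr(x, qr=0.95):
    """Proportion of data in the bins holding the lower and upper QR bounds.

    Assumption: the QR is symmetric about the median.
    """
    if not 0.0 < qr < 1.0:
        raise ValueError("qr must be in (0, 1)")
    s = sorted(x)
    width = fd_bin_width(s)
    if width <= 0:
        raise ValueError("IQR is zero; the bin width is undefined")
    lo, counts = histogram(s, width)
    tail = (1.0 - qr) / 2.0
    bounds = (quantile(s, tail), quantile(s, 1.0 - tail))
    props = tuple(
        counts[min(int((b - lo) / width), len(counts) - 1)] / len(s)
        for b in bounds
    )
    return bounds, props


def illustrative_min_n(x, qr=0.95):
    """Stand-in scaling (an assumption, NOT the published formula):
    smallest n for which the sparser bound bin expects >= 1 sample."""
    _, props = bin_proportions_at_qr(x, qr)
    p_min = min(props)
    if p_min == 0:
        raise ValueError("a QR bound falls in an empty bin")
    return math.ceil(1.0 / p_min)
```

Under these assumptions, a wider QR pushes the bounds into sparser tail bins, so the stand-in sample size tends to grow, which is at least qualitatively in line with the trend the abstract reports. With several covariates, one plausible (again hypothetical) choice would be to run this per covariate and take the largest value.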
