Journal of Agricultural Biological and Environmental Statistics: Latest Articles (Page 2)

A Modified Bayesian Optimization Approach for Determining a Training Set to Identify the Best Genotypes from a Candidate Population in Genomic Selection

IF 1.4, Zone 4 (Mathematics), Q3 Biology

Journal of Agricultural Biological and Environmental Statistics

Pub Date : 2024-06-19 DOI: 10.1007/s13253-024-00632-y

Hui-Ning Tu, Chen-Tuo Liao

Training set optimization is a crucial factor affecting the probability of success for plant breeding programs using genomic selection. Conventionally, training set optimization is designed to maximize Pearson’s correlation between true breeding values and genomic estimated breeding values for a testing population, because this correlation is an essential component of genetic gain in plant breeding. However, many practical breeding programs aim to identify the best genotypes for target traits in a breeding population. A modified Bayesian optimization approach is therefore developed in this study to construct training sets for this problem. The proposed approach, which is based on Monte Carlo simulation and data cross-validation, is shown to be competitive with existing methods developed to maximize Pearson’s correlation. Four real genome datasets, including two rice, one wheat, and one soybean, are analyzed in this study. An R package is provided to facilitate the application of the proposed approach. Supplementary materials accompanying this paper appear online.



Citations: 0

Non-stationary Extensions of the Diffusion-Based Gaussian Matérn Field for Ecological Applications


Pub Date : 2024-05-31 DOI: 10.1007/s13253-024-00628-8

Juan Francisco Mandujano Reyes, Ian P. McGahan, Ting Fung Ma, Anne E. Ballmann, Daniel P. Walsh, Jun Zhu

The use of statistical methods informed by partial differential equations (PDEs) and in particular reaction–diffusion PDEs such as ecological diffusion equations (EDEs) has been studied and used to model spatiotemporal processes. In this paper, we consider a stochastic extension of the EDE (SEDE) and discuss its interpretation and main differences from the deterministic EDE. We then leverage a non-stationary extension of the diffusion-based Gaussian Matérn field and show that this extension has SEDE-like behavior. The elucidated connection enables us to find a finite element approximated solution for SEDEs by means of the stochastic partial differential equation (SPDE) Bayesian method. For illustration, we analyze the evolution of white-nose syndrome (WNS) in the continental USA, comparing two models: a stationary SEDE and a non-stationary pseudo-SEDE. Our results demonstrate the importance of non-stationarity in wildlife disease modeling and identify spatial explanatory variables for the non-stationarity in the WNS process. Finally, a simulation study is conducted to assess the deviance information criterion for differentiating between the two models, as well as the identifiability of the model parameters. Supplementary materials accompanying this paper appear online.



Citations: 0

An Inhomogeneous Weibull–Hawkes Process to Model Underdispersed Acoustic Cues


Pub Date : 2024-05-11 DOI: 10.1007/s13253-024-00626-w

Alec B. M. Van Helsdingen, Tiago A. Marques, Charlotte M. Jones-Todd

A Hawkes point process describes self-exciting behaviour where event arrivals are triggered by historic events. These models are becoming an increasingly popular choice for analysing event-type data. Like all other inhomogeneous Poisson point processes, the waiting time between events in a Hawkes process is derived from an exponential distribution with mean one. However, for many ecological and environmental data, this is an unrealistic assumption. We, therefore, extend and generalise the Hawkes process to account for potential under- or overdispersion in the waiting times between events by assuming the Weibull distribution as the foundation of the waiting times. We apply this model to the acoustic cue production times of sperm whales and show that our Weibull–Hawkes model better captures the inherent underdispersion in the interarrival times of echolocation clicks emitted by these whales.
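The dispersion argument in this abstract can be illustrated with a small sketch. It does not implement the authors' Weibull–Hawkes model; it only shows, under the assumption of independent Weibull waiting times, how the shape parameter moves the coefficient of variation (sd / mean) below or above the value 1 that an exponential distribution gives. The function names are hypothetical and chosen for this illustration.

```python
import math
import random
import statistics


def weibull_cv(shape: float) -> float:
    """Theoretical coefficient of variation (sd / mean) of a Weibull law.

    The scale parameter cancels out, so only the shape matters.
    """
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 - g1 * g1) / g1


def simulate_waiting_times(shape: float, scale: float, n: int, seed: int = 0) -> list:
    """Draw n independent Weibull waiting times (illustrative assumption)."""
    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape) in that order.
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def empirical_cv(waits) -> float:
    """Sample coefficient of variation of a list of waiting times."""
    return statistics.pstdev(waits) / statistics.fmean(waits)


if __name__ == "__main__":
    # shape < 1: overdispersed, shape = 1: exponential, shape > 1: underdispersed
    for shape in (0.5, 1.0, 2.0):
        waits = simulate_waiting_times(shape, scale=1.0, n=20_000, seed=1)
        print(shape, round(weibull_cv(shape), 3), round(empirical_cv(waits), 3))
```

A shape of 1 recovers the exponential case, so a fitted shape above 1 is one simple way to read "underdispersed" waiting times such as regularly spaced echolocation clicks.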


Citations: 0

A class of models for large zero-inflated spatial data


Pub Date : 2024-04-29 DOI: 10.1007/s13253-024-00619-9

Ben Seiyon Lee, Murali Haran

Spatially correlated data with an excess of zeros, usually referred to as zero-inflated spatial data, arise in many disciplines. Examples include count data, for instance, abundance (or lack thereof) of animal species and disease counts, as well as semi-continuous data like observed precipitation. Spatial two-part models are a flexible class of models for such data. Fitting two-part models can be computationally expensive for large data due to high-dimensional dependent latent variables, costly matrix operations, and slow mixing Markov chains. We describe a flexible, computationally efficient approach for modeling large zero-inflated spatial data using the projection-based intrinsic conditional autoregression (PICAR) framework. We study our approach, which we call PICAR-Z, through extensive simulation studies and two environmental data sets. Our results suggest that PICAR-Z provides accurate predictions while remaining computationally efficient. An important goal of our work is to allow researchers who are not experts in computation to easily build computationally efficient extensions to zero-inflated spatial models; this also allows for a more thorough exploration of modeling choices in two-part models than was previously possible. We show that PICAR-Z is easy to implement and extend in popular probabilistic programming languages such as nimble and stan.
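The two-part structure mentioned in this abstract can be sketched in a few lines. This is a deliberately non-spatial toy, not PICAR-Z: it assumes an occurrence part (Bernoulli) and a positive-amount part (log-normal, a choice made here only for illustration), and fits each part separately. All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_two_part(n, p_occurrence, log_mean, log_sd, seed=0):
    """Simulate semi-continuous data: exact zeros mixed with positive amounts."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < p_occurrence:
            # Part 2: positive amount, log-normal by assumption
            values.append(rng.lognormvariate(log_mean, log_sd))
        else:
            # Part 1: structural zero (e.g. no precipitation observed)
            values.append(0.0)
    return values


def fit_two_part(values):
    """Estimate the occurrence probability and the log-scale mean and sd."""
    positives = [v for v in values if v > 0]
    p_hat = len(positives) / len(values)
    logs = [math.log(v) for v in positives]
    return p_hat, statistics.fmean(logs), statistics.stdev(logs)


if __name__ == "__main__":
    data = simulate_two_part(5000, p_occurrence=0.3, log_mean=0.5, log_sd=0.8, seed=42)
    print(fit_two_part(data))
```

In a spatial two-part model both parts would additionally carry spatially dependent latent effects, which is where the computational cost described in the abstract arises.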



Citations: 0

Modeling First Arrival of Migratory Birds Using a Hierarchical Max-Infinitely Divisible Process


Pub Date : 2024-04-28 DOI: 10.1007/s13253-024-00624-y

Dhanushi A. Wijeyakulasuriya, Ephraim M. Hanks, Benjamin A. Shaby

Humans have recorded the arrival dates of migratory birds for millennia, searching for trends and patterns. As the first arrival among individuals in a species is the realized tail of the probability distribution of arrivals, the appropriate statistical framework with which to analyze such events is extreme value theory. Here, for the first time, we apply formal extreme value techniques to the dynamics of bird migrations. We study the annual first arrivals of Magnolia Warblers using modern tools from the statistical field of extreme value analysis. Using observations from the eBird database, we model the spatial distribution of observed Magnolia Warbler arrivals as a max-infinitely divisible process, which allows us to spatially interpolate observed annual arrivals in a probabilistically coherent way and to project arrival dynamics into the future by conditioning on climatic variables. Supplementary materials accompanying this paper appear online.


Citations: 0

Regularized Latent Trajectory Models for Spatio-temporal Population Dynamics


Pub Date : 2024-04-01 DOI: 10.1007/s13253-024-00616-y

Xinyi Lu, Yoichiro Kanno, George P. Valentine, Matt A. Kulp, Mevin B. Hooten

Climate change impacts ecosystems variably in space and time. Landscape features may confer resistance against environmental stressors, whose intensity and frequency also depend on local weather patterns. Characterizing spatio-temporal variation in population responses to these stressors improves our understanding of what constitutes climate change refugia. We developed a Bayesian hierarchical framework that allowed us to differentiate population responses to seasonal weather patterns depending on their “sensitive” or “resilient” states. The framework inferred these sensitivity states based on latent trajectories delineating dynamic state probabilities. The latent trajectories are composed of linear initial conditions, functional regression models, and additive random effects representing ecological mechanisms such as topological buffering and effects of legacy weather conditions. Further, we developed a Bayesian regularization strategy that promoted temporal coherence in the inferred states. We demonstrated our hierarchical framework and regularization strategy using simulated examples and a case study of native brook trout (Salvelinus fontinalis) count data from the Great Smoky Mountains National Park, southeastern USA. Our study provided insights into ecological processes influencing brook trout sensitivity. Our framework can also be applied to other species and ecosystems to facilitate management and conservation.



Citations: 0

Models to Support Forest Inventory and Small Area Estimation Using Sparsely Sampled LiDAR: A Case Study Involving G-LiHT LiDAR in Tanana, Alaska


Pub Date : 2024-03-13 DOI: 10.1007/s13253-024-00611-3

Andrew O. Finley, Hans-Erik Andersen, Chad Babcock, Bruce D. Cook, Douglas C. Morton, Sudipto Banerjee

A two-stage hierarchical Bayesian model is developed and implemented to estimate forest biomass density and total given sparsely sampled LiDAR and georeferenced forest inventory plot measurements. The model is motivated by the United States Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA) objective to provide biomass estimates for the remote Tanana Inventory Unit (TIU) in interior Alaska. The proposed model yields stratum-level biomass estimates for arbitrarily sized areas. Model-based estimates are compared with the TIU FIA design-based post-stratified estimates. Model-based small area estimates (SAEs) for two experimental forests within the TIU are compared with each forest’s design-based estimates generated using a dense network of independent inventory plots. Model parameter estimates and biomass predictions are informed using FIA plot measurements, LiDAR data that are spatially aligned with a subset of the FIA plots, and complete coverage remotely detected data used to define landuse/landcover stratum and percent forest canopy cover. Results support a model-based approach to estimating forest parameters when inventory data are sparse or resources limit collection of enough data to achieve desired accuracy and precision using design-based methods. Supplementary materials accompanying this paper appear online.



Citations: 0

Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study


Pub Date : 2024-02-08 DOI: 10.1007/s13253-024-00602-4

Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun

Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, so we developed additional R functions. In addition, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team DesiBoys secured the first position in two out of four sub-competitions and the second position in the other two, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real spatial satellite-derived dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.



Citations: 0

Two Tests of Significance for Preferred Direction in Tree Radial Growth Under a Linear-Circular Regression Model with Correlated Random Errors


Pub Date : 2024-01-31 DOI: 10.1007/s13253-023-00599-2

Pierre Dutilleul, Tomoaki Imoto, Kunio Shimizu

To analyze tree growth statistically through annual ring widths measured in 2-D horizontal trunk sections, we propose two tests of significance defined under a linear-circular regression model with fixed trigonometric effects and normal random errors with a variance-covariance structure from the symmetric circulant family. The associated von Mises distribution has a preferred direction parameter. Accordingly, the first test aims to assess the presence of a preferred direction in the radial growth of a tree from the center of its trunk in a given year. Assuming there is a preferred direction of radial growth for the tree in two years, the second test extends the first one by assessing the equality of tree radial growth in the two preferred directions. Both tests of significance are modified F-tests with the denominator df adjusted for the presence of autocorrelation. Their validity is analyzed for two autoregressive symmetric circulant correlation structures, as a function of the number (n) of angular data and the autocorrelation parameter value. Effects of the inter-year correlation coefficient value are also studied in the two-year case. The performance of REstricted Maximum Likelihood as estimation method is scrutinized in an extensive Monte Carlo study, and the power of the tests is analyzed when valid. The new testing procedures are applied with (n = 32, 64) ring widths per year for a white spruce tree during 18 years of growth until its harvest. R codes are available. Conclusions and perspectives for future research are given. Supplementary materials accompanying this paper appear on-line.
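As a rough illustration of what "fixed trigonometric effects" and a "preferred direction" can mean, the sketch below fits a first-harmonic model, width = mean + a·cos θ + b·sin θ, to ring widths measured at n equally spaced angles. It is a simplification under stated assumptions: errors are treated as independent (the paper's symmetric circulant correlation is ignored), the modified F-tests are not implemented, and all function names are hypothetical. With equally spaced angles the cosine and sine columns are orthogonal, so the least-squares estimates have a closed form.

```python
import math
import random


def ring_angles(n):
    """n equally spaced measurement angles on [0, 2*pi)."""
    return [2.0 * math.pi * i / n for i in range(n)]


def fit_first_harmonic(widths):
    """Least-squares fit of mean + a*cos(theta) + b*sin(theta).

    Returns (mean, amplitude, direction), where direction = atan2(b, a)
    is the angle at which the fitted radial growth is largest.
    """
    n = len(widths)
    if n < 3:
        raise ValueError("need at least 3 equally spaced angles")
    thetas = ring_angles(n)
    mean = sum(widths) / n
    a = 2.0 / n * sum(w * math.cos(t) for w, t in zip(widths, thetas))
    b = 2.0 / n * sum(w * math.sin(t) for w, t in zip(widths, thetas))
    amplitude = math.hypot(a, b)
    direction = math.atan2(b, a) % (2.0 * math.pi)
    return mean, amplitude, direction


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical tree: mean ring width 2.0, growth favoured towards 1.0 rad
    widths = [2.0 + 0.5 * math.cos(t - 1.0) + rng.gauss(0.0, 0.1)
              for t in ring_angles(32)]
    print(fit_first_harmonic(widths))
```

An amplitude close to zero corresponds to no preferred direction; testing that formally, with autocorrelated errors, is the subject of the paper's first test.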



Citations: 0

A Spatial Mixture Model for Spaceborne Lidar Observations Over Mixed Forest and Non-forest Land Types

IF 1.4 Zone 4 (Mathematics) Q3 BIOLOGY

Journal of Agricultural Biological and Environmental Statistics

Pub Date : 2024-01-30 DOI: 10.1007/s13253-024-00600-6

Paul B. May, Andrew O. Finley, Ralph O. Dubayah

The Global Ecosystem Dynamics Investigation (GEDI) is a spaceborne lidar instrument that collects near-global measurements of forest structure. While expansive in scope, GEDI samples are spatially sparse and cover a small fraction of the land surface. Converting the sparse samples into spatially complete predictive maps is of practical importance for a number of ecological studies. A complicating factor is that GEDI collects measurements over forested and non-forested land alike, with no automatic labeling of the land type. Such classification is important, as it categorically influences the probability distribution of the spatial process and the ecological interpretation of the observations/predictions. We propose and implement a spatial mixture model, separating the observations and the greater spatial domain into two latent classes. The latent classes are governed by a Bernoulli spatial process, with spatial effects driven by a Gaussian process. Within each class, the process is governed by a separate spatial model, describing the unique probabilistic attributes. Model predictions take the form of scalar predictions of the GEDI observables as well as discrete labeling of the class membership. Inference is conducted through a Bayesian paradigm, yielding rich quantification of prediction and uncertainty through posterior predictive distributions. We demonstrate the method using GEDI data over Wollemi National Park, Australia, using optical data from Landsat 8 as model covariates. When compared to a single spatial model, the mixture model achieves much higher posterior predictive densities on the true value. When compared to a random forest model, a common algorithmic approach in the remote sensing community, the random forest achieves better absolute prediction accuracy for prediction locations far from observed training data locations, but at the expense of location-specific assessments of uncertainty. The unsupervised binary classifications of the mixture model appear broadly ecologically interpretable as forest and non-forest when compared to optical imagery, but further comparison to ground-truth data is required.
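The two-class structure described in the abstract can be illustrated with a small simulation. The sketch below only draws from a generative model of that general shape: a latent Gaussian process drives a Bernoulli class label, and each class has its own spatial process. The logistic link, the exponential covariance function, the class means and variances, and every function name are assumptions made for illustration. None of them is taken from the paper, and no Bayesian inference is performed.

```python
import numpy as np


def exponential_cov(coords, variance, range_):
    """Exponential covariance matrix between all pairs of locations (assumed kernel)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return variance * np.exp(-dist / range_)


def simulate_two_class_mixture(coords, rng):
    """Draw class labels and observations from an illustrative spatial mixture."""
    n = coords.shape[0]
    zeros = np.zeros(n)

    # Latent Gaussian process that drives the class probabilities.
    w = rng.multivariate_normal(zeros, exponential_cov(coords, 1.0, 0.3))
    p_class1 = 1.0 / (1.0 + np.exp(-w))   # logistic link (assumption)
    z = rng.binomial(1, p_class1)         # Bernoulli spatial process

    # Separate spatial model per class (hypothetical means and variances).
    y_class1 = 20.0 + rng.multivariate_normal(zeros, exponential_cov(coords, 16.0, 0.2))
    y_class0 = 1.0 + rng.multivariate_normal(zeros, exponential_cov(coords, 0.25, 0.5))

    # Each location reports the value from the class it belongs to.
    y = np.where(z == 1, y_class1, y_class0)
    return {"p_class1": p_class1, "z": z, "y": y}


rng = np.random.default_rng(42)
coords = rng.uniform(0.0, 1.0, size=(150, 2))   # hypothetical sample locations
sim = simulate_two_class_mixture(coords, rng)
print("share of class 1:", sim["z"].mean())
```

Because the labels `z` are latent in practice, fitting such a model means inferring both the class membership and the class-specific processes jointly, which is why the abstract reports both scalar predictions and discrete class labels.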



Citations: 0