Pub Date : 2024-05-11 DOI: 10.1007/s13253-024-00626-w
Alec B. M. Van Helsdingen, Tiago A. Marques, Charlotte M. Jones-Todd
A Hawkes point process describes self-exciting behaviour where event arrivals are triggered by historic events. These models are an increasingly popular choice for analysing event-type data. Like all other inhomogeneous Poisson point processes, the waiting time between events in a Hawkes process is derived from an exponential distribution with mean one. However, for many ecological and environmental data sets, this assumption is unrealistic. We therefore extend and generalise the Hawkes process to account for potential under- or overdispersion in the waiting times between events by assuming the Weibull distribution as the foundation of the waiting times. We apply this model to the acoustic cue production times of sperm whales and show that our Weibull–Hawkes model better captures the inherent underdispersion in the interarrival times of echolocation clicks emitted by these whales.
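As a rough illustration of the self-exciting mechanism described above, the sketch below evaluates a Hawkes conditional intensity and the compensator-transformed waiting times. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions made for illustration; the abstract does not specify the kernel, and this is not the authors' Weibull–Hawkes implementation.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Baseline rate mu plus an exponentially decaying boost from each past event.

    Assumed kernel: alpha * exp(-beta * (t - s)) for every event time s < t.
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)

def hawkes_compensator(t, events, mu, alpha, beta):
    """Integral of the intensity from 0 to t (closed form for the assumed kernel)."""
    return mu * t + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (t - s))) for s in events if s < t
    )

def transformed_waits(events, mu, alpha, beta):
    """Differences of the compensator evaluated at consecutive event times."""
    comp = [hawkes_compensator(t, events, mu, alpha, beta) for t in events]
    return [b - a for a, b in zip([0.0] + comp[:-1], comp)]
```

If the fitted Hawkes model were adequate, the transformed waits would behave like unit-mean exponential draws, so a sample variance well below one would hint at the kind of underdispersion the Weibull extension is meant to absorb.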
Title: An Inhomogeneous Weibull–Hawkes Process to Model Underdispersed Acoustic Cues (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-04-29 DOI: 10.1007/s13253-024-00619-9
Ben Seiyon Lee, Murali Haran
Spatially correlated data with an excess of zeros, usually referred to as zero-inflated spatial data, arise in many disciplines. Examples include count data, for instance, abundance (or lack thereof) of animal species and disease counts, as well as semi-continuous data like observed precipitation. Spatial two-part models are a flexible class of models for such data. Fitting two-part models can be computationally expensive for large data due to high-dimensional dependent latent variables, costly matrix operations, and slow mixing Markov chains. We describe a flexible, computationally efficient approach for modeling large zero-inflated spatial data using the projection-based intrinsic conditional autoregression (PICAR) framework. We study our approach, which we call PICAR-Z, through extensive simulation studies and two environmental data sets. Our results suggest that PICAR-Z provides accurate predictions while remaining computationally efficient. An important goal of our work is to allow researchers who are not experts in computation to easily build computationally efficient extensions to zero-inflated spatial models; this also allows for a more thorough exploration of modeling choices in two-part models than was previously possible. We show that PICAR-Z is easy to implement and extend in popular probabilistic programming languages such as nimble and stan.
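To make the idea of excess zeros concrete, here is a minimal, non-spatial sketch of a zero-inflated Poisson log-probability, one common form of the two-part structure mentioned above. The function name and the parameters `pi_zero` and `lam` are hypothetical; the spatial random effects and the PICAR basis projection used by PICAR-Z are deliberately left out.

```python
import math

def zip_logpmf(y, pi_zero, lam):
    """Log-probability of count y under a zero-inflated Poisson.

    pi_zero: probability of a structural zero (0 <= pi_zero < 1).
    lam:     Poisson mean of the count component (lam > 0).
    """
    if y == 0:
        # A zero can be structural or can come from the Poisson part.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(-lam))
    log_poisson = -lam + y * math.log(lam) - math.lgamma(y + 1)
    return math.log1p(-pi_zero) + log_poisson
```

In a spatial two-part model, `pi_zero` and `lam` would each vary by location through their own linear predictors; here they are held constant only to keep the sketch short.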
Title: A class of models for large zero-inflated spatial data (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-04-28 DOI: 10.1007/s13253-024-00624-y
Dhanushi A. Wijeyakulasuriya, Ephraim M. Hanks, Benjamin A. Shaby
Humans have recorded the arrival dates of migratory birds for millennia, searching for trends and patterns. As the first arrival among individuals in a species is the realized tail of the probability distribution of arrivals, the appropriate statistical framework with which to analyze such events is extreme value theory. Here, for the first time, we apply formal extreme value techniques to the dynamics of bird migrations. We study the annual first arrivals of Magnolia Warblers using modern tools from the statistical field of extreme value analysis. Using observations from the eBird database, we model the spatial distribution of observed Magnolia Warbler arrivals as a max-infinitely divisible process, which allows us to spatially interpolate observed annual arrivals in a probabilistically coherent way and to project arrival dynamics into the future by conditioning on climatic variables. Supplementary materials accompanying this paper appear online.
Title: Modeling First Arrival of Migratory Birds Using a Hierarchical Max-Infinitely Divisible Process (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-04-01 DOI: 10.1007/s13253-024-00616-y
Xinyi Lu, Yoichiro Kanno, George P. Valentine, Matt A. Kulp, Mevin B. Hooten
Climate change impacts ecosystems variably in space and time. Landscape features may confer resistance against environmental stressors, whose intensity and frequency also depend on local weather patterns. Characterizing spatio-temporal variation in population responses to these stressors improves our understanding of what constitutes climate change refugia. We developed a Bayesian hierarchical framework that allowed us to differentiate population responses to seasonal weather patterns depending on their “sensitive” or “resilient” states. The framework inferred these sensitivity states based on latent trajectories delineating dynamic state probabilities. The latent trajectories are composed of linear initial conditions, functional regression models, and additive random effects representing ecological mechanisms such as topological buffering and effects of legacy weather conditions. Further, we developed a Bayesian regularization strategy that promoted temporal coherence in the inferred states. We demonstrated our hierarchical framework and regularization strategy using simulated examples and a case study of native brook trout (Salvelinus fontinalis) count data from the Great Smoky Mountains National Park, southeastern USA. Our study provided insights into ecological processes influencing brook trout sensitivity. Our framework can also be applied to other species and ecosystems to facilitate management and conservation.
Title: Regularized Latent Trajectory Models for Spatio-temporal Population Dynamics (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-03-13 DOI: 10.1007/s13253-024-00611-3
Andrew O. Finley, Hans-Erik Andersen, Chad Babcock, Bruce D. Cook, Douglas C. Morton, Sudipto Banerjee
A two-stage hierarchical Bayesian model is developed and implemented to estimate forest biomass density and total given sparsely sampled LiDAR and georeferenced forest inventory plot measurements. The model is motivated by the United States Department of Agriculture (USDA) Forest Service Forest Inventory and Analysis (FIA) objective to provide biomass estimates for the remote Tanana Inventory Unit (TIU) in interior Alaska. The proposed model yields stratum-level biomass estimates for arbitrarily sized areas. Model-based estimates are compared with the TIU FIA design-based post-stratified estimates. Model-based small area estimates (SAEs) for two experimental forests within the TIU are compared with each forest’s design-based estimates generated using a dense network of independent inventory plots. Model parameter estimates and biomass predictions are informed using FIA plot measurements, LiDAR data that are spatially aligned with a subset of the FIA plots, and complete coverage remotely detected data used to define landuse/landcover stratum and percent forest canopy cover. Results support a model-based approach to estimating forest parameters when inventory data are sparse or resources limit collection of enough data to achieve desired accuracy and precision using design-based methods. Supplementary materials accompanying this paper appear on-line.
Title: Models to Support Forest Inventory and Small Area Estimation Using Sparsely Sampled LiDAR: A Case Study Involving G-LiHT LiDAR in Tanana, Alaska (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-02-08 DOI: 10.1007/s13253-024-00602-4
Arnab Hazra, Pratik Nag, Rishikesh Yadav, Ying Sun
Increasingly large and complex spatial datasets pose massive inferential challenges due to high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, so we developed additional R functions. In addition, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team DesiBoys secured the first position in two out of four sub-competitions and the second position in the other two, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real spatial satellite-derived dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.
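The Vecchia approximation replaces the joint Gaussian density with a product of conditional densities, each conditioning on a small set of previously ordered neighbours. The sketch below is a toy version under strong assumptions: a zero-mean process, one-dimensional sorted locations, an exponential covariance, and a single neighbour (the previous point). The function names and the `length_scale` parameter are hypothetical, and this is neither the GpGp implementation nor the authors' additional R functions.

```python
import math

def normal_logpdf(y, mean, var):
    """Log-density of a univariate normal with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def vecchia_loglik_m1(x, y, sigma2, length_scale):
    """Vecchia log-likelihood with one conditioning neighbour per observation.

    x: sorted, distinct 1-D locations; y: observations at those locations.
    Assumed covariance: sigma2 * exp(-|distance| / length_scale), zero mean.
    """
    total = normal_logpdf(y[0], 0.0, sigma2)  # first point has no neighbour
    for i in range(1, len(x)):
        rho = math.exp(-abs(x[i] - x[i - 1]) / length_scale)
        # Conditional law of y[i] given y[i-1] under the bivariate normal.
        total += normal_logpdf(y[i], rho * y[i - 1], sigma2 * (1.0 - rho * rho))
    return total
```

For this particular covariance on ordered one-dimensional locations the process is Markov, so a single neighbour already reproduces the exact likelihood; in two or more dimensions the result is an approximation whose quality depends on the ordering and on the number of neighbours.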
Title: Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-01-31 DOI: 10.1007/s13253-023-00599-2
Pierre Dutilleul, Tomoaki Imoto, Kunio Shimizu
To analyze tree growth statistically through annual ring widths measured in 2-D horizontal trunk sections, we propose two tests of significance defined under a linear-circular regression model with fixed trigonometric effects and normal random errors with a variance-covariance structure from the symmetric circulant family. The associated von Mises distribution has a preferred direction parameter. Accordingly, the first test aims to assess the presence of a preferred direction in the radial growth of a tree from the center of its trunk in a given year. Assuming there is a preferred direction of radial growth for the tree in two years, the second test extends the first one by assessing the equality of tree radial growth in the two preferred directions. Both tests of significance are modified F-tests with the denominator df adjusted for the presence of autocorrelation. Their validity is analyzed for two autoregressive symmetric circulant correlation structures, as a function of the number (n) of angular data and the autocorrelation parameter value. Effects of the inter-year correlation coefficient value are also studied in the two-year case. The performance of REstricted Maximum Likelihood as an estimation method is scrutinized in an extensive Monte Carlo study, and the power of the tests is analyzed when valid. The new testing procedures are applied with (n = 32, 64) ring widths per year for a white spruce tree during 18 years of growth until its harvest. R code is available. Conclusions and perspectives for future research are given. Supplementary materials accompanying this paper appear on-line.
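A small sketch may help show what a preferred direction means in a regression with trigonometric effects. It assumes a first-order harmonic mean, width = b0 + b1 cos(angle) + b2 sin(angle), fitted by ordinary least squares to ring widths measured at equally spaced angles. The function name is hypothetical, the first-order form is an assumption, and the circulant error correlation and the adjusted F-tests from the paper are not modelled here.

```python
import math

def fit_first_harmonic(widths):
    """Least-squares fit of b0 + b1*cos(a) + b2*sin(a) at n equally spaced angles.

    Requires n >= 3. Returns (mean width, amplitude, preferred direction in radians).
    """
    n = len(widths)
    angles = [2.0 * math.pi * j / n for j in range(n)]
    # With equally spaced angles the regressors are orthogonal,
    # so the least-squares coefficients have closed forms.
    b0 = sum(widths) / n
    b1 = 2.0 / n * sum(w * math.cos(a) for w, a in zip(widths, angles))
    b2 = 2.0 / n * sum(w * math.sin(a) for w, a in zip(widths, angles))
    amplitude = math.hypot(b1, b2)
    direction = math.atan2(b2, b1) % (2.0 * math.pi)
    return b0, amplitude, direction
```

An amplitude near zero corresponds to no preferred direction, which is the situation the first test is designed to detect; judging whether a nonzero amplitude is significant requires the error structure that this sketch ignores.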
Title: Two Tests of Significance for Preferred Direction in Tree Radial Growth Under a Linear-Circular Regression Model with Correlated Random Errors (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-01-30 DOI: 10.1007/s13253-024-00600-6
Paul B. May, Andrew O. Finley, Ralph O. Dubayah
The Global Ecosystem Dynamics Investigation (GEDI) is a spaceborne lidar instrument that collects near-global measurements of forest structure. While expansive in scope, GEDI samples are spatially sparse and cover a small fraction of the land surface. Converting the sparse samples into spatially complete predictive maps is of practical importance for a number of ecological studies. A complicating factor is that GEDI collects measurements over forested and non-forested land alike, with no automatic labeling of the land type. Such classification is important, as it categorically influences the probability distribution of the spatial process and the ecological interpretation of the observations/predictions. We propose and implement a spatial mixture model, separating the observations and the greater spatial domain into two latent classes. The latent classes are governed by a Bernoulli spatial process, with spatial effects driven by a Gaussian process. Within each class, the process is governed by a separate spatial model, describing the unique probabilistic attributes. Model predictions take the form of scalar predictions of the GEDI observables as well as discrete labeling of the class membership. Inference is conducted through a Bayesian paradigm, yielding rich quantification of prediction and uncertainty through posterior predictive distributions. We demonstrate the method using GEDI data over Wollemi National Park, Australia, using optical data from Landsat 8 as model covariates. When compared to a single spatial model, the mixture model achieves much higher posterior predictive densities on the true value. When compared to a random forest model, a common algorithmic approach in the remote sensing community, the random forest achieves better absolute prediction accuracy for prediction locations far from observed training data locations, but at the expense of location-specific assessments of uncertainty. 
The unsupervised binary classifications of the mixture model appear broadly ecologically interpretable as forest and non-forest when compared to optical imagery, but further comparison to ground-truth data is required.
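The role of the two latent classes can be illustrated with a non-spatial toy calculation: given a prior class probability and a separate density for each class, Bayes' rule yields the probability that an observation belongs to the forest class. All names and the normal component parameters below are hypothetical placeholders; in the paper the prior comes from a spatial Bernoulli process and each class has its own spatial model.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def forest_probability(y, prior_forest, forest=(25.0, 8.0), nonforest=(2.0, 1.5)):
    """Posterior probability of the forest class for one observed height y.

    forest / nonforest: (mean, sd) of each class density, illustrative values only.
    """
    forest_term = prior_forest * normal_pdf(y, *forest)
    nonforest_term = (1.0 - prior_forest) * normal_pdf(y, *nonforest)
    return forest_term / (forest_term + nonforest_term)
```

Thresholding this probability gives a discrete class label, while the probability itself carries the uncertainty that the posterior predictive distributions quantify in the full model.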
Title: A Spatial Mixture Model for Spaceborne Lidar Observations Over Mixed Forest and Non-forest Land Types (Journal of Agricultural Biological and Environmental Statistics)
Pub Date : 2024-01-30 DOI: 10.1007/s13253-023-00597-4
Spherical embedding is an important tool in several fields of data analysis, including environmental data, spatial statistics, text mining, gene expression analysis, medical research and, in general, areas in which the geodesic distance is a relevant factor. Many data acquisition technologies are related to massive data acquisition, and these high-dimensional vectors are often normalised and transformed into spherical data. In this representation of data on spherical surfaces, multidimensional scaling plays an important role. Traditionally, the methods of clustering and representation have been combined, since the precision of the representation tends to decrease when a large number of objects are involved, which makes interpretation difficult. In this paper, we present a model that partitions objects into classes while simultaneously representing the cluster centres on a spherical surface based on geodesic distances. The model combines a partition algorithm based on the approximation of dissimilarities to geodesic distances with a representation procedure for geodesic distances. In this process, the dissimilarities are transformed in order to optimise the radius of the sphere. The efficiency of the procedure described is analysed by means of an extensive Monte Carlo experiment, and its usefulness is illustrated for real data sets. Supplementary material to this paper is provided online.
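Since the model is built on geodesic distances, a brief sketch of that distance may be useful. It normalises two vectors onto the sphere and returns the great-circle distance for a given radius. The function name and the `radius` argument are illustrative; the clustering step and the radius optimisation described in the abstract are not shown.

```python
import math

def geodesic_distance(u, v, radius=1.0):
    """Great-circle distance between the directions of u and v on a sphere.

    u, v: nonzero vectors of equal length; they are normalised before use.
    """
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, cosine))
    return radius * math.acos(cosine)
```

Because the distance scales linearly with the radius, rescaling the dissimilarities and choosing the radius are closely linked, which is consistent with the abstract's remark that the dissimilarities are transformed to optimise the radius.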
Title: Clustering and Geodesic Scaling of Dissimilarities on the Spherical Surface. Journal: Journal of Agricultural Biological and Environmental Statistics. DOI: https://doi.org/10.1007/s13253-023-00597-4
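The abstract above describes normalising high-dimensional vectors onto a sphere and working with geodesic distances. As a minimal illustration of those two ingredients only (not the paper's clustering or scaling model), the sketch below computes the great-circle distance between two vectors after normalisation. The function names `normalise` and `geodesic_distance` and the `radius` parameter are assumptions made for this example.

```python
import math


def normalise(v):
    """Scale a non-zero vector to unit length so it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise the zero vector")
    return [x / norm for x in v]


def geodesic_distance(u, v, radius=1.0):
    """Great-circle distance between the directions of u and v on a sphere.

    Both inputs are first projected onto the unit sphere; the angle between
    them is then scaled by the (hypothetical) sphere radius.
    """
    a, b = normalise(u), normalise(v)
    cos_angle = sum(x * y for x, y in zip(a, b))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)
```

For instance, `geodesic_distance([2, 0, 0], [0, 3, 0])` is the quarter-circle arc between two orthogonal directions on the unit sphere; increasing `radius` scales the distance proportionally, which is the quantity the abstract says is optimised when the dissimilarities are transformed.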
Pub Date : 2024-01-24 DOI: 10.1007/s13253-023-00598-3
Riki Herliansyah, Ruth King, Dede Aulia Rahman, Stuart King
Obtaining abundance and density estimates is a particularly important aspect of wildlife conservation and management. To monitor wildlife populations, the use of motion-sensor camera traps is becoming increasingly popular due to its non-invasive nature. However, animal identification is not always feasible in practice due to poor quality images and/or individuals not having uniquely identifiable physical characteristics. Spatially explicit models for unmarked individuals permit the estimation of animal density when individuals cannot be uniquely identified. Due to the structure of these models, a Bayesian super-population (data augmentation) approach is often used to fit the models to data, which involves specifying some reasonably large upper limit for the population. However, this approach presents substantial computational challenges for larger populations, as demonstrated by the motivating dataset relating to barking deer (Muntiacus muntjak) collected in Ujung Kulon National Park, Indonesia (with a population size in the low thousands). We develop a new and computationally efficient Bayesian algorithm for fitting the models to data that does not require specifying an upper population limit a priori. We apply the new algorithm to the large barking deer dataset, where the standard super-population approach is computationally expensive, and demonstrate a substantial improvement in computational efficiency. Supplementary material to this paper is provided online.
Title: Animal Density Estimation for Large Unmarked Populations Using a Spatially Explicit Model. Journal: Journal of Agricultural Biological and Environmental Statistics. DOI: https://doi.org/10.1007/s13253-023-00598-3