Georgia Lazoglou, Theo Economou, Christina Anagnostopoulou, Anna Tzyrkalli, George Zittis, Jos Lelieveld
Climate models are useful tools for analyzing historical and projecting future climate conditions. However, the model results tend to differ systematically from observations, particularly for parameters with complex spatial and temporal distributions such as precipitation. A combination of quantile mapping and generalized additive models (GAMs) is presented and proposed as a new method (Q-GAM) for the bias correction of daily precipitation. Q-GAM is demonstrated by using data from five European stations with different climate characteristics. For each station, the closest continental grid point of a EURO-CORDEX climate model was selected for bias correction. A bootstrapping experiment is presented with over 1000 repetitions of randomly splitting the historical period 1981 to 2005 into a calibration and evaluation period. The results for all stations reveal that Q-GAM is a straightforward, accurate and computationally efficient method for the bias correction of precipitation. In particular, the method improves the frequency of dry days and the total annual rainfall amount. This outcome is robust across stations with varying climate characteristics and also to the choice of calibration and evaluation periods. Similar results are also obtained for other precipitation characteristics such as the 0.9 and 0.95 quantiles.
{"title":"Bias correction of daily precipitation from climate models, using the Q-GAM method","authors":"Georgia Lazoglou, Theo Economou, Christina Anagnostopoulou, Anna Tzyrkalli, George Zittis, Jos Lelieveld","doi":"10.1002/env.2881","DOIUrl":"https://doi.org/10.1002/env.2881","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-09-25","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2881","platform":"Semanticscholar","paperid":"142430218","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
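The quantile-mapping half of the method can be illustrated with a minimal sketch. This is plain empirical quantile mapping, not Q-GAM itself: the GAM component is omitted, and the function names and the ten-day precipitation series are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right


def empirical_cdf(sorted_values, x):
    """Fraction of the sorted calibration values that are <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def quantile(sorted_values, p):
    """Quantile of sorted data for p in [0, 1], using linear interpolation."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def quantile_map(model_value, model_calibration, observed_calibration):
    """Map a model value to the observed value at the same cumulative probability."""
    model_sorted = sorted(model_calibration)
    observed_sorted = sorted(observed_calibration)
    p = empirical_cdf(model_sorted, model_value)
    return quantile(observed_sorted, p)


# Hypothetical daily precipitation (mm): the model drizzles on every day,
# while the station records many dry days.
model_cal = [0.2, 0.4, 0.1, 0.3, 5.0, 0.5, 12.0, 0.2, 0.6, 8.0]
obs_cal = [0.0, 0.0, 0.0, 0.0, 6.5, 0.0, 15.0, 0.0, 1.0, 9.0]

corrected = [quantile_map(v, model_cal, obs_cal) for v in model_cal]
dry_days_after = sum(1 for c in corrected if c == 0.0)
```

In this toy series the raw model has no dry days at all, whereas the mapped series recovers several, which is the kind of dry-day frequency correction the abstract refers to.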
Michael L. Pennell, Matthew W. Wheeler, Scott S. Auerbach
With the advent of new alternative methods for rapid toxicity screening of chemicals comes the need for new statistical methodologies which appropriately synthesize the large amount of data collected. For example, transcriptomic assays can be used to assess the impact of a chemical on thousands of genes, but current approaches to analyzing the data treat each gene separately and do not allow sharing of information among genes within pathways. Furthermore, the methods employed are fully parametric and do not account for changes in distribution shape that may occur at high exposure levels. To address the limitations of these methods, we propose Constrained Logistic Density Regression (COLDER) to model expression data from different genes simultaneously. Under COLDER, the dose-response function for each gene is assigned a prior via a discrete logistic stick-breaking process (LSBP) whose weights depend on gene-level characteristics (e.g., pathway membership) and atoms consist of different dose-response functions subject to a shape constraint that ensures biological plausibility. The posterior distribution for the benchmark dose among genes within the same pathways can be estimated directly from the model, which is another advantage over current methods. The ability of COLDER to predict gene-level dose-response is evaluated in a simulation study and the method is illustrated with data from a National Toxicology Program study of Aflatoxin B1.
{"title":"A hierarchical constrained density regression model for predicting cluster-level dose-response","authors":"Michael L. Pennell, Matthew W. Wheeler, Scott S. Auerbach","doi":"10.1002/env.2880","DOIUrl":"10.1002/env.2880","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2880","platform":"Semanticscholar","paperid":"142218933","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
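The way a logistic stick-breaking process turns covariate-dependent predictors into mixture weights can be sketched as follows. This shows only the generic stick-breaking construction; the linear predictor values are hypothetical, and the shape-constrained dose-response atoms of COLDER are not modeled here.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_stick_breaking(linear_predictors):
    """Convert K-1 linear predictors into K weights that sum to one.

    Each predictor decides which fraction of the remaining stick the
    corresponding atom receives; the last atom takes what is left.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for eta in linear_predictors:
        piece = sigmoid(eta) * remaining
        weights.append(piece)
        remaining -= piece
    weights.append(remaining)
    return weights


# Hypothetical predictors: the first one is raised for a gene inside a
# pathway and lowered for a gene outside it.
in_pathway = logistic_stick_breaking([1.5, 0.0, -1.0])
outside_pathway = logistic_stick_breaking([-1.5, 0.0, -1.0])
```

Because the predictors can depend on gene-level characteristics, genes in the same pathway place similar weight on the same atoms, which is how information would be shared across genes in a construction of this kind.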
Duccio Rocchini, Ludovico Chieffallo, Elisa Thouverai, Rossella D'Introno, Francesca Dagostin, Emma Donini, Giles Foody, Simon Garnier, Guilherme G. Mazzochini, Vitezslav Moudry, Bob Rudis, Petra Simova, Michele Torresani, Jakub Nowosad
Colorblindness is a genetic condition that affects a person's ability to perceive colors accurately. Several papers still use rainbow color palettes to display their output, which makes such graphs meaningless for colorblind readers. In this paper, we propose good practices and coding solutions developed in the R Free and Open Source Software to (i) simulate colorblindness, (ii) develop colorblind-friendly color palettes and (iii) provide the tools for converting a non-colorblind-friendly graph into a new image with improved colors.
{"title":"Under the mantra: ‘Make use of colorblind friendly graphs’","authors":"Duccio Rocchini, Ludovico Chieffallo, Elisa Thouverai, Rossella D'Introno, Francesca Dagostin, Emma Donini, Giles Foody, Simon Garnier, Guilherme G. Mazzochini, Vitezslav Moudry, Bob Rudis, Petra Simova, Michele Torresani, Jakub Nowosad","doi":"10.1002/env.2877","DOIUrl":"https://doi.org/10.1002/env.2877","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 6"},"PeriodicalIF":1.5,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2877","platform":"Semanticscholar","paperid":"142174254","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
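One rough way to see why rainbow palettes are problematic is to check whether a palette's relative luminance changes monotonically, since lightness ordering survives most color-vision deficiencies. The sketch below is only such a proxy check, written in Python rather than R; it does not simulate colorblindness and does not reproduce the paper's tools. The luminance weights are the standard sRGB ones, and both palettes are illustrative choices.

```python
def srgb_to_linear(channel):
    """Convert an 8-bit sRGB channel value to linear light."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color):
    """Relative luminance of a color given as '#RRGGBB'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return (0.2126 * srgb_to_linear(r)
            + 0.7152 * srgb_to_linear(g)
            + 0.0722 * srgb_to_linear(b))


def is_monotonic(values):
    """True if the values are strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


# Illustrative palettes: a five-step rainbow and a four-step blue ramp.
rainbow = ["#FF0000", "#FFFF00", "#00FF00", "#00FFFF", "#0000FF"]
sequential_blues = ["#08306B", "#2171B5", "#6BAED6", "#C6DBEF"]

rainbow_ordered = is_monotonic([relative_luminance(c) for c in rainbow])
blues_ordered = is_monotonic([relative_luminance(c) for c in sequential_blues])
```

The rainbow ramp jumps up and down in luminance, so a reader who cannot separate its hues has no consistent lightness cue to fall back on, while the blue ramp keeps its ordering.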
Spatial models for areal data are often constructed such that all pairs of adjacent regions are assumed to have near-identical spatial autocorrelation. In practice, data can exhibit dependence structures more complicated than can be represented under this assumption. In this article, we develop a new model for spatially correlated data observed on graphs, which can flexibly represent many types of spatial dependence patterns while retaining aspects of the original graph geometry. Our method implies an embedding of the graph into Euclidean space wherein covariance can be modeled using traditional covariance functions, such as those from the Matérn family. We parameterize our model using a class of graph metrics compatible with such covariance functions, and which characterize distance in terms of network flow, a property useful for understanding proximity in many ecological settings. By estimating the parameters underlying these metrics, we recover the “intrinsic distances” between graph nodes, which assist in the interpretation of the estimated covariance and allow us to better understand the relationship between the observed process and spatial domain. We compare our model to existing methods for spatially dependent graph data, primarily conditional autoregressive models and their variants, and illustrate advantages of our method over traditional approaches. We fit our model to bird abundance data for several species in North Carolina, and show how it provides insight into the interactions between species-specific spatial distributions and geography.
{"title":"A flexible and interpretable spatial covariance model for data on graphs","authors":"Michael F. Christensen, Peter D. Hoff","doi":"10.1002/env.2879","DOIUrl":"10.1002/env.2879","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142218937","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","Total":0}
We present a novel measure to assess the spatial balance of a sample by utilizing the balancing equation, which captures the balance between the sample units and their neighbours. Spatially balanced samples are desirable as they may reduce the variance of an estimator of a population parameter. If the auxiliary variables we employ to spread the sample possess high explanatory power for the variable(s) of interest, the resulting reduction in variance can be substantial. An advantageous aspect of using auxiliary variables is that their availability is not contingent upon the sampling effort. Therefore, we can assess and compare sampling designs before committing resources to full-scale surveys. By comparing the proposed measure with commonly used measures of spatial balance, we ascertain that our measure consistently yields meaningful insights regarding the spatial balance of samples. Consequently, our measure can effectively differentiate between various designs when planning a survey, evaluate the potential gains from replacing an existing sample, and determine which non-responding units would contribute the most to enhancing the set of responding units.
{"title":"How to find the best sampling design: A new measure of spatial balance","authors":"Wilmer Prentius, Anton Grafström","doi":"10.1002/env.2878","DOIUrl":"10.1002/env.2878","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2878","platform":"Semanticscholar","paperid":"142218935","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
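For context, one widely used existing measure of spatial balance is based on Voronoi polytopes: each population unit is assigned to its nearest sampled unit, and the inclusion probabilities collected by each sampled unit should sum to about one. The sketch below implements that existing measure as an assumption about what "commonly used measures" might include; it is not the new measure proposed in the paper, and the four-unit population is hypothetical.

```python
def voronoi_spatial_balance(coords, inclusion_probs, sample):
    """Voronoi-based balance B = mean((v_i - 1)^2) over sampled units i.

    v_i is the sum of inclusion probabilities of the population units whose
    nearest sampled unit is i. Values near zero indicate a well-spread sample.
    """
    totals = {i: 0.0 for i in sample}
    for unit, point in enumerate(coords):
        distances = {
            i: sum((a - b) ** 2 for a, b in zip(point, coords[i]))
            for i in sample
        }
        nearest_distance = min(distances.values())
        nearest = [i for i, d in distances.items() if d == nearest_distance]
        for i in nearest:
            # Units equidistant from several sampled units are split equally.
            totals[i] += inclusion_probs[unit] / len(nearest)
    return sum((v - 1.0) ** 2 for v in totals.values()) / len(sample)


# Hypothetical population: four units on a line, each with probability 0.5.
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
probs = [0.5, 0.5, 0.5, 0.5]

spread = voronoi_spatial_balance(coords, probs, [0, 3])   # units at both ends
clumped = voronoi_spatial_balance(coords, probs, [0, 1])  # two adjacent units
```

The spread sample scores 0.0 and the clumped one 0.25. Because only the auxiliary coordinates and the inclusion probabilities are needed, designs can be compared in this way before any field data are collected.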
The growing frequency and size of wildfires across the US necessitate accurate quantitative assessment of evolving wildfire behavior to predict risk from future extreme wildfires. We build a joint model of wildfire counts and burned areas, regressing key model parameters on climate and demographic covariates. We use extended generalized Pareto distributions to model the full distribution of burned areas, capturing both moderate and extreme sizes, while leveraging extreme value theory to focus particularly on the right tail. We model wildfire counts with a zero-inflated negative binomial model, and join the wildfire counts and burned areas sub-models using a temporally varying shared random effect. Our model successfully captures the trends of wildfire counts and burned areas. By investigating the predictive power of different sets of covariates, we find that fire indices are better predictors of wildfire burned area behavior than individual climate covariates, whereas climate covariates are influential drivers of wildfire occurrence behavior.
{"title":"Anthropogenic and meteorological effects on the counts and sizes of moderate and extreme wildfires","authors":"Elizabeth S. Lawler, Benjamin A. Shaby","doi":"10.1002/env.2873","DOIUrl":"10.1002/env.2873","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2873","platform":"Semanticscholar","paperid":"141945284","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
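The count sub-model mentioned in the abstract, a zero-inflated negative binomial, can be written down in a few lines. The sketch below uses the common mean/size parameterization with a constant zero-inflation probability; the parameter values are hypothetical, and the covariate regression and shared random effect of the paper are not included.

```python
import math


def negative_binomial_pmf(y, mean, size):
    """Negative binomial probability of count y, parameterized by mean and size."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mean))
             + y * math.log(mean / (size + mean)))
    return math.exp(log_p)


def zinb_pmf(y, mean, size, zero_prob):
    """Zero-inflated negative binomial: a structural zero with probability
    zero_prob, otherwise a negative binomial count."""
    base = (1.0 - zero_prob) * negative_binomial_pmf(y, mean, size)
    return zero_prob + base if y == 0 else base


# Hypothetical parameters for monthly wildfire counts in one region.
mean, size, zero_prob = 2.0, 1.5, 0.3

total_probability = sum(zinb_pmf(y, mean, size, zero_prob) for y in range(500))
expected_count = sum(y * zinb_pmf(y, mean, size, zero_prob) for y in range(500))
```

The zero inflation raises the probability of a fire-free period above what the negative binomial alone would give, and scales the expected count by one minus the inflation probability.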
Jay M. Ver Hoef, Eryn Blagg, Michael Dumelle, Philip M. Dixon, Dale L. Zimmerman, Paul B. Conn
We develop hierarchical models and methods in a fully parametric approach to generalized linear mixed models for any patterned covariance matrix. The Laplace approximation is used to marginally estimate covariance parameters by integrating over all fixed and latent random effects. The Laplace approximation relies on Newton–Raphson updates, which also lead to predictions for the latent random effects. We develop methodology for complete marginal inference, from estimating covariance parameters and fixed effects to making predictions for unobserved data. The marginal likelihood is developed for six distributions that are often used for binary, count, and positive continuous data, and our framework is easily extended to other distributions. We compare our methods to fully Bayesian methods, automatic differentiation, and integrated nested Laplace approximations (INLA) for bias, mean-squared (prediction) error, and interval coverage, and all methods yield very similar results. However, our methods are much faster than Bayesian methods, and more general than INLA. Examples with binary and proportional data, count data, and positive-continuous data are used to illustrate all six distributions with a variety of patterned covariance structures that include spatial models (both geostatistical and areal models), time series models, and mixtures with typical random intercepts based on grouping.
{"title":"Marginal inference for hierarchical generalized linear mixed models with patterned covariance matrices using the Laplace approximation","authors":"Jay M. Ver Hoef, Eryn Blagg, Michael Dumelle, Philip M. Dixon, Dale L. Zimmerman, Paul B. Conn","doi":"10.1002/env.2872","DOIUrl":"10.1002/env.2872","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"35 7"},"PeriodicalIF":1.5,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2872","platform":"Semanticscholar","paperid":"141783339","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","PMCID":"OA","Total":0}
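The link between Newton–Raphson updates and the Laplace approximation can be seen in a one-dimensional toy case. The sketch below assumes a single Poisson count with a log link and one Gaussian random effect, with hypothetical values for the count, intercept and standard deviation; the paper's setting involves vectors of latent effects and patterned covariance matrices, which are not attempted here.

```python
import math


def log_joint(w, y, beta, sigma):
    """log p(y | w) + log p(w): Poisson count with log link, w ~ N(0, sigma^2)."""
    eta = beta + w
    return (y * eta - math.exp(eta) - math.lgamma(y + 1)
            - 0.5 * w * w / sigma ** 2
            - 0.5 * math.log(2 * math.pi * sigma ** 2))


def laplace_marginal(y, beta, sigma, iterations=25):
    """Laplace approximation to p(y), with the mode found by Newton-Raphson."""
    w = 0.0
    for _ in range(iterations):
        mu = math.exp(beta + w)
        gradient = y - mu - w / sigma ** 2
        hessian = -mu - 1.0 / sigma ** 2
        w -= gradient / hessian
    curvature = math.exp(beta + w) + 1.0 / sigma ** 2  # negative Hessian at mode
    marginal = math.exp(log_joint(w, y, beta, sigma)) * math.sqrt(2 * math.pi / curvature)
    return marginal, w  # w doubles as the prediction of the random effect


def quadrature_marginal(y, beta, sigma, lower=-10.0, upper=10.0, steps=20000):
    """Reference value of p(y) by the trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(log_joint(lower + k * h, y, beta, sigma))
    return total * h


approx, mode = laplace_marginal(y=3, beta=0.0, sigma=1.0)
reference = quadrature_marginal(y=3, beta=0.0, sigma=1.0)
relative_error = abs(approx - reference) / reference
```

The mode located by the Newton–Raphson loop is reused as the prediction of the latent effect, which mirrors the abstract's remark that the updates serve both purposes.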
Greta Panunzi, Stefano Moro, Isa Marques, Sara Martino, Francesco Colloca, Francesco Ferretti, Giovanna Jona Lasinio
Conserving oceanic apex predators, such as sharks, is of utmost importance. However, scant abundance and distribution data often challenge understanding the population status of many threatened species. Occurrence records are often scarce and opportunistic, and fieldwork aimed at retrieving additional data is expensive and prone to failure. Integrating various data sources becomes crucial to developing species distribution models for informed sampling and conservation purposes. The white shark, for example, is a rare but persistent inhabitant of the Mediterranean Sea. Here, it is considered Critically Endangered by the IUCN, while population abundance, distribution patterns, and habitat use are still poorly known. This study uses available occurrence records from 1985 to 2021 from diverse sources to construct a spatial log‐Gaussian Cox process, with data‐source specific detection functions and thinning, and accounting for physical barriers. This model estimates white shark presence intensity alongside uncertainty through a Bayesian approach with Integrated Nested Laplace Approximation (INLA) and the inlabru R package. For the first time, we projected species occurrence hot spots and landscapes of relative abundance (continuous measure of animal density in space) throughout the Mediterranean Sea. This approach can be used with other rare species for which presence‐only data from different sources are available.
{"title":"Estimating the spatial distribution of the white shark in the Mediterranean Sea via an integrated species distribution model accounting for physical barriers","authors":"Greta Panunzi, Stefano Moro, Isa Marques, Sara Martino, Francesco Colloca, Francesco Ferretti, Giovanna Jona Lasinio","doi":"10.1002/env.2876","DOIUrl":"https://doi.org/10.1002/env.2876","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"88 1"},"PeriodicalIF":1.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"141575753","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","Total":0}
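How detection-dependent thinning shapes a presence-only point pattern can be sketched with a small simulation. The example below is written in Python rather than with inlabru, uses a fixed intensity surface in place of one realization of the latent log-Gaussian field, and invents both the intensity and the detection function; it shows only the thinning mechanism, not the paper's model or its treatment of physical barriers.

```python
import math
import random


def poisson_draw(rate, rng):
    """Poisson variate: number of unit-rate exponential arrivals before `rate`."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def simulate_thinned_points(intensity, detection, max_rate, rng):
    """Simulate observed points on the unit square by thinning.

    Candidates come from a homogeneous Poisson process with rate max_rate,
    which must bound intensity * detection everywhere. Each candidate is kept
    with probability intensity * detection / max_rate.
    """
    n_candidates = poisson_draw(max_rate, rng)
    candidates = [(rng.random(), rng.random()) for _ in range(n_candidates)]
    observed = []
    for x, y in candidates:
        keep_prob = intensity(x, y) * detection(x, y) / max_rate
        if rng.random() < keep_prob:
            observed.append((x, y))
    return candidates, observed


def intensity(x, y):
    """Hypothetical presence intensity, decaying from west to east."""
    return 200.0 * math.exp(-3.0 * x)


def detection(x, y):
    """Hypothetical detection probability, higher in the southern half."""
    return 0.8 if y < 0.5 else 0.3


rng = random.Random(1)
candidates, observed = simulate_thinned_points(intensity, detection, 200.0, rng)
```

The observed pattern reflects the product of intensity and detection, which is why a model of this kind needs a separate detection function for each data source to recover the underlying intensity.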
Matthew Bonas, Abhirup Datta, Christopher K. Wikle, Edward L. Boone, Faten S. Alamri, Bhava Vyasa Hari, Indulekha Kavila, Susan J. Simmons, Shannon M. Jarvis, Wesley S. Burr, Daniel E. Pagendam, Won Chang, Stefano Castruccio
The ever-increasing popularity of machine learning methods in virtually all areas of science, engineering and beyond is poised to put established statistical modeling approaches into question. Environmental statistics is no exception, as popular constructs such as neural networks and decision trees are now routinely used to provide forecasts of physical processes ranging from air pollution to meteorology. This presents both challenges and opportunities to the statistical community, which could contribute to the machine learning literature with a model‐based approach with formal uncertainty quantification. Should, however, classical statistical methodologies be discarded altogether in environmental statistics, and should our contribution be focused on formalizing machine learning constructs? This work aims at providing some answers to this thought‐provoking question with two time series case studies where selected models from both the statistical and machine learning literature are compared in terms of forecasting skills, uncertainty quantification and computational time. Relative merits of both classes of approaches are discussed, and broad open questions are formulated as a baseline for a discussion on the topic.
{"title":"Assessing predictability of environmental time series with statistical and machine learning models","authors":"Matthew Bonas, Abhirup Datta, Christopher K. Wikle, Edward L. Boone, Faten S. Alamri, Bhava Vyasa Hari, Indulekha Kavila, Susan J. Simmons, Shannon M. Jarvis, Wesley S. Burr, Daniel E. Pagendam, Won Chang, Stefano Castruccio","doi":"10.1002/env.2864","DOIUrl":"https://doi.org/10.1002/env.2864","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"371 1"},"PeriodicalIF":1.7,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"141575754","RegionNum":3,"RegionCategory":"Environmental Science and Ecology","Total":0}
Traditional sampling methods may prove inadequate when dealing with spatially clustered populations or when studying rare events or traits that are not easily detectable across the target population. When both scenarios occur simultaneously, adaptive sampling strategies can represent a viable option to enhance the detectability of cases of interest. This paper delves into the application of a novel class of sequential adaptive sampling strategies to animal surveys. These strategies, originally proposed for human population tuberculosis prevalence surveys, allow oversampling of the rare interest variables while managing on‐field constraints. This ensures that the unfixed sample size, typical of adaptive sampling, does not compromise overall cost‐effectiveness. We explore a strategy within this class that integrates an adaptive component into a Poisson sequential selection. The aim is twofold: to intensify the detection of cases by exploiting the spatial clustering and to provide a flexible framework for managing logistics and budget constraints. To illustrate the strengths and weaknesses of this Poisson‐based sequential adaptive sampling strategy compared to traditional sampling methods, a simulation study was conducted on a blue‐winged teal population in Florida, USA. The results showcase the benefits of the proposed strategy and open avenues for future methodological and practical improvements.
{"title":"Applying sequential adaptive strategies for sampling animal populations: An empirical study","authors":"Rosa M. Di Biase, Fulvia Mecatti","doi":"10.1002/env.2870","DOIUrl":"https://doi.org/10.1002/env.2870","url":null,"abstract":"Traditional sampling methods may prove inadequate when dealing with spatially clustered populations or when studying rare events or traits that are not easily detectable across the target population. When both scenarios occur simultaneously, adaptive sampling strategies can represent a viable option to enhance the detectability of cases of interest. This paper delves into the application of a novel class of sequential adaptive sampling strategies to animal surveys. These strategies, originally proposed for human population tuberculosis prevalence surveys, allow oversampling of the rare interest variables while managing on‐field constraints. This ensures that the unfixed sample size, typical of adaptive sampling, does not compromise overall cost‐effectiveness. We explore a strategy within this class that integrates an adaptive component into a Poisson sequential selection. The aim is twofold: to intensify the detection of cases by exploiting the spatial clustering and to provide a flexible framework for managing logistics and budget constraints. To illustrate the strengths and weaknesses of this Poisson‐based sequential adaptive sampling strategy compared to traditional sampling methods, a simulation study was conducted on a blue‐winged teal population in Florida, USA. 
The results showcase the benefits of the proposed strategy and open avenues for future methodological and practical improvements.","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"125 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141515687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Environmental Science and Ecology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}