An illustration of model agnostic explainability methods applied to environmental data
Christopher K. Wikle, Abhirup Datta, Bhava Vyasa Hari, Edward L. Boone, Indranil Sahoo, Indulekha Kavila, Stefano Castruccio, Susan J. Simmons, Wesley S. Burr, Won Chang
Historically, two primary criticisms statisticians have had of machine learning and deep neural models are their lack of uncertainty quantification and their inability to support inference (i.e., to explain which inputs are important). Explainable AI has developed over the last few years as a sub-discipline of computer science and machine learning to mitigate these concerns (as well as concerns about fairness and transparency in deep modeling). In this article, our focus is on explaining which inputs are important in models for predicting environmental data. In particular, we focus on three general methods for explainability that are model agnostic and thus applicable across a breadth of models without internal explainability: “feature shuffling”, “interpretable local surrogates”, and “occlusion analysis”. We describe particular implementations of each and illustrate their use with a variety of models, all applied to the problem of long-lead forecasting of monthly soil moisture in the North American corn belt given sea surface temperature anomalies in the Pacific Ocean.
{"title":"An illustration of model agnostic explainability methods applied to environmental data","authors":"Christopher K. Wikle, Abhirup Datta, Bhava Vyasa Hari, Edward L. Boone, Indranil Sahoo, Indulekha Kavila, Stefano Castruccio, Susan J. Simmons, Wesley S. Burr, Won Chang","doi":"10.1002/env.2772","DOIUrl":"10.1002/env.2772","url":null,"abstract":"<p>Historically, two primary criticisms statisticians have of machine learning and deep neural models is their lack of uncertainty quantification and the inability to do inference (i.e., to explain what inputs are important). Explainable AI has developed in the last few years as a sub-discipline of computer science and machine learning to mitigate these concerns (as well as concerns of fairness and transparency in deep modeling). In this article, our focus is on explaining which inputs are important in models for predicting environmental data. In particular, we focus on three general methods for explainability that are model agnostic and thus applicable across a breadth of models without internal explainability: “feature shuffling”, “interpretable local surrogates”, and “occlusion analysis”. We describe particular implementations of each of these and illustrate their use with a variety of models, all applied to the problem of long-lead forecasting monthly soil moisture in the North American corn belt given sea surface temperature anomalies in the Pacific Ocean.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2772","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9495377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling the spatial evolution of wildfires using random spread process
Carlos Díaz-Avalos, Pablo Juan
The study of wildfire spread and the growth of the area burned is an important task in ecological studies and in other contexts. In this work we present a model for fire spread and show the results obtained from simulations of burned areas. The model is based on probabilities of fire at different locations, obtained from the intensity function of a spatial point process model fitted to the observed pattern of fires in the Valencian Community for the years 1993–2015. The model, applied to different wildfires in Spain and across different temporal stages, combines the features of a network model with those of a quasi-physical model of the interaction between burning and nonburning cells, which depends strongly on covariates. The simulated burned areas resemble those observed in real cases, suggesting that the proposed model, based on a Markov process called the Random Spread Process, works adequately. The model can be extended to simulate other random spread processes such as epidemics.
{"title":"Modeling the spatial evolution wildfires using random spread process","authors":"Carlos Díaz-Avalos, Pablo Juan","doi":"10.1002/env.2774","DOIUrl":"10.1002/env.2774","url":null,"abstract":"<p>The study of wildfire spread and the growth of the area burned is an important task in ecological studies and in other contexts. In this work we present a model for fire spread and show the results obtained from simulations of burned areas. The model is based on probabilities of fire at different locations. Such probabilities are obtained from the intensity function of a spatial point process model fitted to the observed pattern of fires in the Valencian Community for the years 1993–2015. The models, applied to different wildfires in Spain, including the different temporal states, combines the features of a network model with those of a quasi-physical model of the interaction between burning and nonburning cells, which strongly depends on covariates. The results of the simulated wildfire burned areas resemble the burned areas observed in real cases, suggesting that the model proposed, based on a Markov process called Random Spread Process, works adequately. The model can be extended to simulate other random spread processes such as epidemics.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"33 8","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2774","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84482208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian multiple changepoint detection with missing data and its application to the magnitude-frequency distributions
Shaochuan Lu
The detection of abrupt changes in an evolving pattern of time series in the presence of missing data still poses a challenge in real applications. We formulate the multiple changepoint problem as a latent Markov model on a countably infinite state space. To enhance efficiency, we propose a partially collapsed Gibbs sampler for inference on the joint posterior of the number of changepoints and their locations. Variants of the Viterbi algorithm are suggested for obtaining MAP estimates of the random changepoints in the presence of missing data, which provide better performance in these varying-dimensional problems. The method is generally applicable to multiple changepoint detection under a variety of missing data mechanisms. It is applied to a case study of the magnitude-frequency distribution of the 2010 Darfield M7.1 earthquake sequence in New Zealand, where we find some unusual features of the seismic b-value. Two changepoints are detected and, in contrast to the background seismic b-value, relatively low b-values are identified in the early aftershock propagation period. We suggest that this might be a forewarning of potentially devastating strong aftershocks. Advances in b-value changepoint detection will enhance our understanding of earthquake occurrence and potentially lead to improved risk forecasting.
{"title":"Bayesian multiple changepoint detection with missing data and its application to the magnitude-frequency distributions","authors":"Shaochuan Lu","doi":"10.1002/env.2775","DOIUrl":"https://doi.org/10.1002/env.2775","url":null,"abstract":"<p>The detection of abrupt changes in an evolving pattern of time series in the presence of missing data still poses a challenge to real applications. We formulate the multiple changepoint problem into a latent Markov model on a countably infinite state space. For efficiency-enhancing, we propose a partially collapsed Gibbs sampler for the inference of the joint posterior of the number of changepoints and their locations. Variants of Viterbi algorithms are suggested for obtaining the MAP estimates of random changepoints in the presence of missing data, which provides better performances in these varying-dimensional problems. The method is generally applicable for multiple changepoint detection under a variety of missing data mechanism. The method is applied to a case study of the magnitude-frequency distribution of the 2010 Darfield M7.1 earthquake sequence in New Zealand. We find out some unusual features of the seismic <i>b</i>-value in the Darfield earthquake sequence. It is noted that two changepoints are detected and in contrast to the background seismic <i>b</i>-value, relatively low <i>b</i>-values in the early aftershock propagation period are identified. We suggest that this might be a forewarning of potentially devastatingly strong aftershocks. The advance in the method of <i>b</i>-value changepoint detection will enhance our understanding of earthquake occurrence and potentially lead to improved risk forecasting.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 4","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50141295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flood hazard model calibration using multiresolution model output
Samantha M. Roth, Ben Seiyon Lee, Sanjib Sharma, Iman Hosseini-Shakib, Klaus Keller, Murali Haran
Riverine floods pose a considerable risk to many communities. Improving flood hazard projections has the potential to inform the design and implementation of flood risk management strategies. Current flood hazard projections are uncertain, especially due to uncertain model parameters. Calibration methods use observations to quantify model parameter uncertainty. With limited computational resources, researchers typically calibrate models using either relatively few expensive model runs at high spatial resolutions or many cheaper runs at lower spatial resolutions. This leads to an open question: is it possible to effectively combine information from the high- and low-resolution model runs? We propose a Bayesian emulation–calibration approach that assimilates model outputs and observations at multiple resolutions. As a case study for a riverine community in Pennsylvania, we demonstrate our approach using the LISFLOOD-FP flood hazard model. The multiresolution approach results in improved parameter inference over the single-resolution approach in multiple scenarios. Results vary with the parameter values and the number of available model runs. Our method is general and can be used to calibrate other high-dimensional computer models to improve projections.
{"title":"Flood hazard model calibration using multiresolution model output","authors":"Samantha M. Roth, Ben Seiyon Lee, Sanjib Sharma, Iman Hosseini-Shakib, Klaus Keller, Murali Haran","doi":"10.1002/env.2769","DOIUrl":"https://doi.org/10.1002/env.2769","url":null,"abstract":"<p>Riverine floods pose a considerable risk to many communities. Improving flood hazard projections has the potential to inform the design and implementation of flood risk management strategies. Current flood hazard projections are uncertain, especially due to uncertain model parameters. Calibration methods use observations to quantify model parameter uncertainty. With limited computational resources, researchers typically calibrate models using either relatively few expensive model runs at high spatial resolutions or many cheaper runs at lower spatial resolutions. This leads to an open question: is it possible to effectively combine information from the high and low resolution model runs? We propose a Bayesian emulation–calibration approach that assimilates model outputs and observations at multiple resolutions. As a case study for a riverine community in Pennsylvania, we demonstrate our approach using the LISFLOOD-FP flood hazard model. The multiresolution approach results in improved parameter inference over the single resolution approach in multiple scenarios. Results vary based on the parameter values and the number of available models runs. Our method is general and can be used to calibrate other high dimensional computer models to improve projections.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 2","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50136284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A double fixed rank kriging approach to spatial regression models with covariate measurement error
Xu Ning, Francis K. C. Hui, Alan H. Welsh
In many applications of spatial regression modeling, the spatially indexed covariates are measured with error, and it is known that ignoring this measurement error can lead to attenuation of the estimated regression coefficients. Classical measurement error techniques may not be appropriate in the spatial setting, due to the lack of validation data and the presence of (residual) spatial correlation among the responses. In this article, we propose a double fixed rank kriging (FRK) approach to obtain bias-corrected estimates of, and inference on, coefficients in spatial regression models where the covariates are spatially indexed and subject to measurement error. Assuming the covariates vary smoothly in space, the proposed method first fits an FRK model regressing the covariates against spatial basis functions to obtain predictions of the error-free covariates. These are then passed into a second FRK model, where the response is regressed against the predicted covariates plus another set of spatial basis functions to account for spatial correlation. A simulation study and an application to presence–absence records of Carolina wren from the North American Breeding Bird Survey demonstrate that the proposed double FRK approach can be effective in adjusting for measurement error in spatially correlated data.
{"title":"A double fixed rank kriging approach to spatial regression models with covariate measurement error","authors":"Xu Ning, Francis K. C. Hui, Alan H. Welsh","doi":"10.1002/env.2771","DOIUrl":"https://doi.org/10.1002/env.2771","url":null,"abstract":"<p>In many applications of spatial regression modeling, the spatially indexed covariates are measured with error, and it is known that ignoring this measurement error can lead to attenuation of the estimated regression coefficients. Classical measurement error techniques may not be appropriate in the spatial setting, due to the lack of validation data and the presence of (residual) spatial correlation among the responses. In this article, we propose a double fixed rank kriging (FRK) approach to obtain bias-corrected estimates of and inference on coefficients in spatial regression models, where the covariates are spatially indexed and subject to measurement error. Assuming they vary smoothly in space, the proposed method first fits an FRK model regressing the covariates against spatial basis functions to obtain predictions of the error-free covariates. These are then passed into a second FRK model, where the response is regressed against the predicted covariates plus another set of spatial basis functions to account for spatial correlation. A simulation study and an application to presence–absence records of Carolina wren from the North American Breeding Bird Survey demonstrate that the proposed double FRK approach can be effective in adjusting for measurement error in spatially correlated data.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2771","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50145274","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decisions, decisions, decisions in an uncertain environment
Noel Cressie
Decision-makers abhor uncertainty, and it is certainly true that the less there is of it the better. However, recognizing that uncertainty is part of the equation, particularly for deciding on environmental policy, is a prerequisite for making wise decisions. Even making no decision is a decision that has consequences, and using the presence of uncertainty as the reason for failing to act is a poor excuse. Statistical science is the science of uncertainty, and it should play a critical role in the decision-making process. This opinion piece focuses on the summit of the knowledge pyramid that starts from data and rises in steps from data to information, from information to knowledge, and finally from knowledge to decisions. Enormous advances have been made in the last 100 years ascending the pyramid, with deviations that have followed different routes. There has generally been a healthy supply of uncertainty quantification along the way but, in a rush to the top, where the decisions are made, uncertainty is often left behind. In my opinion, statistical science needs to be much more pro-active in evolving classical decision theory into a relevant and practical area of decision applications. This article follows several threads, building on the decision-theoretic foundations of loss functions and Bayesian uncertainty.
{"title":"Decisions, decisions, decisions in an uncertain environment","authors":"Noel Cressie","doi":"10.1002/env.2767","DOIUrl":"https://doi.org/10.1002/env.2767","url":null,"abstract":"<p>Decision-makers abhor uncertainty, and it is certainly true that the less there is of it the better. However, recognizing that uncertainty is part of the equation, particularly for deciding on environmental policy, is a prerequisite for making wise decisions. Even making no decision is a decision that has consequences, and using the presence of uncertainty as the reason for failing to act is a poor excuse. Statistical science is the science of uncertainty, and it should play a critical role in the decision-making process. This opinion piece focuses on the summit of the knowledge pyramid that starts from data and rises in steps from data to information, from information to knowledge, and finally from knowledge to decisions. Enormous advances have been made in the last 100 years ascending the pyramid, with deviations that have followed different routes. There has generally been a healthy supply of uncertainty quantification along the way but, in a rush to the top, where the decisions are made, uncertainty is often left behind. In my opinion, statistical science needs to be much more pro-active in evolving classical decision theory into a relevant and practical area of decision applications. This article follows several threads, building on the decision-theoretic foundations of loss functions and Bayesian uncertainty.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50145275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stochastic tropical cyclone precipitation field generation
William Kleiber, Stephan Sain, Luke Madaus, Patrick Harr
Tropical cyclones are important drivers of coastal flooding, which has severe negative consequences for public safety and the economy. Because such events are rare, historical storm precipitation data at high spatial and temporal resolution are limited. This article introduces a statistical tropical cyclone space-time precipitation generator that requires only limited information from storm track datasets. Given a handful of predictor variables that are common to historical and simulated storm track ensembles, such as pressure deficit at the storm's center, radius of maximal winds, storm center and direction, and distance to coast, the proposed stochastic model generates space-time fields of quantitative precipitation over the study domain. Statistically novel aspects include developing the model in Lagrangian coordinates with respect to the dynamic storm center, using ideas from low-rank representations along with circular process models. The model is trained on a set of tropical cyclone data from an advanced weather forecasting model over the Gulf of Mexico and southern United States, and is evaluated by cross-validation. Results show the model appropriately captures the spatial asymmetry of cyclone precipitation patterns, total precipitation, and the local distribution of precipitation at a set of case study locations along the coast. We additionally compare our model against a widely used statistical forecast and illustrate that our approach better captures uncertainty, as well as storm characteristics such as asymmetry.
{"title":"Stochastic tropical cyclone precipitation field generation","authors":"William Kleiber, Stephan Sain, Luke Madaus, Patrick Harr","doi":"10.1002/env.2766","DOIUrl":"https://doi.org/10.1002/env.2766","url":null,"abstract":"<p>Tropical cyclones are important drivers of coastal flooding which have severe negative public safety and economic consequences. Due to the rare occurrence of such events, high spatial and temporal resolution historical storm precipitation data are limited in availability. This article introduces a statistical tropical cyclone space-time precipitation generator given limited information from storm track datasets. Given a handful of predictor variables that are common in either historical or simulated storm track ensembles such as pressure deficit at the storm's center, radius of maximal winds, storm center and direction, and distance to coast, the proposed stochastic model generates space-time fields of quantitative precipitation over the study domain. Statistically novel aspects include that the model is developed in Lagrangian coordinates with respect to the dynamic storm center that uses ideas from low-rank representations along with circular process models. The model is trained on a set of tropical cyclone data from an advanced weather forecasting model over the Gulf of Mexico and southern United States, and is validated by cross-validation. Results show the model appropriately captures spatial asymmetry of cyclone precipitation patterns, total precipitation as well as the local distribution of precipitation at a set of case study locations along the coast. We additionally compare our model against a widely-used statistical forecast, and illustrate that our approach better captures uncertainty, as well as storm characteristics such as asymmetry.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"34 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50123086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two years of COVID-19 pandemic: The Italian experience of Statgroup-19
Giovanna Jona Lasinio, Fabio Divino, Gianfranco Lovison, Marco Mingione, Pierfrancesco Alaimo Di Loro, Alessio Farcomeni, Antonello Maruotti
The amount and poor quality of available data, and the need for appropriate modeling of the main epidemic indicators, require specific skills. In this context, the statistician plays a key role in the process that leads to policy decisions, starting with monitoring changes and evaluating risks. The “what” and the “why” of these changes represent fundamental research questions for providing timely and effective tools to manage the evolution of the epidemic, and answering them requires appropriate statistical models and visualization tools. Here, we give an overview of the role played by Statgroup-19, an independent Italian research group born in March 2020. The group includes seven statisticians from different Italian universities, each with a different background but with a shared interest in data analysis, statistical modeling, and biostatistics. Since the beginning of the COVID-19 pandemic the group has interacted with authorities and journalists to support policy decisions and inform the general public about the evolution of the epidemic. This collaboration has led to several scientific papers and increased visibility across various media, all made possible by the continuous interaction among the group members, who shared their unique expertise.
{"title":"Two years of COVID-19 pandemic: The Italian experience of Statgroup-19","authors":"Giovanna Jona Lasinio, Fabio Divino, Gianfranco Lovison, Marco Mingione, Pierfrancesco Alaimo Di Loro, Alessio Farcomeni, Antonello Maruotti","doi":"10.1002/env.2768","DOIUrl":"10.1002/env.2768","url":null,"abstract":"<p>The amount and poor quality of available data and the need of appropriate modeling of the main epidemic indicators require specific skills. In this context, the statistician plays a key role in the process that leads to policy decisions, starting with monitoring changes and evaluating risks. The “what” and the “why” of these changes represent fundamental research questions to provide timely and effective tools to manage the evolution of the epidemic. Answers to such questions need appropriate statistical models and visualization tools. Here, we give an overview of the role played by Statgroup-19, an independent Italian research group born in March 2020. The group includes seven statisticians from different Italian universities, each with different backgrounds but with a shared interest in data analysis, statistical modeling, and biostatistics. Since the beginning of the COVID-19 pandemic the group has interacted with authorities and journalists to support policy decisions and inform the general public about the evolution of the epidemic. This collaboration led to several scientific papers and an accrued visibility across various media, all made possible by the continuous interaction across the group members that shared their unique expertise.</p>","PeriodicalId":50512,"journal":{"name":"Environmetrics","volume":"33 8","pages":""},"PeriodicalIF":1.7,"publicationDate":"2022-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9136278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"环境科学与生态学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}