Functional single index models are widely used to describe the nonlinear relationship between a scalar response and a functional predictor. The conventional functional single index model assumes that the coefficient function is nonzero over the entire time domain; in other words, the functional predictor has a nonzero effect on the response at all times. We propose a new compact functional single index model, in which the coefficient function is nonzero only in a subregion. We also propose an efficient method that simultaneously estimates the nonlinear link function, the coefficient function, and the nonzero region of the coefficient function. Hence, our method can identify the region in which the functional predictor is related to the response. Our method is illustrated by an application in which the total number of daily bike rentals is predicted from hourly temperature data. The finite-sample performance of the proposed method is investigated in a simulation study that compares it with the conventional functional single index model.
Yunlong Nie, Liangliang Wang, Jiguo Cao. "Estimating functional single index models with compact support." Environmetrics 34(2). Published 2022-12-21. DOI: 10.1002/env.2784 (https://doi.org/10.1002/env.2784)
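For readers unfamiliar with the model class in the entry above, a functional single index model with compactly supported coefficient function can be written as follows. The notation is an assumption chosen for illustration (the abstract gives no formulas): $g$ is the unknown link function, $X_i(t)$ the functional predictor on the time domain $\mathcal{T}$, $\beta(t)$ the coefficient function, and $\mathcal{S}$ the unknown nonzero region.

```latex
\[
  y_i \;=\; g\!\left( \int_{\mathcal{T}} X_i(t)\,\beta(t)\,dt \right) + \varepsilon_i ,
  \qquad
  \beta(t) = 0 \ \text{ for } t \notin \mathcal{S}, \quad \mathcal{S} \subset \mathcal{T}.
\]
```

Under this reading, the conventional model corresponds to $\mathcal{S} = \mathcal{T}$, while the compact model treats $\mathcal{S}$ as an additional quantity to be estimated together with $g$ and $\beta$.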
Alex Diana, Eleni Matechou, Jim E. Griffin, Yadvendradev Jhala, Qamar Qureshi
Capture-recapture (CR) data and corresponding models have been used extensively to estimate the size of wildlife populations when detection probability is less than 1. When the locations of traps or cameras used to capture or detect individuals are known, spatially explicit CR models are used to infer the spatial pattern of the individual locations and population density. Individual locations, referred to as activity centers (ACs), are defined as the locations around which the individuals move. These ACs are typically assumed to be independent, and their spatial pattern is modeled using homogeneous Poisson processes. However, this assumption is often unrealistic, since individuals can interact with each other, either within a species or between different species. In this article, we consider a vector of point processes from the general class of interaction point processes and develop a model for CR data that can account for interactions, in particular repulsions, between and within multiple species. Interaction point processes present a challenge from an inferential perspective because of the intractability of the normalizing constant of the likelihood function, and hence standard Markov chain Monte Carlo procedures to perform Bayesian inference cannot be applied. Therefore, we adopt an inference procedure based on the Monte Carlo Metropolis-Hastings algorithm, which scales well when modeling more than one species. Finally, we adopt an inference method for jointly sampling the latent ACs and the population size based on birth and death processes. This approach also allows us to adaptively tune the proposal distribution of new points, which leads to better mixing, especially in the case of non-uniformly distributed traps. We apply the model to a CR data set on leopards and tigers collected at the Corbett Tiger Reserve in India. Our findings suggest that between-species repulsion is stronger than within-species repulsion, and that tiger population density is higher than leopard population density at the park.
Alex Diana, Eleni Matechou, Jim E. Griffin, Yadvendradev Jhala, Qamar Qureshi. "A vector of point processes for modeling interactions between and within species using capture-recapture data." Environmetrics 33(8). Published 2022-12-05. DOI: 10.1002/env.2781. Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2781
In recent years, immense amounts of data have been generated, from sensors to purchase transaction records, mobile GPS signals, digital satellite images, and social media. The rise of data collection has created a need for quantitatively minded professionals able to turn those data into information and decisions. In this opinion piece, we share some of our views and experiences about the general role that data science plays nowadays, with a special interest in the field of environmetrics. We include a limited number of examples that highlight the usefulness of data science in environmetrics, and a specific illustration of the behavior of the wildfires in Brazil between January and December of 2021.
Paulo Canas Rodrigues, Elisabetta Carfagna. "Data science applied to environmental sciences." Environmetrics 34(1). Published 2022-12-04. DOI: 10.1002/env.2783 (https://doi.org/10.1002/env.2783)
There has been a great deal of recent interest in the development of spatial prediction algorithms for very large datasets and/or prediction domains. These methods have primarily been developed in the spatial statistics community, but there has been growing interest in the machine learning community for such methods, primarily driven by the success of deep Gaussian process regression approaches and deep convolutional neural networks. These methods are often computationally expensive to train and implement, and consequently there has been a resurgence of interest in random projections and deep learning models based on random weights (so-called reservoir computing methods). Here, we combine several of these ideas to develop the random ensemble deep spatial (REDS) approach to predict spatial data. The procedure uses random Fourier features as inputs to an extreme learning machine (a deep neural model with random weights), and, by calibrating ensembles of outputs from this model based on different random weights, it provides a simple uncertainty quantification. The REDS method is demonstrated on simulated data and on a classic large satellite data set.
Ranadeep Daw, Christopher K. Wikle. "REDS: Random ensemble deep spatial prediction." Environmetrics 34(1). Published 2022-12-02. DOI: 10.1002/env.2780 (https://doi.org/10.1002/env.2780)
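The ingredients named in the REDS abstract above (random Fourier features, an extreme learning machine with random weights, and an ensemble over different random draws) can be sketched roughly as follows. This is not the authors' implementation: the function names `fit_member` and `ensemble_predict`, the single hidden layer (the paper describes a deep model), the ridge-regression output layer, and all default values are assumptions made for illustration, and the calibration step mentioned in the abstract is omitted.

```python
import numpy as np


def fit_member(X, y, rng, n_rff=100, n_hidden=200, lengthscale=0.2, ridge=1e-2):
    """Fit one ensemble member; only the output weights are learned.

    Hypothetical helper: a one-hidden-layer stand-in for the deep model.
    """
    d = X.shape[1]
    # Random Fourier feature parameters (approximate a Gaussian kernel).
    W = rng.normal(scale=1.0 / lengthscale, size=(d, n_rff))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_rff)
    # Random, untrained hidden-layer weights (the extreme learning machine part).
    A = rng.normal(size=(n_rff, n_hidden))
    c = rng.normal(size=n_hidden)

    def hidden(X_new):
        Z = np.sqrt(2.0 / n_rff) * np.cos(X_new @ W + b)  # Fourier features
        return np.tanh(Z @ A + c)                          # random hidden layer

    # Ridge regression for the output weights (closed form).
    H = hidden(X)
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return lambda X_new: hidden(X_new) @ beta


def ensemble_predict(X, y, X_new, n_members=20, seed=0):
    """Average predictions over members that differ only in their random weights."""
    rng = np.random.default_rng(seed)
    preds = np.stack([fit_member(X, y, rng)(X_new) for _ in range(n_members)])
    # The ensemble spread is a raw, uncalibrated indication of uncertainty.
    return preds.mean(axis=0), preds.std(axis=0)
```

Because the random weights are never trained, each member costs one linear solve, which is what makes an ensemble of many members affordable in this kind of approach.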
Heavy rainfall distributional modeling is essential in any impact study linked to the water cycle, for example, studies of flood risk. Still, statistical analyses that take into account both the temporal and the multivariate nature of extreme rainfall are rare, and often a complex de-clustering step is needed to make extreme rainfall temporally independent. A natural question is how to bypass this de-clustering in a multivariate context. To address this issue, we introduce the stable sums method. Our goal is to incorporate extreme dependencies in time and space in the analysis of heavy tails. To reach our goal, we build on large deviations of regularly varying stationary time series. Numerical experiments demonstrate that our novel approach enhances return level inference in two ways. First, it is robust to time dependencies: we implement it in the same way on independent and dependent observations. In the univariate setting, it improves the accuracy of confidence intervals compared to the main estimators requiring temporal de-clustering. Second, it integrates the spatial dependencies. In simulation, the multivariate stable sums method has a smaller mean squared error than its component-wise implementation. We apply our method to infer high return levels of daily fall precipitation amounts from a national network of weather stations in France.
Gloria Buriticá, Philippe Naveau. "Stable sums to infer high return levels of multivariate rainfall time series." Environmetrics 34(4). Published 2022-11-29. DOI: 10.1002/env.2782 (https://doi.org/10.1002/env.2782). Open access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1002/env.2782
Ujjal Kumar Mukherjee, Benjamin E. Bagozzi, Snigdhansu Chatterjee
Climate change stands to have a profound impact on human society, and on political and other conflicts in particular. However, the existing literature on understanding the relation between climate change and societal conflicts has often been criticized for using data that suffer from sampling and other biases, often resulting from being too narrowly focused on a small region of space or a small set of events. These studies have likewise been critiqued for not using suitable statistical tools that (