Estimation of an unstructured covariance matrix is difficult because of the challenges posed by parameter space dimensionality and the positive‐definiteness constraint that estimates should satisfy. We consider a general nonparametric covariance estimation framework for longitudinal data using the Cholesky decomposition of a positive‐definite matrix. The covariance matrix of time‐ordered measurements is diagonalized by a lower triangular matrix with unconstrained entries that are statistically interpretable as parameters for a varying coefficient autoregressive model. Using this dual interpretation of the Cholesky decomposition and allowing for irregular sampling time points, we treat covariance estimation as bivariate smoothing and cast it in a regularization framework for desired forms of simplicity in covariance models. Viewing stationarity as a form of simplicity or parsimony in covariance, we model the varying coefficient function with components depending on time lag and its orthogonal direction separately and penalize the components that capture the nonstationarity in the fitted function. We demonstrate construction of a covariance estimator using the smoothing spline framework. Simulation studies establish the advantage of our approach over alternative estimators proposed in the longitudinal data setting. We analyze a longitudinal dataset to illustrate application of the methodology and compare our estimates to those resulting from alternative models.
Blake, T. A., & Lee, Y. (2020). Nonparametric covariance estimation with shrinkage toward stationary models. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1507
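The dual interpretation of the Cholesky decomposition described in the abstract can be made concrete with a small numerical sketch. This is only an illustration of the decomposition itself, not the authors' smoothing-spline estimator; the AR(1) covariance and all variable names are hypothetical.

```python
import numpy as np

# Toy covariance of p time-ordered measurements with AR(1) structure
# (hypothetical example; the paper handles general, irregularly sampled data).
p = 5
times = np.arange(p)
Sigma = 0.8 ** np.abs(times[:, None] - times[None, :])

# Modified Cholesky decomposition: Sigma = L D L^T, L unit lower triangular.
C = np.linalg.cholesky(Sigma)        # ordinary Cholesky factor
d = np.diag(C) ** 2                  # innovation (prediction error) variances
L = C / np.diag(C)                   # unit lower-triangular factor

# T = L^{-1} diagonalizes Sigma: T Sigma T^T = D. Its negated below-diagonal
# entries are the autoregressive coefficients phi[t, s] in the model
# y_t = sum_{s < t} phi[t, s] * y_s + eps_t, with Var(eps_t) = d[t].
T = np.linalg.inv(L)
phi = -np.tril(T, k=-1)

D = T @ Sigma @ T.T                  # diagonal matrix with entries d
```

For this Markov example the only nonzero coefficients are phi[t, t-1] = 0.8, which illustrates the sense in which the unconstrained entries of the triangular factor are statistically interpretable.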
In this article, we present a review of existing methods and trends in survival analysis and frailty models. For each topic, the relevant background is presented so that the discussion flows from the original methods to more advanced ones. The article also surveys the various current methodologies for survival and frailty models, and the advantages and disadvantages of the more recent methodologies are presented and discussed.
Govindarajulu, U., & D'Agostino, R. (2020). Review of current advances in survival analysis and frailty models. Wiley Interdisciplinary Reviews: Computational Statistics, 12(1). https://doi.org/10.1002/wics.1504
We review the literature on spatial and spatiotemporal models based on spatial multiscale factorizations. Specifically, we review models based on wavelets and Kolaczyk–Huang factorizations for Gaussian and Poisson data. These multiscale models decompose spatial and spatiotemporal datasets into many small components, called multiscale coefficients, at multiple levels of spatial resolution. Then analysis proceeds independently for each multiscale coefficient. After that, aggregation equations are used to coherently combine the analyses from the multiple multiscale coefficients to obtain a statistical analysis at the original resolution level. The computational cost of such analysis grows linearly with sample size. Furthermore, computations for these models are scalable, parallelizable, and fast. Therefore, these multiscale models are tremendously useful for the analysis of massive spatial and spatiotemporal datasets.
Ferreira, M. A. R. (2020). Bayesian spatial and spatiotemporal models based on multiscale factorizations. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1509
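The decompose-analyze-aggregate recipe behind these models can be sketched in one dimension. Below is a minimal toy of a Kolaczyk–Huang-style factorization for counts, assuming a 1-D grid whose length is a power of two; the reviewed models operate on spatial and spatiotemporal grids and place priors on the split proportions, which this sketch omits.

```python
import numpy as np

def multiscale_decompose(counts):
    """Pair up neighboring cells level by level; store each parent total
    together with its left child's count (the multiscale coefficient is the
    left child's share of the parent total). Linear cost in sample size."""
    levels = []
    x = np.asarray(counts, dtype=np.int64)
    while x.size > 1:
        parents = x[0::2] + x[1::2]
        levels.append((parents, x[0::2]))
        x = parents
    return int(x[0]), levels            # grand total plus per-level splits

def multiscale_reconstruct(total, levels):
    """Aggregation equations: rebuild the fine-resolution counts."""
    x = np.array([total], dtype=np.int64)
    for parents, left in reversed(levels):
        children = np.empty(2 * parents.size, dtype=np.int64)
        children[0::2] = left
        children[1::2] = parents - left
        x = children
    return x

counts = np.array([3, 1, 4, 1, 5, 9, 2, 6])
total, levels = multiscale_decompose(counts)
recon = multiscale_reconstruct(total, levels)
```

Because each level's coefficients can be analyzed independently before the aggregation step, the per-coefficient analyses parallelize naturally, which is the source of the scalability noted above.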
Statistical models for animal movement provide tools that help ecologists and biologists learn how animals interact with their environment and each other. Efforts to develop increasingly realistic, implementable, and scientifically valuable methods for analyzing remotely observed trajectories have provided practitioners with a wide selection of models to help them understand animal behavior. Increasingly, researchers are interested in studying multiple animals jointly, which requires methods that can account for dependence across individuals. Dependence can arise for many reasons, including shared behavioral tendencies, familial relationships, and direct interactions on the landscape. We provide a synopsis of recent statistical methods for animal movement data applicable to settings in which inference is desired across multiple individuals. Highlights of these approaches include the ability to infer shared behavioral traits across a group of individuals and the ability to infer unobserved social networks summarizing dynamic relationships that manifest themselves in movement decisions.
Scharf, H., & Buderman, F. (2020). Animal movement models for multiple individuals. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1506
Forecasting, especially high-dimensional forecasting, is increasingly sought after, particularly as computing resources grow in both size and speed. Flow field forecasting is a general-purpose, regression-based forecasting method that has recently been extended to high-dimensional settings. In this article, we provide an overview of the flow field forecasting methodology, with particular emphasis on settings where the number of candidate predictor variables is large, potentially larger than the number of observations.
Caudle, K. A., Fleming, P. S., & Hoover, R. (2020). A review of flow field forecasting: A high-dimensional forecasting procedure. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1505
During the last 20–30 years, there has been remarkable growth in interest in approaches for stationary count time series. We consider popular classes of models for such time series, including thinning-based models, conditional regression models, and hidden Markov models. We review and compare important members of these model families with regard to stochastic properties such as the dispersion and autocorrelation structure. Our survey covers univariate and multivariate count data, as well as unbounded and bounded counts. We also discuss an illustrative data example. Besides this critical presentation of the current state of the art, we identify some existing challenges and opportunities for future research.
Weiß, C. (2020). Stationary count time series models. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1502
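As a concrete member of the thinning-based class surveyed above, the Poisson INAR(1) model X_t = alpha o X_{t-1} + eps_t, with "o" denoting binomial thinning, can be simulated in a few lines. The function name and parameter values here are illustrative.

```python
import numpy as np

def simulate_inar1(n, alpha, lam, rng):
    """Simulate a Poisson INAR(1) series: X_t = alpha o X_{t-1} + eps_t,
    where 'o' is binomial thinning (each of the X_{t-1} individuals survives
    independently with probability alpha) and eps_t ~ Poisson(lam)."""
    x = np.empty(n, dtype=np.int64)
    x[0] = rng.poisson(lam / (1.0 - alpha))          # start at stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)    # thinning alpha o X_{t-1}
        x[t] = survivors + rng.poisson(lam)          # innovation
    return x

rng = np.random.default_rng(42)
x = simulate_inar1(200_000, alpha=0.5, lam=2.0, rng=rng)
# The stationary distribution is Poisson(lam / (1 - alpha)), so the sample
# mean should be near 4 and the lag-1 autocorrelation near alpha = 0.5.
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```

The stationary distribution here is Poisson, hence equidispersed; the survey compares model families precisely through such dispersion and autocorrelation properties.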
Ideally, statistical parametric model fitting is followed by summary tables that show predictor contributions, visualizations that assess model assumptions and goodness of fit, and test statistics that compare models. In contrast, modern machine-learning fits are usually black-box in nature: they offer high-performing predictions but suffer from an interpretability deficit. We examine how the paradigm of conditional visualization can be used to address this, specifically to explain predictor contributions, assess goodness of fit, and compare multiple competing fits. We compare visualizations from techniques including trellis, condvis, visreg, lime, partial dependence, and ICE plots. Our examples use random forest fits, but all techniques presented are model-agnostic.
Hurley, C. (2020). Model exploration using conditional visualization. Wiley Interdisciplinary Reviews: Computational Statistics. https://doi.org/10.1002/wics.1503
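Partial dependence and ICE curves, two of the techniques compared above, are simple enough to sketch model-agnostically. The predict function below is a hypothetical stand-in for any fitted black box such as a random forest; the actual packages (condvis, visreg, and so on) add plotting and conditioning machinery not shown here.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Model-agnostic partial dependence of feature j: substitute each grid
    value into every observation and average the predictions. The per-row
    curves before averaging are the ICE (individual conditional expectation)
    curves."""
    curves = np.empty((len(grid), X.shape[0]))
    for i, v in enumerate(grid):
        Xv = X.copy()
        Xv[:, j] = v                 # fix feature j at the grid value
        curves[i] = predict(Xv)
    return curves.mean(axis=1), curves   # PD curve and ICE curves

# Hypothetical black-box fit standing in for, e.g., a random forest.
def predict(A):
    return 2.0 * A[:, 0] + np.sin(A[:, 1])

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
grid = np.linspace(-1.0, 1.0, 5)
pd0, ice0 = partial_dependence(predict, X, 0, grid)
# predict is additive in feature 0, so pd0 is linear in the grid with slope 2.
```

Divergence between the averaged PD curve and the individual ICE curves is what reveals interactions, which is one way conditional visualization exposes predictor contributions in a black-box fit.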