Stata is general‐purpose statistical software. Currently in version 11, Stata is known for its wide range of statistical routines, ease of data management, and publication‐quality graphics. Stata is available on virtually all computing platforms, including Windows, Macintosh, and most varieties of Unix/Linux. It is designed to run on both 32‐bit and 64‐bit architectures and operating systems. Stata possesses both a command‐line interface and a point‐and‐click menu interface, with a one‐to‐one correspondence between the two. Stata appeals to researchers from a wide range of fields, with concentrations in the health sciences and in economics. Statistically, Stata's strengths are in the areas of panel/longitudinal data, survival analysis, and the analysis of data from complex surveys. Users can program their own routines using a mixture of Stata's own interpretive language and the compiled matrix‐programming language Mata, which is included with all Stata installations. Stata is offered in three flavors: Stata/IC, a standard version adequate for most purposes; Stata/SE, an expanded version for use with larger (wider) datasets; and Stata/MP, a version with specialized code designed to make use of multiple cores/processors and run faster on systems that have them. WIREs Comp Stat 2010 2 728–733 DOI: 10.1002/wics.116
{"title":"Stata","authors":"R. Gutierrez","doi":"10.1002/wics.116","DOIUrl":"https://doi.org/10.1002/wics.116","url":null,"abstract":"Stata is general‐purpose statistical software. Currently in version 11, Stata is known for its wide range of statistical routines, ease of data management, and publication‐quality graphics. Stata is available on virtually all computing platforms, including Windows, Macintosh, and most varieties of Unix/Linux. It is designed to run on both 32‐bit and 64‐bit architectures and operating systems. Stata possesses both a command‐line interface and a point‐and‐click menu interface, with a one‐to‐one correspondence between the two. Stata appeals to researchers from a wide range of fields, with concentrations in the health sciences and in economics. Statistically, Stata strengths are in the areas of panel/longitudinal data, survival analysis, and the analysis of data from complex surveys. Users can program their own routines using a mixture of Stata's own interpretive language and the compiled matrix‐programming language Mata, included with all Stata installations. Stata is offered in three flavors: Stata/IC, a standard version adequate for most purposes; Stata/SE, an expanded version for use with larger (wider) datasets; and Stata/MP, a version with specialized code designed to make use of multiple cores/processors and run faster on systems that have them. 
WIREs Comp Stat 2010 2 728–733 DOI: 10.1002/wics.116","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":"2 1","pages":""},"PeriodicalIF":3.2,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.116","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"51205391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Asset or security returns are an example of phenomena whose distributions still cannot be convincingly modeled in a parametric framework. James R. (Jim) Thompson (1938–2017) used a variety of nonparametric approaches to develop workable investing solutions in such an environment. We review his groundbreaking exploration of the veracity of the capital asset pricing model (CAPM), and several nonparametric approaches to portfolio formulation including the Simugram™, variants of his Max‐Median rule, and Tukey weightings.
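The Max‐Median rule mentioned above can be sketched roughly as: rank assets by the median of their past returns and invest equally in the top performers. The following is a minimal illustrative sketch, not Thompson's exact procedure; the tickers and return figures are invented for the example.

```python
import statistics

def max_median_portfolio(returns_history, k):
    """Select the k assets with the highest median historical return
    and assign them equal portfolio weights."""
    medians = {t: statistics.median(r) for t, r in returns_history.items()}
    chosen = sorted(medians, key=medians.get, reverse=True)[:k]
    return {t: 1.0 / k for t in chosen}

# Hypothetical monthly returns for four assets.
history = {
    "A": [0.02, 0.01, -0.01, 0.03],
    "B": [0.00, 0.05, -0.04, 0.01],
    "C": [0.03, 0.02, 0.02, 0.04],
    "D": [-0.01, 0.00, 0.01, 0.00],
}
print(max_median_portfolio(history, 2))  # C and A have the highest medians
```

The median, unlike the mean, is insensitive to a few extreme return observations, which is the nonparametric appeal of the rule.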
{"title":"Sampling James R. Thompson's inspired nonparametric portfolio approaches","authors":"J. Dobelman","doi":"10.1002/wics.1542","DOIUrl":"https://doi.org/10.1002/wics.1542","url":null,"abstract":"Asset or security returns are an example of phenomena whose distributions still cannot be convincingly modeled in a parametric framework. James R. (Jim) Thompson (1938–2017) used a variety of nonparametric approaches to develop workable investing solutions in such an environment. We review his ground breaking exploration of the veracity of the capital asset pricing model (CAPM), and several nonparametric approaches to portfolio formulation including the Simugram™, variants of his Max‐Median rule, and Tukey weightings.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1542","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41726452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Item response theory (IRT) is a class of latent variable models used to develop educational and psychological tests (e.g., standardized tests, personality tests, and tests for licensure and certification). We offer readers comprehensive overviews of the theory and applications of IRT through two articles. While Part 1 of the review discusses topics such as the foundations of educational measurement, IRT models, item parameter estimation, and applications of IRT with R, this Part 2 reviews test scoring based on IRT. The primary focus is on topics in test equating that psychometricians deal with in practice for large‐scale standardized assessments, such as equating designs, IRT‐based equating methods, anchor stability check methods, and impact data analysis. These analyses are illustrated in the Example section using data from Kolen and Brennan (2014). We also cover the foundations of IRT, IRT‐based person ability parameter estimation methods, and scaling and scale scores.
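The IRT models discussed above are built on item response functions; a standard example (not taken from the abstract itself) is the two‐parameter logistic (2PL) model, sketched here as a minimal illustration.

```python
import math

def irt_2pl(theta, a, b):
    """Probability of a correct response under the two-parameter logistic
    (2PL) IRT model: P(theta) = 1 / (1 + exp(-a * (theta - b))).

    theta: examinee ability; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose ability equals the item difficulty answers correctly
# with probability 0.5, regardless of the discrimination parameter.
print(irt_2pl(theta=0.0, a=1.2, b=0.0))  # 0.5
```

Item parameter estimation, as covered in Part 1, amounts to fitting `a` and `b` (per item) to observed response patterns.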
{"title":"Item response theory and its applications in educational measurement Part II: Theory and practices of test equating in item response theory","authors":"Kazuki Hori, Hirotaka Fukuhara, Tsuyoshi Yamada","doi":"10.1002/wics.1543","DOIUrl":"https://doi.org/10.1002/wics.1543","url":null,"abstract":"Item response theory (IRT) is a class of latent variable models, which are used to develop educational and psychological tests (e.g., standardized tests, personality tests, tests for licensure and certification). We offer readers with comprehensive overviews of the theory and applications of IRT through two articles. While Part 1 of the review discusses topics such as foundations of educational measurement, IRT models, item parameter estimation, and applications of IRT with R, this Part 2 reviews areas of test scores based on IRT. The primary focus is on presenting various topics with respect to test equating such as equating designs, IRT‐based equating methods, anchor stability check methods, and impact data analysis that psychometricians would deal with for a large‐scale standardized assessment in practice. These analyses are illustrated in Example section using data from Kolen and Brennan (2014). We also cover the foundation of IRT, IRT‐based person ability parameter estimation methods, and scaling and scale score.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47834229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The prequel to this review provided an extensive treatment of classic zero‐inflated count regression models where a univariate discrete distribution is used for the count regression component of the model. The treatment of zero inflation beyond the classic univariate count regression setting has seen a substantial increase in recent years. This second review paper surveys some of this recent literature and focuses on important developments in handling zero inflation for correlated count settings, discrete time series models, spatial models, and multivariate models. We discuss some of the available computational tools for performing estimation in these settings, while again highlighting the diverse data problems that have been addressed using these methods.
{"title":"Zero‐inflated modeling part II: Zero‐inflated models for complex data structures","authors":"D. S. Young, Eric Roemmele, Xuan Shi","doi":"10.1002/wics.1540","DOIUrl":"https://doi.org/10.1002/wics.1540","url":null,"abstract":"The prequel to this review provided an extensive treatment of classic zero‐inflated count regression models where a univariate discrete distribution is used for the count regression component of the model. The treatment of zero inflation beyond the classic univariate count regression setting has seen a substantial increase in recent years. This second review paper surveys some of this recent literature and focuses on important developments in handling zero inflation for correlated count settings, discrete time series models, spatial models, and multivariate models. We discuss some of the available computational tools for performing estimation in these settings, while again highlighting the diverse data problems that have been addressed using these methods.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1540","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49314739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Count regression models maintain a steadfast presence in modern applied statistics as highlighted by their usage in diverse areas like biometry, ecology, and insurance. However, a common practical problem with observed count data is the presence of excess zeros relative to the assumed count distribution. The seminal work of Lambert (1992) was one of the first articles to thoroughly treat the problem of zero‐inflated count data in the presence of covariates. Since then, a vast literature has emerged regarding zero‐inflated count regression models. In this first of two review articles, we survey some of the classic and contemporary literature on parametric zero‐inflated count regression models, with emphasis on the utility of different univariate discrete distributions. We highlight some of the primary computational tools available for estimating and assessing the adequacy of these models. We concurrently emphasize the diverse data problems to which these models have been applied.
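The zero‐inflated Poisson (ZIP) model of the kind treated by Lambert (1992) mixes a point mass at zero with a standard count distribution. A minimal sketch of the ZIP probability mass function (illustrative parameter values only):

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson pmf: with probability pi the count is a
    structural zero; otherwise it follows a Poisson(lam) distribution."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

lam, pi = 2.0, 0.3
# The pmf sums to 1 and places extra mass at zero relative to Poisson(2).
total = sum(zip_pmf(k, lam, pi) for k in range(50))
print(round(total, 6), zip_pmf(0, lam, pi) > math.exp(-lam))
```

In the regression setting, both `lam` and `pi` are typically linked to covariates (e.g., a log link for `lam` and a logit link for `pi`).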
{"title":"Zero‐inflated modeling part I: Traditional zero‐inflated count regression models, their applications, and computational tools","authors":"D. S. Young, Eric Roemmele, Peng Yeh","doi":"10.1002/wics.1541","DOIUrl":"https://doi.org/10.1002/wics.1541","url":null,"abstract":"Count regression models maintain a steadfast presence in modern applied statistics as highlighted by their usage in diverse areas like biometry, ecology, and insurance. However, a common practical problem with observed count data is the presence of excess zeros relative to the assumed count distribution. The seminal work of Lambert (1992) was one of the first articles to thoroughly treat the problem of zero‐inflated count data in the presence of covariates. Since then, a vast literature has emerged regarding zero‐inflated count regression models. In this first of two review articles, we survey some of the classic and contemporary literature on parametric zero‐inflated count regression models, with emphasis on the utility of different univariate discrete distributions. We highlight some of the primary computational tools available for estimating and assessing the adequacy of these models. We concurrently emphasize the diverse data problems to which these models have been applied.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":"14 1","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1541","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41347907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differential equations have proven to be a powerful mathematical tool in science and engineering, leading to better understanding, prediction, and control of dynamic processes. In this paper, we review the role played by differential equations in data analysis. More specifically, we consider the intersection between differential equations and data analysis in the light of modern statistical learning methodologies.
{"title":"Differential equations in data analysis","authors":"I. Dattner","doi":"10.1002/wics.1534","DOIUrl":"https://doi.org/10.1002/wics.1534","url":null,"abstract":"Differential equations have proven to be a powerful mathematical tool in science and engineering, leading to better understanding, prediction, and control of dynamic processes. In this paper, we review the role played by differential equations in data analysis. More specifically, we consider the intersection between differential equations and data analysis in the light of modern statistical learning methodologies.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1534","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47930443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The aim of this paper is to provide a selected advanced review of semiparametric regression, an emerging and promising field of research in functional data analysis. As a deliberate strategy, we focus our discussion on the single functional index regression (SFIR) model in order to fix ideas about the stakes linked with infinite‐dimensional problems and about the methodological challenges that one has to solve when building statistical procedures, one of the most challenging issues being the reduction of dimensionality effects. This is the first (and main) part of the discussion, and a complete survey of the literature on the SFIR model is presented. In a second part, other semiparametric models (and, more generally, other dimension reduction models) are briefly discussed, with the double goal of presenting the state of the art and of identifying challenging tracks for the future. Finally, we discuss how additive modeling is an appealing idea for more complicated models involving multifunctional predictors, and we point out some tracks for future work in this setting.
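The SFIR model reduces a functional predictor X(t) to a single scalar index ⟨X, θ⟩ and then regresses the response on that index nonparametrically. A minimal sketch under simplifying assumptions (a known index function, a Gaussian kernel, and noise‐free toy curves; none of this is the article's own estimator):

```python
import math

def single_index(x_curve, theta, dt):
    """Scalar index <X, theta>: Riemann-sum approximation of the inner
    product between the functional predictor and the index function."""
    return sum(x * t for x, t in zip(x_curve, theta)) * dt

def nw_estimate(u0, indices, ys, h):
    """Nadaraya-Watson kernel regression of Y on the scalar index,
    with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((u0 - u) / h) ** 2) for u in indices]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

# Toy functional data: curves a*sin(t) on [0, pi), index function sin(t),
# responses generated through an (assumed known) square link.
dt = math.pi / 100
grid = [j * dt for j in range(100)]
theta = [math.sin(t) for t in grid]
curves = [[a * math.sin(t) for t in grid] for a in (0.5, 1.0, 1.5, 2.0)]
indices = [single_index(c, theta, dt) for c in curves]
ys = [u ** 2 for u in indices]
print(nw_estimate(indices[1], indices, ys, h=0.1))  # close to ys[1]
```

The dimension reduction the abstract emphasizes is visible here: each 100‐point curve enters the regression only through one scalar. In practice θ itself must be estimated, which is the hard part the review surveys.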
{"title":"On semiparametric regression in functional data analysis","authors":"N. Ling, P. Vieu","doi":"10.1002/wics.1538","DOIUrl":"https://doi.org/10.1002/wics.1538","url":null,"abstract":"The aim of this paper is to provide a selected advanced review on semiparametric regression which is an emergent promising field of researches in functional data analysis. As a deliberate strategy, we decided to focus our discussion on the single functional index regression (SFIR) model in order to fix the ideas about the stakes linked with infinite dimensional problems and about the methodological challenges that one has to solve when building statistical procedure: one of the most challenging issue being the question of dimensionality effects reduction. This will be the first (and the main) part of this discussion and a complete survey of the literature on SFIR model will be presented. In a second attempt, other semiparametric models (and more generally, other dimension reduction models) will be shortly discussed with the double goal of presenting the state of art and of defining challenging tracks for the future. At the end, we will discuss how additive modeling is an appealing idea for more complicated models involving multifunctional predictors and some tracks for the future will be pointed in this setting.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1538","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48091620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uncertainty quantification (UQ) includes the characterization, integration, and propagation of uncertainties that result from stochastic variations and from a lack of knowledge or data about the natural world. The Monte Carlo (MC) method is a sampling‐based approach that has been widely used for the quantification and propagation of uncertainties. However, the standard MC method is often time‐consuming when the simulation‐based model is computationally intensive. This article gives an overview of modern MC methods that address the existing challenges of standard MC in the context of UQ. Specifically, multilevel Monte Carlo (MLMC), which extends the concept of control variates, achieves a significant reduction in computational cost by performing most evaluations at low accuracy and correspondingly low cost, and relatively few evaluations at high accuracy and correspondingly high cost. Multifidelity Monte Carlo (MFMC) accelerates the convergence of standard Monte Carlo by generalizing control variates to models of varying fidelity and computational cost. The multimodel Monte Carlo method (MMMC), set in a different context from MLMC and MFMC, addresses UQ and propagation when the data available for characterizing probability distributions are limited. Multimodel inference combined with importance sampling is proposed for quantifying and efficiently propagating the uncertainties that result from small data sets. All three modern MC methods achieve a significant improvement in computational efficiency for probabilistic UQ, particularly uncertainty propagation. An algorithm summary and a corresponding code implementation are provided for each of the modern MC methods. The extension and application of these methods are discussed in detail.
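The MLMC idea described above rests on the telescoping identity E[fine] = E[coarse] + E[fine − coarse]: the cheap coarse model absorbs most of the sampling effort, and only the small correction term needs (few) expensive evaluations. A two‐level toy sketch (the "models" here are trivial stand‐ins, not from the article):

```python
import random

def mlmc_two_level(coarse, fine, n_coarse, n_fine, rng):
    """Two-level Monte Carlo estimate of E[fine(U)], U ~ Uniform(0,1):
    many cheap coarse samples plus a small correction E[fine - coarse]."""
    level0 = sum(coarse(rng.random()) for _ in range(n_coarse)) / n_coarse
    correction = 0.0
    for _ in range(n_fine):
        u = rng.random()  # the same sample drives both levels (coupling)
        correction += fine(u) - coarse(u)
    return level0 + correction / n_fine

rng = random.Random(0)
coarse = lambda u: u               # crude model
fine = lambda u: u + 0.1 * u ** 2  # refined model, assumed more expensive
est = mlmc_two_level(coarse, fine, n_coarse=100_000, n_fine=1_000, rng=rng)
print(round(est, 3))  # close to E[U] + 0.1*E[U^2] = 0.5 + 1/30
```

The variance of `fine - coarse` is far smaller than that of `fine` alone, which is why only 1,000 coupled samples suffice for the correction term.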
{"title":"Modern Monte Carlo methods for efficient uncertainty quantification and propagation: A survey","authors":"Jiaxin Zhang","doi":"10.1002/wics.1539","DOIUrl":"https://doi.org/10.1002/wics.1539","url":null,"abstract":"Uncertainty quantification (UQ) includes the characterization, integration, and propagation of uncertainties that result from stochastic variations and a lack of knowledge or data in the natural world. Monte Carlo (MC) method is a sampling‐based approach that has widely used for quantification and propagation of uncertainties. However, the standard MC method is often time‐consuming if the simulation‐based model is computationally intensive. This article gives an overview of modern MC methods to address the existing challenges of the standard MC in the context of UQ. Specifically, multilevel Monte Carlo (MLMC) extending the concept of control variates achieves a significant reduction of the computational cost by performing most evaluations with low accuracy and corresponding low cost, and relatively few evaluations at high accuracy and corresponding high cost. Multifidelity Monte Carlo (MFMC) accelerates the convergence of standard Monte Carlo by generalizing the control variates with different models having varying fidelities and varying computational costs. Multimodel Monte Carlo method (MMMC), having a different setting of MLMC and MFMC, aims to address the issue of UQ and propagation when data for characterizing probability distributions are limited. Multimodel inference combined with importance sampling is proposed for quantifying and efficiently propagating the uncertainties resulting from small data sets. All of these three modern MC methods achieve a significant improvement of computational efficiency for probabilistic UQ, particularly uncertainty propagation. An algorithm summary and the corresponding code implementation are provided for each of the modern MC methods. 
The extension and application of these methods are discussed in detail.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":3.2,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1002/wics.1539","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48021151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}