Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R

IF 8.1 | Zone 2, Computer Science | JCR Q1: COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS | Journal of Statistical Software | Pub Date: 2020-10-07 | DOI: 10.18637/JSS.V095.I01

A. Zeileis, Susanne Köll, N. Graham


Citations: 290


Various Versatile Variances: An Object-Oriented Implementation of Clustered Covariances in R

Clustered covariances or clustered standard errors are very widely used to account for correlated or clustered data, especially in economics, political sciences, or other social sciences. They are employed to adjust the inference following estimation of a standard least-squares regression or generalized linear model estimated by maximum likelihood. Although many publications just refer to "the" clustered standard errors, there is a surprisingly wide variety of clustered covariances particularly due to different flavors of bias corrections. Furthermore, while the linear regression model is certainly the most important application case, the same strategies can be employed in more general models (e.g. for zero-inflated, censored, or limited responses). In R, functions for covariances in clustered or panel models have been somewhat scattered or available only for certain modeling functions, notably the (generalized) linear regression model. In contrast, an object-oriented approach to "robust" covariance matrix estimation - applicable beyond lm() and glm() - is available in the sandwich package but has been limited to the case of cross-section or time series data. Now, this shortcoming has been corrected in sandwich (starting from version 2.4.0): Based on methods for two generic functions (estfun() and bread()), clustered and panel covariances are now provided in vcovCL(), vcovPL(), and vcovPC(). These are directly applicable to models from many packages, e.g., including MASS, pscl, countreg, betareg, among others. Some empirical illustrations are provided as well as an assessment of the methods' performance in a simulation study.
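The abstract's point that clustered covariances rest on just two ingredients, the per-observation estimating functions (what estfun() returns) and the "bread", can be illustrated by hand for a linear model. The following is a minimal base-R sketch under stated assumptions, not the package's implementation: the data are simulated, and the names `d`, `g`, `scores`, `bread_inv`, and `V_cl` are hypothetical. It computes the unadjusted clustered sandwich, with no small-sample or cluster bias correction, which is what `type = "HC0"` together with `cadjust = FALSE` should correspond to in `vcovCL()`. The final comparison runs only if sandwich is installed.

```r
# Simulated clustered data (an assumption for illustration only):
# G clusters of m observations, with a shared cluster effect in both x and the error.
set.seed(1)
G <- 50
m <- 10
n <- G * m
g <- rep(seq_len(G), each = m)
x <- rnorm(n) + rnorm(G)[g]
y <- 1 + 0.5 * x + rnorm(G)[g] + rnorm(n)
d <- data.frame(y = y, x = x, g = g)

fit <- lm(y ~ x, data = d)

# Estimating functions (scores) of the linear model: regressors times residuals.
X <- model.matrix(fit)
scores <- X * residuals(fit)          # n x k, row i is x_i * e_i

# "Bread": inverse of X'X (scaling by n cancels in the final product).
bread_inv <- solve(crossprod(X))

# "Meat": sum the scores within each cluster, then take the cross product.
cl_scores <- rowsum(scores, d$g)      # G x k
meat <- crossprod(cl_scores)

# Unadjusted clustered covariance and standard errors.
V_cl <- bread_inv %*% meat %*% bread_inv
se_cl <- sqrt(diag(V_cl))
se_ols <- sqrt(diag(vcov(fit)))
print(cbind(classical = se_ols, clustered = se_cl))

# Optional cross-check against sandwich, if it is available.
if (requireNamespace("sandwich", quietly = TRUE)) {
  V_pkg <- sandwich::vcovCL(fit, cluster = d$g, type = "HC0", cadjust = FALSE)
  print(all.equal(V_cl, V_pkg, check.attributes = FALSE))
}
```

In practice one would call `vcovCL()` directly, because the same call carries over to any model class that supplies `estfun()` and `bread()` methods, whereas the hand computation above is specific to `lm()`.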


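Source journal

Journal of Statistical Software (Engineering & Technology; Computer Science: Interdisciplinary Applications)

CiteScore: 10.70

Self-citation rate: 1.70%

Articles published: (not listed)

Review time: 6-12 weeks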

About the journal: The Journal of Statistical Software (JSS) publishes open-source software and corresponding reproducible articles discussing all aspects of the design, implementation, documentation, application, evaluation, comparison, maintenance, and distribution of software dedicated to improving the state of the art in statistical computing in all areas of empirical research. Open-source code and articles are jointly reviewed and published in this journal and should be accessible to a broad community of practitioners, teachers, and researchers in the field of statistics.

Latest articles from this journal

BoXHED2.0: Scalable Boosting of Dynamic Survival Analysis.
Optimum Allocation for Adaptive Multi-Wave Sampling in R: The R Package optimall.
spsurvey: Spatial Sampling Design and Analysis in R.
Application of Equal Local Levels to Improve Q-Q Plot Testing Bands with R Package qqconf.
Elastic Net Regularization Paths for All Generalized Linear Models.