关于“分布式统计推断综述”的讨论

IF 0.7 Q3 STATISTICS & PROBABILITY Statistical Theory and Related Fields Pub Date : 2022-02-04 DOI:10.1080/24754269.2022.2030107
Yang Yu, Guang Cheng
{"title":"关于“分布式统计推断综述”的讨论","authors":"Yang Yu, Guang Cheng","doi":"10.1080/24754269.2022.2030107","DOIUrl":null,"url":null,"abstract":"We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges. First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as bootstrap, have also been studied in the massive data setting including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which is to repeatedly resample and refit the model at each local machine and finally aggregate the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy. In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference onmultiple parameters. In the centralized setting, bootstrap is one of the solutions to the simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"102 - 103"},"PeriodicalIF":0.7000,"publicationDate":"2022-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Discussion on ‘A review of distributed statistical inference’\",\"authors\":\"Yang Yu, Guang Cheng\",\"doi\":\"10.1080/24754269.2022.2030107\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges. First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as bootstrap, have also been studied in the massive data setting including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which is to repeatedly resample and refit the model at each local machine and finally aggregate the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy. In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference onmultiple parameters. In the centralized setting, bootstrap is one of the solutions to the simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and\",\"PeriodicalId\":22070,\"journal\":{\"name\":\"Statistical Theory and Related Fields\",\"volume\":\"6 1\",\"pages\":\"102 - 103\"},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2022-02-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Statistical Theory and Related Fields\",\"FirstCategoryId\":\"96\",\"ListUrlMain\":\"https://doi.org/10.1080/24754269.2022.2030107\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Theory and Related Fields","FirstCategoryId":"96","ListUrlMain":"https://doi.org/10.1080/24754269.2022.2030107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
引用次数: 0

摘要

我们祝贺作者们令人印象深刻的团队努力,全面回顾了分布式框架中的各种统计估计和推理方法。对于任何想要理解分布式推理在大数据时代为何如此重要、进行分布式推理而不是集中式推理的挑战是什么、以及统计学家如何提出克服这些挑战的解决方案的人来说,这篇论文都是一个很好的资源。首先,我们注意到本文主要关注分布式估计,并且我们想指出在分布式推理方面的其他一些工作。对于平滑损失函数,Jordan等人(2018)为他们的多轮分布估计器建立了渐近正态性,这产生了两种使用夹心协方差矩阵构建置信区域的通信高效方法。对于非光滑损失函数,Chen等人(2021)同样提出了基于其分布估计量的渐近正态性的三明治型置信区间。更通用的推理方法,如bootstrap,也在包括分布式框架在内的海量数据环境中得到了研究。本文回顾了Kleiner et al.(2014)提出的Bag of Little bootstrap (BLB)方法,即在每台本地机器上反复重新采样和重构模型,最后汇总bootstrap统计数据。考虑到BLB的巨大计算成本,Sengupta等(2016)提出了subsampling Double Bootstrap (SDB)方法,该方法具有更高的计算效率,但需要大量的局部机来保持统计精度。除了分布式样本,在大数据时代,维数也会变得很大,在这种情况下,研究人员可能会对多参数的同时推理更感兴趣。在集中式设置中,bootstrap是同时推理问题的解决方案之一(Zhang & Cheng, 2017)。在维数增长的分布式框架中,Yu等人(2020)提出了用于同时推理的分布式自举方法,该方法不仅在通信和数据处理方面都是高效的
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Discussion on ‘A review of distributed statistical inference’
We congratulate the authors on an impressive team effort to comprehensively review various statistical estimation and inference methods in distributed frameworks. This paper is an excellent resource for anyone wishing to understand why distributed inference is important in the era of big data, what the challenges of conducting distributed inference instead of centralized inference are, and how statisticians propose solutions to overcome these challenges. First, we notice that this paper focuses mainly on distributed estimation, and we would like to point out several other works on distributed inference. For smooth loss functions, Jordan et al. (2018) established asymptotic normality for their multi-round distributed estimator, which yields two communication-efficient approaches to constructing confidence regions using a sandwiched covariance matrix. For non-smooth loss functions, Chen et al. (2021) similarly proposed a sandwich-type confidence interval based on the asymptotic normality of their distributed estimator. More generic inference approaches, such as bootstrap, have also been studied in the massive data setting including the distributed framework. The authors reviewed the Bag of Little Bootstraps (BLB) method proposed by Kleiner et al. (2014), which is to repeatedly resample and refit the model at each local machine and finally aggregate the bootstrap statistics. Considering the huge computational cost of BLB, Sengupta et al. (2016) proposed the Subsampled Double Bootstrap (SDB) method, which has higher computational efficiency but requires a large number of local machines to maintain statistical accuracy. In addition to distributed samples, the dimensionality can also become large in the big data era, and in this case researchers may be more interested in simultaneous inference onmultiple parameters. In the centralized setting, bootstrap is one of the solutions to the simultaneous inference problems (Zhang & Cheng, 2017). In a distributed framework where the dimensionality grows, Yu et al. (2020) proposed distributed bootstrap methods for simultaneous inference, which not only are efficient in terms of both communication and
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
0.90
自引率
20.00%
发文量
21
期刊最新文献
Multiply robust estimation for average treatment effect among treated Communication-efficient distributed statistical inference on zero-inflated Poisson models FragmGAN: generative adversarial nets for fragmentary data imputation and prediction Log-rank and stratified log-rank tests Autoregressive moving average model for matrix time series
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1