Discussion of the paper ‘A review of distributed statistical inference’

Heng Lian
Statistical Theory and Related Fields, vol. 6, no. 1, pp. 100-101. Pub Date: 2021-12-28. DOI: 10.1080/24754269.2021.2017544

Abstract

The authors should be congratulated on their timely contribution to this emerging field with a comprehensive review, which will certainly attract more researchers into this area. In the simplest one-shot approach, the entire dataset is distributed across multiple machines, each machine computes a local estimate based on its local data only, and a central machine aggregates the local estimates as a final processing step. In more complicated settings, multiple rounds of communication are carried out, typically also passing first-order information (gradients) and/or second-order information (Hessian matrices) between the local machines and the central machine. The review clearly separates the existing work in this area into several sections, covering parametric regression, nonparametric regression, and other models including principal component analysis and variable screening. In this discussion, I will consider some possible future directions for this area, based on my own experience.

The first problem is to combine divide-and-conquer estimation with efficient local algorithms not used in traditional statistical analysis. The motivation is that, because of the stringent constraint on the number of machines that can be used either in practice or in theory (for example, with a one-shot approach the number of machines that can be used is O(√N)), the sample size on each worker machine can still be large. In other words, even after partitioning, the local sample size may still be too large to be processed by traditional algorithms. In such a case, a more efficient algorithm (possibly one that only approximates the exact solution) should be used on each local machine. The important question is whether the optimal statistical properties can be retained with such an algorithm. One such attempt, with an affirmative answer, was recently reported in Lian et al. (2021). In that work, we use random sketches (random projections) for kernel regression in an RKHS framework for nonparametric regression. Random sketches reduce the computational complexity on each worker machine while still retaining the optimal statistical convergence rate. We expect combinations along this direction to be useful in various settings, with different settings calling for different efficient algorithms to compute an approximate solution.

The second problem is to extend the studies beyond the worker-server model. Most existing methods in the statistics literature focus on a centralized system, in which a single special machine communicates with all the others and coordinates computation and communication. However, in many modern applications such systems are rare and unreliable, since the failure of the central machine would be disastrous. Statistical inference in a decentralized system, synchronous or asynchronous, where there is no such specialized central machine, would be an interesting direction of research for statisticians. Currently, decentralized systems are investigated from a purely optimization point of view, without incorporating statistical properties (Ram et al., 2010; Yuan et al., 2016).

Finally, on the theoretical side, the distributed statistical inference problem provides opportunities and challenges for investigating the fundamental limits (i.e., lower bounds) on achievable performance, taking into account communication, computational, and statistical trade-offs. For example, in various models, if a one-shot approach is used, then there is a limit on the number of machines allowed in the system, and using more machines leads to a suboptimal statistical convergence rate. On the other hand, when multiple rounds of communication are allowed, the constraint on the number of machines can be relaxed or even removed. This represents a trade-off between communication and statistical accuracy. As another example, the computational and statistical trade-off has already been explored in many works (Khetan & Oh, 2018; L. Wang et al., 2019; T. Wang et al., 2016). The question is how this would change when communication comes into play. A general framework taking into account computational, statistical, and communication costs is called for, and would significantly advance the understanding of distributed estimation and inference.
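
The one-shot approach described at the start of the abstract can be illustrated with a minimal sketch. It assumes a simple linear regression model with simulated Gaussian data, which is chosen here only for illustration and is not taken from the review or from any of the cited works. The function names `local_ols` and `one_shot_average`, the sample size, and the choice of exactly √N machines are likewise hypothetical.

```python
import numpy as np

def local_ols(X, y):
    """Least-squares estimate computed from one machine's data only."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def one_shot_average(X, y, num_machines):
    """Split the rows across machines, fit locally, then average once."""
    X_parts = np.array_split(X, num_machines)
    y_parts = np.array_split(y, num_machines)
    # Each "machine" sees only its own block of rows.
    local_estimates = [local_ols(Xk, yk) for Xk, yk in zip(X_parts, y_parts)]
    # The "central machine" performs a single aggregation step.
    return np.mean(local_estimates, axis=0)

# Simulated data (an assumption for illustration): y = X @ beta + noise.
rng = np.random.default_rng(0)
N, d = 10_000, 5
beta_true = np.arange(1.0, d + 1.0)
X = rng.standard_normal((N, d))
y = X @ beta_true + rng.standard_normal(N)

m = int(np.sqrt(N))                    # number of machines on the order of sqrt(N)
beta_dc = one_shot_average(X, y, m)    # divide-and-conquer estimate
beta_full = local_ols(X, y)            # estimate using all data on one machine

print("max |beta_dc - beta_true|  :", np.max(np.abs(beta_dc - beta_true)))
print("max |beta_full - beta_true|:", np.max(np.abs(beta_full - beta_true)))
```

With N = 10,000 and m = 100, each machine still holds 100 observations. Scaling N up while keeping m on the order of √N makes the local sample size grow as well, which is the situation that motivates faster approximate local algorithms.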