Distributed Matrix Computations With Low-Weight Encodings

Anindya Bijoy Das;Aditya Ramamoorthy;David J. Love;Christopher G. Brinton
DOI: 10.1109/JSAIT.2023.3308768
Journal: IEEE Journal on Selected Areas in Information Theory, vol. 4, pp. 363-378
Published: 2023-08-30
Source: https://ieeexplore.ieee.org/document/10234626/
Citations: 3

Abstract

Straggler nodes are well-known bottlenecks in distributed matrix computations: they reduce computation and communication speeds. A common mitigation strategy is to incorporate Reed-Solomon-based MDS (maximum distance separable) codes into the framework, which achieves resilience against an optimal number of stragglers. However, these codes assign dense linear combinations of submatrices to the worker nodes. When the input matrices are sparse, such approaches increase the number of non-zero entries in the encoded matrices, which in turn increases worker computation time. In this work, we develop a distributed matrix computation approach in which the assigned encoded submatrices are random linear combinations of a small number of submatrices. In addition to being well suited to sparse input matrices, our approach retains optimal straggler resilience in a certain range of problem parameters. Moreover, compared to recent sparse matrix computation approaches, the search for a "good" set of random coefficients that promotes numerical stability is much more computationally efficient in our method. We show that our approach can efficiently utilize partial computations done by slower worker nodes in a heterogeneous system, which can enhance the overall computation speed. Numerical experiments conducted on Amazon Web Services (AWS) demonstrate up to a 30% reduction in per-worker-node computation time and $100\times$ faster encoding compared to the available methods.
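The core idea in the abstract, encoded submatrices that combine only a few uncoded submatrices, can be illustrated with a minimal NumPy sketch for computing A^T x. This is not the authors' construction: the cyclic choice of supports, the function names `encode_low_weight` and `decode`, and the parameters `k`, `n_workers`, and `weight` are all assumptions made for illustration, and decodability is checked numerically rather than guaranteed.

```python
import numpy as np


def encode_low_weight(blocks, n_workers, weight, rng):
    """Build one encoded block per worker from only `weight` of the k blocks.

    Returns (G, encoded), where G is the n_workers x k coefficient matrix.
    The cyclic support pattern is an illustrative assumption (weight <= k).
    """
    k = len(blocks)
    G = np.zeros((n_workers, k))
    for i in range(n_workers):
        support = [(i + j) % k for j in range(weight)]
        G[i, support] = rng.standard_normal(weight)  # random coefficients
    encoded = [
        sum(G[i, j] * blocks[j] for j in np.flatnonzero(G[i]))
        for i in range(n_workers)
    ]
    return G, encoded


def decode(G, results, finished):
    """Recover the k uncoded products from the workers listed in `finished`."""
    G_s = G[finished]
    if np.linalg.matrix_rank(G_s) < G.shape[1]:
        raise ValueError("finished workers do not determine the result")
    Y = np.stack([results[i] for i in finished])
    Z, *_ = np.linalg.lstsq(G_s, Y, rcond=None)  # solve G_s @ Z = Y
    return Z


rng = np.random.default_rng(0)
k, n_workers, weight = 4, 6, 3   # hypothetical problem parameters
t, r = 200, 40
# Sparse input: roughly 5% of the entries are non-zero.
A = rng.standard_normal((t, r)) * (rng.random((t, r)) < 0.05)
x = rng.standard_normal(t)

blocks = np.hsplit(A, k)         # k block-columns of A
G, encoded = encode_low_weight(blocks, n_workers, weight, rng)

# Each worker multiplies its encoded block by x; workers 1 and 4 straggle.
results = {i: encoded[i].T @ x for i in range(n_workers)}
finished = [0, 2, 3, 5]
recovered = decode(G, results, finished).reshape(-1)

print("decoded correctly:", np.allclose(recovered, A.T @ x))
print("non-zeros per encoded block:", [int(np.count_nonzero(e)) for e in encoded])
print("non-zeros in a dense combination:", int(np.count_nonzero(sum(blocks))))
```

Because each encoded block mixes only `weight` of the `k` submatrices, it has no more non-zero entries than a dense combination of all `k`, which is the effect on worker computation time that the abstract describes. Which values of `weight` keep the straggler resilience optimal is a result of the paper and is not established by this sketch.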