Serverless Straggler Mitigation using Error-Correcting Codes

Vipul Gupta, Dominic Carrano, Yaoqing Yang, Vaishaal Shankar, T. Courtade, K. Ramchandran
Venue: 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS)
DOI: 10.1109/ICDCS47774.2020.00019 (https://doi.org/10.1109/ICDCS47774.2020.00019)
Publication date: 2020-11-01
Citation count: 3

Abstract

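Inexpensive cloud services, such as serverless computing, are often vulnerable to straggling nodes that increase the end-to-end latency for distributed computation. We propose and implement simple yet principled approaches for straggler mitigation in serverless systems for matrix multiplication and evaluate them on several common applications from machine learning and high-performance computing. The proposed schemes are inspired by error-correcting codes and employ parallel encoding and decoding over the data stored in the cloud using serverless workers. This creates a fully distributed computing framework without using a master node to conduct encoding or decoding, which removes the computation, communication and storage bottleneck at the master. On the theory side, we establish that our proposed scheme is asymptotically optimal in terms of decoding time and provide a lower bound on the number of stragglers it can tolerate with high probability. Through extensive experiments, we show that our scheme outperforms existing schemes such as speculative execution and other coding theoretic methods by at least 25%.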
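To make the idea of code-based straggler tolerance concrete, here is a minimal Python sketch under stated assumptions. It uses a simple 2D product code: A is split into k row blocks and B into k column blocks, and each set is extended with one sum-parity block. Each product A_i B_j stands in for one serverless task. This is a generic illustration, not necessarily the authors' construction, and all function names (`coded_matmul`, `peel`, `with_parity`, and so on) are hypothetical. Missing products are recovered by a peeling decoder that repeatedly solves any row or column of the output grid with exactly one missing block. Each such step reads only one row or column, so in principle independent steps could run on separate workers instead of on a master.

```python
"""Straggler-tolerant matrix multiplication with a 2D product code (sketch).

Illustrative assumption: A is split into k row blocks and B into k column
blocks, each extended with one sum-parity block. This is a generic
product-code example, not necessarily the construction used in the paper.
"""
from typing import Dict, List, Optional, Set, Tuple

Matrix = List[List[int]]
Grid = Dict[Tuple[int, int], Matrix]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    # Naive product; stands in for the work of one serverless task.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix, sign: int = 1) -> Matrix:
    # Elementwise x + sign * y.
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def scale(x: Matrix, s: int) -> Matrix:
    return [[s * v for v in r] for r in x]


def split_rows(a: Matrix, k: int) -> List[Matrix]:
    n = len(a) // k  # assumes k divides the number of rows
    return [a[i * n:(i + 1) * n] for i in range(k)]


def split_cols(b: Matrix, k: int) -> List[Matrix]:
    n = len(b[0]) // k  # assumes k divides the number of columns
    return [[row[j * n:(j + 1) * n] for row in b] for j in range(k)]


def with_parity(blocks: List[Matrix]) -> List[Matrix]:
    # Append one parity block equal to the sum of the k systematic blocks.
    parity = blocks[0]
    for blk in blocks[1:]:
        parity = add(parity, blk)
    return blocks + [parity]


def peel(grid: Grid, k: int) -> Grid:
    # In every row and column of the (k+1) x (k+1) grid of products, the first
    # k blocks sum to the last, so a line with one missing block is solvable.
    size = k + 1
    sign = [1] * k + [-1]  # per row/column: sum_t sign[t] * block_t == 0
    progress = True
    while progress and len(grid) < size * size:
        progress = False
        for line in range(size):
            for cells in ([(line, t) for t in range(size)],
                          [(t, line) for t in range(size)]):
                missing = [c for c in cells if c not in grid]
                if len(missing) != 1:
                    continue
                miss = missing[0]
                t = cells.index(miss)
                others = [(s, c) for s, c in zip(sign, cells) if c != miss]
                acc = scale(grid[others[0][1]], others[0][0])
                for s, c in others[1:]:
                    acc = add(acc, grid[c], s)
                grid[miss] = scale(acc, -sign[t])  # sign[t] is +1 or -1
                progress = True
    return grid


def coded_matmul(a: Matrix, b: Matrix, k: int,
                 stragglers: Set[Tuple[int, int]]) -> Optional[Matrix]:
    a_blocks = with_parity(split_rows(a, k))
    b_blocks = with_parity(split_cols(b, k))
    # One simulated task per (i, j); tasks listed in `stragglers` never return.
    grid = {(i, j): matmul(a_blocks[i], b_blocks[j])
            for i in range(k + 1) for j in range(k + 1)
            if (i, j) not in stragglers}
    grid = peel(grid, k)
    if any((i, j) not in grid for i in range(k) for j in range(k)):
        return None  # peeling got stuck: too many stragglers in shared lines
    out: Matrix = []
    for i in range(k):  # stitch the k x k systematic blocks into A @ B
        blocks = [grid[(i, j)] for j in range(k)]
        for r in range(len(blocks[0])):
            out.append([v for blk in blocks for v in blk[r]])
    return out


if __name__ == "__main__":
    A = [[1, 2], [3, 4], [5, 6], [7, 8]]
    B = [[1, 0, 2, 1], [0, 1, 1, 3]]
    # Two stragglers in row 0 of the grid; each is recovered via its column.
    print(coded_matmul(A, B, k=2, stragglers={(0, 0), (0, 1)}) == matmul(A, B))  # True
```

With k = 2, the example drops two tasks in the same grid row. Each one is recovered from its column parity, so the result matches the uncoded product. A 2×2 square of stragglers is different: every affected row and column then has two gaps, peeling cannot proceed, and the function returns `None`. This shows why the number of tolerable stragglers depends on where they fall, not only on how many there are.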