Sparsity-Preserving Encodings for Straggler-Optimal Distributed Matrix Computations at the Edge
Anindya Bijoy Das, Aditya Ramamoorthy, David J. Love, Christopher G. Brinton
arXiv:2408.05152 (arXiv - CS - Distributed, Parallel, and Cluster Computing), 2024-08-09
Abstract
Matrix computations are a fundamental building block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations allocate coded combinations of submatrices to worker nodes in order to build resilience against slower nodes, called stragglers. In the edge learning context, however, these approaches compromise the sparsity that is often present in the original matrices found at the edge server. In this study, we consider the challenge of augmenting such approaches to preserve input sparsity when distributing the task across edge devices, thereby retaining the associated computational efficiency gains. First, we derive a lower bound on the weight of the coding, i.e., the number of submatrices that must be combined to obtain each coded submatrix, required to provide resilience to the maximum possible number of straggler devices for a given number of devices and their storage constraints. Next, we propose distributed matrix computation schemes that meet this lower bound exactly. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.
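The trade-off described in the abstract can be illustrated with a short, self-contained sketch. The code below is not the authors' scheme; it is a minimal NumPy illustration, with arbitrary choices for the splitting (k = 6 row blocks), the input density (2% nonzeros), and the weights examined, of how the coding weight, i.e., the number of submatrices combined into one coded submatrix, controls the density of the data each worker receives: a weight-k (dense, MDS-style) combination largely destroys the input sparsity, while low-weight codes largely preserve it.

import numpy as np

# Illustrative sketch only (not the paper's construction): measure how the
# nonzero fraction of a coded submatrix grows with the coding weight.
rng = np.random.default_rng(0)

# A sparse input matrix A (about 2% nonzeros), split row-wise into k submatrices.
n, k = 1200, 6
A = np.where(rng.random((n, n)) < 0.02, rng.standard_normal((n, n)), 0.0)
blocks = np.split(A, k, axis=0)

def coded_submatrix(weight):
    # Combine `weight` randomly chosen submatrices with random coefficients.
    idx = rng.choice(k, size=weight, replace=False)
    coeffs = rng.standard_normal(weight)
    return sum(c * blocks[i] for c, i in zip(coeffs, idx))

for w in (1, 2, k):  # w = k mimics a dense (MDS-style) combination of all blocks
    enc = coded_submatrix(w)
    density = np.count_nonzero(enc) / enc.size
    print(f"weight {w}: nonzero fraction ~ {density:.3f}")

With these parameters the nonzero fraction grows roughly as 1 - (1 - 0.02)^w, so the fully coded submatrix is several times denser than the weight-1 case; this loss of sparsity, and hence of computational efficiency at the workers, is the cost that the paper's lower bound and weight-optimal schemes are designed to minimize.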