DAGC: Data-Aware Adaptive Gradient Compression
R. Lu, Jiajun Song, B. Chen, Laizhong Cui, Zhi Wang
IEEE INFOCOM 2023 - IEEE Conference on Computer Communications, May 2023. DOI: 10.1109/INFOCOM53939.2023.10229053
Abstract
Gradient compression algorithms are widely used to alleviate the communication bottleneck in distributed ML. However, existing gradient compression algorithms suffer from accuracy degradation in non-IID scenarios, because a uniform compression scheme is applied to workers with different data distributions and volumes, forcing workers with larger data volumes to adopt the same aggressive compression ratios as the others. Assigning different compression ratios to workers with different data distributions and volumes is thus a promising solution. In this study, we first derive a function capturing the relationship between the compression ratios at different workers and the number of training iterations a model needs to converge to the same accuracy; this function shows, in particular, that workers with larger data volumes should be assigned higher compression ratios to guarantee better accuracy. We then formulate the assignment of compression ratios to the workers as an n-variable chi-square nonlinear optimization problem under a fixed and limited total communication constraint. We propose an adaptive gradient compression strategy called DAGC, which assigns each worker a different compression ratio according to its data volume. Our experiments confirm that DAGC achieves better performance under highly imbalanced data volume distributions and restricted communication.
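To make the data-aware idea concrete, the sketch below illustrates one way a fixed total communication budget could be split across workers in proportion to their data volumes, with each worker then applying top-k sparsification at its assigned ratio. This is a simplified illustration, not DAGC's actual optimization: the function names (`assign_ratios`, `topk_compress`), the proportional heuristic, and the example numbers are assumptions made here for clarity.

```python
import numpy as np

def assign_ratios(data_volumes, total_budget):
    """Illustrative heuristic (not DAGC's formulation): split a fixed total
    communication budget across workers in proportion to their data volumes,
    so a worker with more data keeps a larger fraction of its gradient."""
    volumes = np.asarray(data_volumes, dtype=float)
    ratios = total_budget * volumes / volumes.sum()
    return np.clip(ratios, 1e-4, 1.0)  # keep each ratio within (0, 1]

def topk_compress(grad, ratio):
    """Top-k sparsification: keep the `ratio` fraction of entries with the
    largest magnitude and zero out the rest."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

# Example: three workers with imbalanced data volumes share a total budget
# equivalent to 0.3 of one full gradient, split data-aware rather than uniformly.
volumes = [10_000, 2_000, 500]
ratios = assign_ratios(volumes, total_budget=0.3)
grads = [np.random.randn(1_000) for _ in volumes]
compressed = [topk_compress(g, r) for g, r in zip(grads, ratios)]
print(ratios)  # larger data volume -> larger kept fraction
```

Under this toy heuristic the budget is divided linearly in data volume; the paper instead solves a constrained nonlinear optimization to choose the per-worker ratios, so the sketch only conveys the qualitative behavior (more data, less aggressive compression), not the actual assignment rule.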