SGD 中的自适应 Top-K，实现多机器人协作中高通信效率的分布式学习

IF 8.7 1区工程技术 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC IEEE Journal of Selected Topics in Signal Processing Pub Date : 2024-04-05 DOI:10.1109/JSTSP.2024.3381373

Mengzhe Ruan;Guangfeng Yan;Yuanzhang Xiao;Linqi Song;Weitao Xu

{"title":"SGD 中的自适应 Top-K，实现多机器人协作中高通信效率的分布式学习","authors":"Mengzhe Ruan;Guangfeng Yan;Yuanzhang Xiao;Linqi Song;Weitao Xu","doi":"10.1109/JSTSP.2024.3381373","DOIUrl":null,"url":null,"abstract":"Distributed stochastic gradient descent (D-SGD) with gradient compression has become a popular communication-efficient solution for accelerating optimization procedures in distributed learning systems like multi-robot systems. One commonly used method for gradient compression is Top-K sparsification, which sparsifies the gradients by a fixed degree during model training. However, there has been a lack of an adaptive approach with a systematic treatment and analysis to adjust the sparsification degree to maximize the potential of the model's performance or training speed. This paper proposes a novel adaptive Top-K in Stochastic Gradient Descent framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance by balancing the trade-off between communication cost and convergence error with respect to the norm of gradients and the communication budget. Firstly, an upper bound of convergence error is derived for the adaptive sparsification scheme and the loss function. Secondly, we consider communication budget constraints and propose an optimization formulation for minimizing the deep model's convergence error under such constraints. We obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, we conduct numerical experiments on general image classification tasks using the MNIST, CIFAR-10 datasets. For the multi-robot collaboration tasks, we choose the object detection task on the PASCAL VOC dataset. The results demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods, even after considering error compensation.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"487-501"},"PeriodicalIF":8.7000,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration\",\"authors\":\"Mengzhe Ruan;Guangfeng Yan;Yuanzhang Xiao;Linqi Song;Weitao Xu\",\"doi\":\"10.1109/JSTSP.2024.3381373\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Distributed stochastic gradient descent (D-SGD) with gradient compression has become a popular communication-efficient solution for accelerating optimization procedures in distributed learning systems like multi-robot systems. One commonly used method for gradient compression is Top-K sparsification, which sparsifies the gradients by a fixed degree during model training. However, there has been a lack of an adaptive approach with a systematic treatment and analysis to adjust the sparsification degree to maximize the potential of the model's performance or training speed. This paper proposes a novel adaptive Top-K in Stochastic Gradient Descent framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance by balancing the trade-off between communication cost and convergence error with respect to the norm of gradients and the communication budget. Firstly, an upper bound of convergence error is derived for the adaptive sparsification scheme and the loss function. Secondly, we consider communication budget constraints and propose an optimization formulation for minimizing the deep model's convergence error under such constraints. We obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, we conduct numerical experiments on general image classification tasks using the MNIST, CIFAR-10 datasets. For the multi-robot collaboration tasks, we choose the object detection task on the PASCAL VOC dataset. The results demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods, even after considering error compensation.\",\"PeriodicalId\":13038,\"journal\":{\"name\":\"IEEE Journal of Selected Topics in Signal Processing\",\"volume\":\"18 3\",\"pages\":\"487-501\"},\"PeriodicalIF\":8.7000,\"publicationDate\":\"2024-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Journal of Selected Topics in Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10493123/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Journal of Selected Topics in Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10493123/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}

引用次数: 0

摘要

带有梯度压缩功能的分布式随机梯度下降（D-SGD）已成为在多机器人系统等分布式学习系统中加速优化程序的一种流行的通信高效解决方案。一种常用的梯度压缩方法是 Top-K sparsification，该方法在模型训练过程中按固定程度稀疏梯度。然而，目前还缺乏一种系统处理和分析的自适应方法来调整稀疏程度，以最大限度地发挥模型的性能潜力或训练速度。本文在随机梯度下降框架中提出了一种新的自适应 Top-K，通过平衡通信成本和收敛误差与梯度准则和通信预算之间的权衡，为每个梯度下降步骤提供自适应的稀疏化程度，以优化收敛性能。首先，针对自适应稀疏化方案和损失函数推导出收敛误差的上限。其次，我们考虑了通信预算约束，并提出了在这种约束下最小化深度模型收敛误差的优化方案。我们得到了一种增强型压缩算法，它能在给定的通信预算约束条件下显著提高模型精度。最后，我们使用 MNIST 和 CIFAR-10 数据集对一般图像分类任务进行了数值实验。对于多机器人协作任务，我们选择了 PASCAL VOC 数据集上的物体检测任务。结果表明，与最先进的方法相比，即使考虑了误差补偿，SGD 中提出的自适应 Top-K 算法的收敛率也明显更高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration

Distributed stochastic gradient descent (D-SGD) with gradient compression has become a popular communication-efficient solution for accelerating optimization procedures in distributed learning systems like multi-robot systems. One commonly used method for gradient compression is Top-K sparsification, which sparsifies the gradients by a fixed degree during model training. However, there has been a lack of an adaptive approach with a systematic treatment and analysis to adjust the sparsification degree to maximize the potential of the model's performance or training speed. This paper proposes a novel adaptive Top-K in Stochastic Gradient Descent framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance by balancing the trade-off between communication cost and convergence error with respect to the norm of gradients and the communication budget. Firstly, an upper bound of convergence error is derived for the adaptive sparsification scheme and the loss function. Secondly, we consider communication budget constraints and propose an optimization formulation for minimizing the deep model's convergence error under such constraints. We obtain an enhanced compression algorithm that significantly improves model accuracy under given communication budget constraints. Finally, we conduct numerical experiments on general image classification tasks using the MNIST, CIFAR-10 datasets. For the multi-robot collaboration tasks, we choose the object detection task on the PASCAL VOC dataset. The results demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods, even after considering error compensation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Journal of Selected Topics in Signal Processing 工程技术-工程：电子与电气

CiteScore

19.00

自引率

1.30%

发文量

135

审稿时长

3 months

期刊介绍： The IEEE Journal of Selected Topics in Signal Processing (JSTSP) focuses on the Field of Interest of the IEEE Signal Processing Society, which encompasses the theory and application of various signal processing techniques. These techniques include filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals using digital or analog devices. The term "signal" covers a wide range of data types, including audio, video, speech, image, communication, geophysical, sonar, radar, medical, musical, and others. The journal format allows for in-depth exploration of signal processing topics, enabling the Society to cover both established and emerging areas. This includes interdisciplinary fields such as biomedical engineering and language processing, as well as areas not traditionally associated with engineering.

期刊最新文献

Front Cover Table of Contents IEEE Signal Processing Society Information Introduction to the Special Issue Near-Field Signal Processing: Algorithms, Implementations and Applications IEEE Signal Processing Society Information