Non-Coherent Over-the-Air Decentralized Gradient Descent

IEEE Transactions on Signal Processing (IF 4.6; Zone 2, Engineering Technology; JCR Q1, Engineering, Electrical & Electronic). Pub Date: 2024-09-16. DOI: 10.1109/TSP.2024.3460690

Nicolò Michelusi

IEEE Transactions on Signal Processing, vol. 72, pp. 4618-4634, published 2024-09-16. https://ieeexplore.ieee.org/document/10680589/


Abstract

Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals, on average, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly-convex problems, it is shown that the expected squared distance between the local and globally optimum models vanishes at a rate of $\mathcal{O}(1/\sqrt{k})$ after $k$ iterations, with suitable decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.
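The energy superposition property described in the abstract can be checked numerically with a small Monte Carlo sketch. This is not the paper's encoding scheme: it assumes independent Rayleigh fading (zero-mean circular complex Gaussian channels), a single resource element, and a receiver that knows the noise variance. The function names and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def received_energy(tx_energies, avg_gains, noise_var, rng):
    """Return one noisy realization of the energy received on one resource element.

    Assumption of this sketch: transmitter i sends amplitude sqrt(E_i) over an
    independent zero-mean circular complex Gaussian channel with E|h_i|^2 = g_i.
    """
    re = 0.0
    im = 0.0
    for energy, gain in zip(tx_energies, avg_gains):
        std = (gain / 2.0) ** 0.5  # per-component std so that E|h|^2 = gain
        amp = energy ** 0.5
        re += rng.gauss(0.0, std) * amp
        im += rng.gauss(0.0, std) * amp
    noise_std = (noise_var / 2.0) ** 0.5
    re += rng.gauss(0.0, noise_std)
    im += rng.gauss(0.0, noise_std)
    return re * re + im * im


def mean_received_energy(tx_energies, avg_gains, noise_var, trials=50000, seed=0):
    """Average the received energy over many independent fading realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += received_energy(tx_energies, avg_gains, noise_var, rng)
    return total / trials


# Hypothetical values: three nodes transmitting simultaneously, uncoordinated.
tx_energies = [1.0, 0.5, 2.0]
avg_gains = [0.8, 0.3, 0.1]
noise_var = 0.05

# Predicted mean: gain-weighted sum of transmitted energies plus noise power.
expected = sum(g * e for g, e in zip(avg_gains, tx_energies)) + noise_var
empirical = mean_received_energy(tx_energies, avg_gains, noise_var)

# Subtracting the (assumed known) noise power gives an estimate of the
# gain-weighted sum that is unbiased on average.
weighted_sum_estimate = empirical - noise_var

print(f"predicted mean energy: {expected:.3f}")
print(f"empirical mean energy: {empirical:.3f}")
```

A single realization of the received energy fluctuates strongly around its mean (under this fading model it is exponentially distributed), which is why the abstract pairs the unbiased estimate with a consensus stepsize that damps these fluctuations.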


Source journal: IEEE Transactions on Signal Processing (Engineering Technology; Engineering: Electronic & Electrical)

CiteScore: 11.20

Self-citation rate: 9.30%

Articles published: 310

Review time: 3.0 months

Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.
