{"title":"Non-Coherent Over-the-Air Decentralized Gradient Descent","authors":"Nicolò Michelusi","doi":"10.1109/TSP.2024.3460690","DOIUrl":null,"url":null,"abstract":"Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals, \n<italic>on average</i>\n, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly-convex problems, it is shown that the expected squared distance between the local and globally optimum models vanishes at a rate of \n<inline-formula><tex-math>$\\mathcal{O}(1/\\sqrt{k})$</tex-math></inline-formula>\n after \n<inline-formula><tex-math>$k$</tex-math></inline-formula>\n iterations, with suitable decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"72 ","pages":"4618-4634"},"PeriodicalIF":4.6000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10680589/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0
Abstract
Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals,
on average
, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly-convex problems, it is shown that the expected squared distance between the local and globally optimum models vanishes at a rate of
$\mathcal{O}(1/\sqrt{k})$
after
$k$
iterations, with suitable decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.
期刊介绍:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.