{"title":"使用SGD在网络上进行Nyström近似的分散学习","authors":"Heng Lian , Jiamin Liu","doi":"10.1016/j.acha.2023.06.005","DOIUrl":null,"url":null,"abstract":"<div><p>Nowadays we often meet with a learning problem when data are distributed on different machines connected via a network, instead of stored centrally. Here we consider decentralized supervised learning in a reproducing kernel Hilbert space<span>. We note that standard gradient descent in a reproducing kernel Hilbert space is difficult to implement with multiple communications between worker machines. On the other hand, the Nyström approximation using gradient descent is more suited for the decentralized setting since only a small number of data points need to be shared at the beginning of the algorithm. In the setting of decentralized distributed learning in a reproducing kernel Hilbert space, we establish the optimal learning rate of stochastic gradient descent based on mini-batches, allowing multiple passes over the data set. The proposal provides a scalable approach to nonparametric estimation combining gradient method, distributed estimation, and random projection.</span></p></div>","PeriodicalId":55504,"journal":{"name":"Applied and Computational Harmonic Analysis","volume":"66 ","pages":"Pages 373-387"},"PeriodicalIF":2.6000,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Decentralized learning over a network with Nyström approximation using SGD\",\"authors\":\"Heng Lian , Jiamin Liu\",\"doi\":\"10.1016/j.acha.2023.06.005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Nowadays we often meet with a learning problem when data are distributed on different machines connected via a network, instead of stored centrally. Here we consider decentralized supervised learning in a reproducing kernel Hilbert space<span>. We note that standard gradient descent in a reproducing kernel Hilbert space is difficult to implement with multiple communications between worker machines. On the other hand, the Nyström approximation using gradient descent is more suited for the decentralized setting since only a small number of data points need to be shared at the beginning of the algorithm. In the setting of decentralized distributed learning in a reproducing kernel Hilbert space, we establish the optimal learning rate of stochastic gradient descent based on mini-batches, allowing multiple passes over the data set. 
The proposal provides a scalable approach to nonparametric estimation combining gradient method, distributed estimation, and random projection.</span></p></div>\",\"PeriodicalId\":55504,\"journal\":{\"name\":\"Applied and Computational Harmonic Analysis\",\"volume\":\"66 \",\"pages\":\"Pages 373-387\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied and Computational Harmonic Analysis\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1063520323000490\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MATHEMATICS, APPLIED\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied and Computational Harmonic Analysis","FirstCategoryId":"100","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1063520323000490","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Decentralized learning over a network with Nyström approximation using SGD
Nowadays we often encounter learning problems in which the data are distributed across different machines connected by a network rather than stored centrally. Here we consider decentralized supervised learning in a reproducing kernel Hilbert space. We note that standard gradient descent in a reproducing kernel Hilbert space is difficult to implement in this setting, as it requires multiple rounds of communication between worker machines. On the other hand, gradient descent combined with the Nyström approximation is better suited to the decentralized setting, since only a small number of data points need to be shared at the beginning of the algorithm. In the setting of decentralized distributed learning in a reproducing kernel Hilbert space, we establish the optimal learning rate of stochastic gradient descent based on mini-batches, allowing multiple passes over the data set. The proposal provides a scalable approach to nonparametric estimation that combines the gradient method, distributed estimation, and random projection.
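To make the setting concrete, below is a minimal sketch (in Python with NumPy) of decentralized kernel regression using a Nyström approximation and mini-batch SGD. It is not the authors' exact algorithm: the Gaussian kernel, squared loss, ring-topology gossip averaging via the mixing matrix W_mix, the uniformly sampled landmark points, and all function names (gaussian_kernel, nystrom_features, decentralized_sgd) are illustrative assumptions. It only mirrors the structure described in the abstract: a small set of shared data points fixed at the start, local mini-batch SGD with multiple passes over the data, and parameter averaging over the network.

```python
# A minimal sketch (not the paper's algorithm) of decentralized kernel
# regression with a Nystrom approximation and mini-batch SGD.
# Assumptions not taken from the abstract: Gaussian kernel, squared loss,
# gossip averaging over a ring network with a doubly stochastic mixing
# matrix, and uniformly sampled landmark points shared once at the start.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def nystrom_features(X, landmarks, sigma=1.0, reg=1e-8):
    """Map X to m-dimensional Nystrom features K(X, Z) @ K(Z, Z)^{-1/2}."""
    K_mm = gaussian_kernel(landmarks, landmarks, sigma)
    vals, vecs = np.linalg.eigh(K_mm + reg * np.eye(len(landmarks)))
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, reg))) @ vecs.T
    return gaussian_kernel(X, landmarks, sigma) @ inv_sqrt

def decentralized_sgd(local_X, local_y, landmarks, W_mix,
                      passes=5, batch=32, lr=0.1, sigma=1.0):
    """Each worker runs mini-batch SGD on its own Nystrom features, then
    averages its parameter vector with its neighbours via W_mix."""
    n_workers = len(local_X)
    feats = [nystrom_features(X, landmarks, sigma) for X in local_X]
    w = np.zeros((n_workers, landmarks.shape[0]))
    for _ in range(passes):                      # multiple passes over the data
        for j in range(n_workers):
            Phi, y = feats[j], local_y[j]
            idx = np.random.permutation(len(y))
            for start in range(0, len(y), batch):
                b = idx[start:start + batch]
                grad = Phi[b].T @ (Phi[b] @ w[j] - y[b]) / len(b)
                w[j] -= lr * grad
        w = W_mix @ w                            # one round of gossip averaging
    return w

# Toy usage: 4 workers on a ring, data from a noisy sine function.
rng = np.random.default_rng(0)
n_workers, n_local, m = 4, 200, 20
local_X = [rng.uniform(-3, 3, (n_local, 1)) for _ in range(n_workers)]
local_y = [np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n_local) for X in local_X]
landmarks = np.vstack(local_X)[rng.choice(n_workers * n_local, m, replace=False)]
W_mix = 0.5 * np.eye(n_workers) + 0.25 * (np.roll(np.eye(n_workers), 1, 0)
                                          + np.roll(np.eye(n_workers), -1, 0))
w = decentralized_sgd(local_X, local_y, landmarks, W_mix)
```

In this sketch only the m landmark points are communicated once at the beginning; afterwards each round of communication exchanges just the m-dimensional parameter vectors, which is what makes the Nyström route attractive for the decentralized setting described above.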
Journal introduction:
Applied and Computational Harmonic Analysis (ACHA) is an interdisciplinary journal that publishes high-quality papers in all areas of mathematical sciences related to the applied and computational aspects of harmonic analysis, with special emphasis on innovative theoretical development, methods, and algorithms, for information processing, manipulation, understanding, and so forth. The objectives of the journal are to chronicle the important publications in the rapidly growing field of data representation and analysis, to stimulate research in relevant interdisciplinary areas, and to provide a common link among mathematical, physical, and life scientists, as well as engineers.