{"title":"Online Management for Edge-Cloud Collaborative Continuous Learning: A Two-Timescale Approach","authors":"Shaohui Lin;Xiaoxi Zhang;Yupeng Li;Carlee Joe-Wong;Jingpu Duan;Dongxiao Yu;Yu Wu;Xu Chen","doi":"10.1109/TMC.2024.3451715","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) powered real-time applications usually need continuous training using data streams generated over time and across different geographical locations. Enabling data offloading among computation nodes through model training is promising to mitigate the problem that devices generating large datasets may have low computation capability. However, offloading can compromise model convergence and incur communication costs, which must be balanced with the long-term cost spent on computation and model synchronization. Therefore, this paper proposes EdgeC3, a novel framework that can optimize the frequency of model aggregation and dynamic offloading for continuously generated data streams, navigating the trade-off between long-term accuracy and cost. We first provide a new error bound to capture the impacts of data dynamics that are varying over time and heterogeneous across devices, as well as quantifying varied data heterogeneity between local models and the global one. Based on the bound, we design a two-timescale online optimization framework. We periodically learn the synchronization frequency to adapt with uncertain future offloading and network changes. In the finer timescale, we manage online offloading by extending Lyapunov optimization techniques to handle an unconventional setting, where our long-term global constraint can have abruptly changed aggregation frequencies that are decided in the longer timescale. Finally, we theoretically prove the convergence of EdgeC3 by integrating the coupled effects of our two-timescale decisions, and we demonstrate its advantage through extensive experiments performing distributed DL training for different domains.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":null,"pages":null},"PeriodicalIF":7.7000,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10663344/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Deep learning (DL) powered real-time applications usually need continuous training using data streams generated over time and across different geographical locations. Enabling data offloading among computation nodes through model training is promising to mitigate the problem that devices generating large datasets may have low computation capability. However, offloading can compromise model convergence and incur communication costs, which must be balanced with the long-term cost spent on computation and model synchronization. Therefore, this paper proposes EdgeC3, a novel framework that can optimize the frequency of model aggregation and dynamic offloading for continuously generated data streams, navigating the trade-off between long-term accuracy and cost. We first provide a new error bound to capture the impacts of data dynamics that are varying over time and heterogeneous across devices, as well as quantifying varied data heterogeneity between local models and the global one. Based on the bound, we design a two-timescale online optimization framework. We periodically learn the synchronization frequency to adapt with uncertain future offloading and network changes. In the finer timescale, we manage online offloading by extending Lyapunov optimization techniques to handle an unconventional setting, where our long-term global constraint can have abruptly changed aggregation frequencies that are decided in the longer timescale. Finally, we theoretically prove the convergence of EdgeC3 by integrating the coupled effects of our two-timescale decisions, and we demonstrate its advantage through extensive experiments performing distributed DL training for different domains.
期刊介绍:
IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.