Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00015
Tian Xie, Sanchal Thakkar, Ting He, P. Mcdaniel, Quinn K. Burke
In-network caching and flexible routing are two of the most celebrated advantages of next-generation network infrastructures. Yet few solutions are available for jointly optimizing caching and routing with performance guarantees on an arbitrary topology. We take a holistic approach to this fundamental problem by analyzing its complexity in all cases and developing polynomial-time algorithms with approximation guarantees in important special cases. We also reveal the fundamental challenge in achieving guaranteed approximation in the general case and propose an alternating optimization algorithm with good performance and fast convergence. Our algorithms demonstrate superior performance in both routing cost and congestion compared to state-of-the-art solutions in evaluations based on a real topology and request traces.
Title: Joint Caching and Routing in Cache Networks with Arbitrary Topology
Venue: 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)
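The alternating optimization the abstract describes can be illustrated with a toy sketch. This is not the authors' algorithm: routing is simplified here to shortest-path delivery from the nearest replica, caching is a greedy hill-climb under a per-node capacity, and all names (`routing_cost`, `alternate`, the adjacency-dict graph encoding) are assumptions made for illustration.

```python
import heapq
from itertools import product

def shortest_dist(adj, src):
    # Dijkstra over a weighted adjacency dict {u: {v: weight}}.
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

def routing_cost(adj, demands, placement, server):
    # Each request (node, item) is routed to the nearest replica,
    # falling back to the origin server.
    cost = 0.0
    for (node, item), rate in demands.items():
        dist = shortest_dist(adj, node)
        targets = [n for n, held in placement.items() if item in held] + [server]
        cost += rate * min(dist[t] for t in targets)
    return cost

def alternate(adj, demands, capacity, server, items, rounds=10):
    # Alternating optimization: greedily re-place items given the current
    # routing, then re-route given the placement; stop when cost stalls.
    placement = {n: set() for n in adj}
    best = routing_cost(adj, demands, placement, server)
    for _ in range(rounds):
        improved = False
        for n, item in product(adj, items):
            if item in placement[n] or len(placement[n]) >= capacity:
                continue
            placement[n].add(item)
            c = routing_cost(adj, demands, placement, server)
            if c < best:
                best, improved = c, True
            else:
                placement[n].remove(item)
        if not improved:
            break
    return placement, best
```

The greedy caching step here has no approximation guarantee on a general topology, which is consistent with the hardness the abstract alludes to.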
Pub Date: 2022-07-01 | DOI: 10.1109/cgo.2013.6494974
S. Nirenburg, T. Oates
Provides a listing of current committee members.
Title: Organizing committee
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00093
Hao Chen, Yuan Yang, Mingwei Xu, Yuxuan Zhang, Chenyi Liu
IPv6 has shown notable growth in recent years, imposing the need for high-speed IPv6 lookup. As the forwarding rate of virtual switches continues to increase, software-based IPv6 lookup without special hardware such as TCAM, GPU, or FPGA is of academic interest and industrial importance. Existing studies achieve fast software IPv4 lookup by reducing both the number of operations and the memory footprint, so as to benefit from the CPU cache. With 128-bit IPv6 addresses, however, it is challenging to keep both small. To address this issue, we propose the Neurotrie data structure, which supports fast lookup and arbitrary strides; a good balance can thus be struck between trie depth and memory footprint by computing the proper stride for each Neurotrie node. We model the optimal Neurotrie problem, which minimizes depth under a limited memory footprint, and develop a pseudo-polynomial-time baseline algorithm that constructs a Neurotrie using dynamic programming. To improve performance and reduce computational complexity, we develop a deep reinforcement learning-based approach that leverages a deep neural network to construct Neurotrie efficiently, based on characteristics captured from real IPv6 prefixes. We further refine the data structure and develop an efficient mechanism for routing updates. Experiments on real routing tables show that Neurotrie achieves a lookup rate 34% higher than that of state-of-the-art approaches.
Title: Neurotrie: Deep Reinforcement Learning-based Fast Software IPv6 Lookup
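A minimal sketch of the underlying data-structure idea (a multibit trie whose stride can vary per level, with short prefixes handled by prefix expansion) may help. The `Node` layout, the `strides` schedule, and the bit-string interface are illustrative assumptions, not Neurotrie's actual implementation, and the expansion step ignores longest-match tie-breaking between expanded prefixes.

```python
class Node:
    def __init__(self, stride):
        self.stride = stride    # number of address bits consumed at this node
        self.nexthop = {}       # expanded chunk value -> next hop
        self.children = {}      # chunk value -> child Node

def insert(node, bits, nexthop, strides, depth=0):
    """Insert prefix `bits` (a '0'/'1' string); `strides` gives per-level strides."""
    s = node.stride
    if len(bits) < s:
        # Prefix expansion: a short prefix is expanded into every chunk
        # value that shares its leading bits (tie-breaking simplified away).
        pad = s - len(bits)
        base = (int(bits, 2) << pad) if bits else 0
        for i in range(1 << pad):
            node.nexthop.setdefault(base + i, nexthop)
    elif len(bits) == s:
        node.nexthop[int(bits, 2)] = nexthop
    else:
        chunk = int(bits[:s], 2)
        child = node.children.setdefault(chunk, Node(strides[depth + 1]))
        insert(child, bits[s:], nexthop, strides, depth + 1)

def lookup(node, addr_bits):
    """Longest-prefix match: the deepest next hop seen along the path wins."""
    best = None
    while node is not None and addr_bits:
        chunk = int(addr_bits[:node.stride], 2)
        best = node.nexthop.get(chunk, best)
        addr_bits = addr_bits[node.stride:]
        node = node.children.get(chunk)
    return best
```

Larger strides shrink the depth (fewer memory accesses per lookup) but inflate the expanded tables; choosing the stride per node under a memory budget is exactly the optimization the paper formulates.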
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00126
Giacomo Iadarola, F. Mercaldo, Fabio Martinelli, A. Santone
Deep Learning models have demonstrated high accuracy in malware classification, but they still lack "explainability", which is needed to ensure the robustness and reliability of their predictions. In this short contribution, we summarize the research we have conducted in recent years in the Malware Analysis field.
Title: Designing Robust Deep Learning Classifiers for Image-based Malware Analysis
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00087
Lixiang Lin, Shenghao Qiu, Ziqi Yu, Liang You, Long Xin, Xiaoyang Sun, J. Xu, Zheng Wang
There is a growing interest in training deep neural networks (DNNs) in a GPU cloud environment. This is typically achieved by running parallel training workers on multiple GPUs across computing nodes. Under such a setup, the communication overhead is often responsible for long training time and poor scalability. This paper presents AIACC-Training, a unified communication framework designed for the distributed training of DNNs in a GPU cloud environment. AIACC-Training permits a training worker to participate in multiple gradient communication operations simultaneously to improve network bandwidth utilization and reduce communication latency. It employs auto-tuning techniques to dynamically determine the right communication parameters based on the input DNN workloads and the underlying network infrastructure. AIACC-Training has been deployed to production at Alibaba GPU Cloud with 3000+ GPUs executing AIACC-Training optimized code at any time. Experiments performed on representative DNN workloads show that AIACC-Training outperforms existing solutions, improving the training throughput and scalability by a large margin.
Title: AIACC-Training: Optimizing Distributed Deep Learning Training through Multi-streamed and Concurrent Gradient Communications
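The multi-streamed communication idea, letting several gradient-bucket reductions proceed concurrently instead of serializing one collective after another, can be simulated in a few lines. This is a stand-in, not AIACC-Training's implementation: `allreduce` here merely sums per-worker copies, and a thread pool plays the role of communication streams.

```python
from concurrent.futures import ThreadPoolExecutor

def allreduce(bucket, peers):
    # Placeholder for a real collective (e.g., ring all-reduce):
    # element-wise sum of each worker's copy of this gradient bucket.
    return [sum(vals) for vals in zip(*peers[bucket])]

def fused_allreduce(peers, n_buckets, n_streams=4):
    # Launch several bucket reductions concurrently ("multi-streamed"),
    # so communication of one bucket overlaps with the others.
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        return list(pool.map(lambda b: allreduce(b, peers), range(n_buckets)))
```

In a real system the per-bucket size and the number of concurrent streams are exactly the parameters the auto-tuner described in the abstract would select, based on the DNN workload and network bandwidth.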
Serverless Computing and Function-as-a-Service (FaaS) offer convenient and transparent services to developers and users, with deployment and resource allocation managed by the cloud service providers. Meanwhile, the development of smart mobile devices and network technology enables the collection and transmission of huge amounts of data, driving mobile edge computing to shift tasks to the network edge for mobile users. In this paper, we propose a deviceless edge computing system targeting the mobility of end users. We focus on the migration of virtual functions to provide uninterrupted services to mobile users. We introduce the deviceless edge computing model and propose a seamless migration scheme for virtual functions with limited involvement of function developers. We formulate the migration decision problem as an integer linear program and use receding horizon control (RHC) for online solutions. We implement the migration system and algorithm over real edge devices to support delay-sensitive scenarios, and develop a streaming game as the virtual function to test performance. Extensive experiments in real scenarios show that the system can support high-mobility, delay-sensitive applications. Extensive simulation results also show its applicability to large-scale networks.
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00050
Title: Mobility-aware Seamless Virtual Function Migration in Deviceless Edge Computing Environments
Authors: Yaodong Huang, Zelin Lin, Tingting Yao, Xiaojun Shang, Laizhong Cui, J. Huang
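The receding-horizon decision loop mentioned above can be sketched as a brute-force search over a short placement horizon that applies only the first decision, then re-solves at the next step. Everything here (the `latency` callback, scalar user positions, a flat `mig_cost`) is an assumed toy model, not the paper's ILP formulation.

```python
from itertools import product

def rhc_migrate(cur, user_pos, horizon, sites, latency, mig_cost):
    """Pick the next hosting site by enumerating placements over a short
    horizon of predicted user positions (receding horizon control)."""
    best_plan, best_cost = None, float("inf")
    for plan in product(sites, repeat=horizon):
        cost, prev = 0.0, cur
        for t, site in enumerate(plan):
            cost += latency(user_pos[t], site)   # predicted access latency
            if site != prev:
                cost += mig_cost                 # penalty for moving state
            prev = site
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    # Only the first step is executed; the horizon then slides forward.
    return best_plan[0]
```

Enumerating `|sites|^horizon` plans is only viable for small instances; an ILP solver, as the paper uses, handles the realistic case.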
Geo-replication is essential in reliable large-scale cloud applications. We argue that existing replication solutions are too rigid to support today’s diversity of data consistency and performance requirements. Stabilizer is a flexible geo-replication library, supporting user-defined consistency models. The library achieves high performance using control-plane / data-plane separation: control events do not disrupt data flow. Our API offers simple control-plane operators that allow an application to define its desired consistency model: a stability frontier predicate. We build a wide-area K/V store with Stabilizer, a Dropbox-like application, and a prototype pub/sub system to show its versatility and evaluate its performance. When compared with a Paxos-based consistency protocol in an emulated Amazon EC2 wide-area network, experiments show that for a scenario requiring a more accurate consistency model, Stabilizer achieves a 24.75% latency performance improvement. Compared to Apache Pulsar in a real WAN environment, Stabilizer’s dynamic reconfiguration mechanism improves the pub/sub system performance significantly according to our experiment results.
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00042
Title: Stabilizer: Geo-Replication with User-defined Consistency
Authors: Pengze Li, Lichen Pan, Xinzhe Yang, Weijia Song, Zhen Xiao, K. Birman
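A user-defined stability frontier can be illustrated as a predicate evaluated over per-replica acknowledgement counters: the frontier is the highest log index the predicate still accepts. The `majority` and `all_replicas` predicates below are hypothetical examples of what an application might define, not Stabilizer's API.

```python
def stability_frontier(acked, predicate):
    """Return the highest log index for which the user-defined predicate
    holds over the replica acknowledgements {replica: highest acked index}."""
    frontier, i = 0, 1
    while predicate(i, acked):
        frontier = i
        i += 1
    return frontier

# Example user-defined predicates (hypothetical):
majority = lambda i, acked: sum(a >= i for a in acked.values()) > len(acked) // 2
all_replicas = lambda i, acked: all(a >= i for a in acked.values())
```

The appeal of this separation is that the predicate lives on the control plane; changing the consistency model means swapping the predicate, without touching the data flow.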
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00038
Greg Cusack, Maziyar Nazari, Sepideh Goodarzy, Erika Hunhoff, Prerit Oberai, Eric Keller, Eric Rozner, Richard Han
This paper pushes the limits of automated resource allocation in container environments. Recent works set container CPU and memory limits by automatically scaling containers based on past resource usage. However, these systems are heavyweight and run on coarse-grained time scales, resulting in poor performance when predictions are incorrect. We propose Escra, a container orchestrator that enables fine-grained, event-based resource allocation for a single container and distributed resource allocation to manage a collection of containers. Escra performs resource allocation on sub-second intervals within and across hosts, allowing operators to cost-effectively scale resources without performance penalty. We evaluate Escra on two types of containerized applications: microservices and serverless functions. In microservice environments, fine-grained and event-based resource allocation can reduce application latency by up to 96.9% and increase throughput by up to 3.2x when compared against the current state-of-the-art. Escra can increase performance while simultaneously reducing 50th- and 99th-percentile CPU waste by over 10x and 3.2x, respectively.
Title: Escra: Event-driven, Sub-second Container Resource Allocation
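Event-driven allocation, as opposed to interval-based scaling, can be sketched as a resize rule invoked per event (for example, a CPU-throttle notification) rather than on a timer. The `headroom`, `step`, and `floor` parameters are invented for illustration and are not Escra's actual policy.

```python
def adjust_limit(limit, usage, throttled, headroom=1.2, step=0.9, floor=0.1):
    """Event-driven resize: grow on a throttle event, otherwise shrink the
    limit toward observed usage plus headroom (a coarse sketch of
    fine-grained, sub-second scaling)."""
    if throttled:
        return limit * headroom          # container hit its cap: grant more
    target = usage * headroom
    if target < limit * step:
        return max(target, floor)        # reclaim clearly unused reservation
    return limit                         # usage close to limit: hold steady
```

Because the rule runs on every event instead of a coarse interval, a mispredicted limit is corrected within sub-second latency rather than persisting until the next scaling period.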
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00125
Marcello Di Giammarco, F. Mercaldo, Fabio Martinelli, A. Santone
Often, even when a great deal of data is available, we cannot interpret or explain it well enough to extract answers, let alone diagnoses in the medical field. The aim of this contribution is to introduce a way to provide explainability for data and features that might escape even medical doctors, and that Machine Learning models can categorize and "explain".
Title: Explainable Deep Learning Methodologies for Biomedical Images Classification
Federated Learning (FL) suffers from low-quality model training in mobile edge computing due to the dynamic environment of mobile clients. To the best of our knowledge, most FL frameworks follow reactive client scheduling, in which the FL parameter server selects participants according to the currently observed state of clients. Thus, participants selected by such reactive methods are very likely to fail while training a round of FL. To this end, we propose a proactive Context-aware Federated Learning (ContextFL) mechanism, which consists of two primary modules. First, the state prediction module enables each client device to locally predict the conditions of both the training and reporting phases of FL. Second, the decision-making module is devised using the contextual Multi-Armed Bandit (cMAB) framework, which helps the parameter server select the most appropriate group of mobile clients. Finally, we carried out trace-driven FL experiments using real-world mobility datasets collected from volunteers. The evaluation results demonstrate that the proposed ContextFL mechanism outperforms other baselines in terms of the convergence stability of the global FL model and the ratio of valid participants.
Pub Date: 2022-07-01 | DOI: 10.1109/ICDCS54860.2022.00061
Title: ContextFL: Context-aware Federated Learning by Estimating the Training and Reporting Phases of Mobile Clients
Authors: Huawei Huang, Ruixin Li, Jialiang Liu, Sicong Zhou, Kangying Lin, Zibin Zheng
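A contextual-bandit client selector in the spirit of the abstract might score each client by an optimistic (UCB) estimate of its past round-completion rate, weighted by the availability predicted by the state-prediction module. The `ucb_select` function and its scoring rule are assumptions for illustration, not the paper's cMAB algorithm.

```python
import math

def ucb_select(stats, contexts, k, alpha=1.0):
    """Pick k clients by an upper-confidence-bound score.

    stats:    {client: (rounds participated, rounds completed successfully)}
    contexts: {client: predicted availability for the coming round, in [0, 1]}
    """
    total = sum(n for n, _ in stats.values()) + 1
    scores = {}
    for c, (n, succ) in stats.items():
        mean = succ / n if n else 1.0
        # Never-tried clients get an infinite bonus, forcing exploration.
        bonus = alpha * math.sqrt(math.log(total) / n) if n else float("inf")
        scores[c] = mean * contexts.get(c, 1.0) + bonus
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Selecting on predicted (rather than currently observed) state is what makes the scheme proactive: a client that looks healthy now but is predicted to lose connectivity during the reporting phase scores low and is skipped.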