Pub Date : 2024-11-11 DOI: 10.1109/TCC.2024.3488275
COCSN: A Multi-Tiered Cascaded Optical Circuit Switching Network for Data Center
Shuo Li;Huaxi Gu;Xiaoshan Yu;Hua Huang;Songyan Wang;Zeshan Chang
IEEE Transactions on Cloud Computing, vol. 12, no. 4, pp. 1463-1475
A cascaded network represents a classic scaling-out model in traditional electrical switching networks. Recent proposals have integrated optical circuit switching at specific tiers of these networks to reduce power consumption and enhance topological flexibility. Extending optical circuit switching across multiple cascaded tiers is expected to amplify these advantages. The main challenges fall into two categories. First, an architecture with sufficient connectivity is required to support varying workloads. Second, network reconfiguration becomes more complex and necessitates a low-complexity scheduling algorithm. In this work, we propose COCSN, a multi-tiered cascaded optical circuit switching network architecture for data centers. COCSN employs wavelength-selective switches (WSSs) that integrate multiple wavelengths to enhance network connectivity. We formulate a mathematical model covering lightpath establishment, network reconfiguration, and reconfiguration goals, and propose theorems to optimize the model. Based on these theorems, we introduce an over-subscription-supported wavelength-by-wavelength scheduling algorithm, facilitating agile establishment of lightpaths in COCSN tailored to communication demand. This algorithm effectively addresses scheduling complexity and mitigates the issue of lengthy WSS configuration times. Simulation studies investigate the impact of flow length, WSS reconfiguration time, and communication domain on COCSN, verifying its significantly lower complexity and superior performance over classical cascaded networks.
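The abstract does not give the scheduling algorithm itself, but the wavelength-by-wavelength idea can be sketched as follows. This is an illustrative assumption, not COCSN's actual algorithm: on each wavelength, a source or destination port can carry at most one lightpath, so scheduling one wavelength at a time reduces to greedily picking a conflict-free subset of the remaining demands.

```python
def schedule(demands, num_wavelengths):
    """Greedy wavelength-by-wavelength lightpath assignment (illustrative).

    demands: list of (src, dst) port pairs.
    Returns (assignment dict mapping (src, dst) -> wavelength, blocked demands).
    """
    assignment = {}
    remaining = list(demands)
    for w in range(num_wavelengths):
        used_src, used_dst = set(), set()
        still_waiting = []
        for (src, dst) in remaining:
            # On wavelength w, each source and each destination appears at most once.
            if src not in used_src and dst not in used_dst:
                assignment[(src, dst)] = w
                used_src.add(src)
                used_dst.add(dst)
            else:
                still_waiting.append((src, dst))
        remaining = still_waiting
    return assignment, remaining  # leftover demands are blocked

demands = [(0, 1), (0, 2), (1, 2), (2, 1)]
assigned, blocked = schedule(demands, num_wavelengths=2)
```

Scheduling per wavelength rather than per flow is what keeps the per-round decision cheap; a real scheduler would also account for WSS reconfiguration time and over-subscription, which this sketch omits.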
Pub Date : 2024-10-17 DOI: 10.1109/TCC.2024.3482574
Aggregate Monitoring for Geo-Distributed Kubernetes Cluster Federations
Chih-Kai Huang;Guillaume Pierre
IEEE Transactions on Cloud Computing, vol. 12, no. 4, pp. 1449-1462
Distributed monitoring is an essential functionality that allows large cluster federations to efficiently schedule applications on a set of available geo-distributed resources. However, periodically reporting the precise status of each available server is both unnecessary for accurate scheduling and unscalable as the number of servers grows. This paper proposes Acala, an aggregate monitoring framework for geo-distributed Kubernetes cluster federations that provides the management cluster with aggregated information about each entire member cluster instead of individual servers. Based on actual deployments under a controlled environment in the geo-distributed Grid’5000 testbed, our evaluations show that Acala reduces cross-cluster network traffic by up to 97% and scrape duration by up to 55% in the single-member-cluster experiment. Our solution also decreases cross-cluster network traffic by 95% and memory resource consumption by 83% in multiple-member-cluster scenarios. A comparison of scheduling efficiency with and without data aggregation shows that aggregation has minimal effects on the system's scheduling function. These results indicate that our approach is superior to the existing solution and is suitable for large-scale geo-distributed Kubernetes cluster federation environments.
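A minimal sketch of the aggregation idea: a member cluster collapses per-server samples into one cluster-level record before reporting to the management cluster. The field names and the choice of aggregates below are assumptions for illustration, not Acala's actual data model.

```python
def aggregate(node_metrics):
    """Collapse per-node samples into one cluster-level summary (illustrative).

    node_metrics: list of dicts with 'cpu_free' (cores) and 'mem_free' (GiB).
    """
    return {
        "nodes": len(node_metrics),
        "cpu_free_total": sum(m["cpu_free"] for m in node_metrics),
        "mem_free_total": sum(m["mem_free"] for m in node_metrics),
        # Max free capacity on any single node, so the federation scheduler
        # can still check whether the largest pending pod fits somewhere.
        "cpu_free_max": max(m["cpu_free"] for m in node_metrics),
    }

metrics = [{"cpu_free": 2.0, "mem_free": 4.0},
           {"cpu_free": 6.0, "mem_free": 8.0}]
summary = aggregate(metrics)
```

The traffic saving follows directly: one record crosses the WAN per scrape instead of one per server, which is why the reported reduction grows with cluster size.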
Pub Date : 2024-10-17 DOI: 10.1109/TCC.2024.3482865
Group Formation and Sampling in Group-Based Hierarchical Federated Learning
Jiyao Liu;Xuanzhang Liu;Xinliang Wei;Hongchang Gao;Yu Wang
IEEE Transactions on Cloud Computing, vol. 12, no. 4, pp. 1433-1448
Hierarchical federated learning has emerged as a pragmatic approach to addressing scalability, robustness, and privacy concerns within distributed machine learning, particularly in the context of edge computing. This hierarchical method involves grouping clients at the edge, where the constitution of client groups significantly impacts overall learning performance, influenced by both the benefits obtained and costs incurred during group operations (such as group formation and group training). This is especially true for edge and mobile devices, which are more sensitive to computation and communication overheads. The formation of groups is critical for group-based hierarchical federated learning but often neglected by researchers, especially in the realm of edge systems. In this paper, we present a comprehensive exploration of a group-based federated edge learning framework utilizing the hierarchical cloud-edge-client architecture and employing probabilistic group sampling. Our theoretical analysis of its convergence rate, considering the characteristics of client groups, reveals the pivotal role played by group heterogeneity in achieving convergence. Building on this insight, we introduce new methods for group formation and group sampling, aiming to mitigate data heterogeneity within groups and enhance the convergence and overall performance of federated learning. Our proposed methods are validated through extensive experiments, demonstrating their superiority over current algorithms in terms of prediction accuracy and training cost.
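One way to picture heterogeneity-aware group formation (a made-up illustration, not the paper's method): place each client into the group whose current label distribution overlaps least with the client's own labels, which tends to spread classes evenly across groups.

```python
from collections import Counter

def form_groups(client_labels, num_groups):
    """Greedy heterogeneity-reducing group assignment (illustrative).

    client_labels: one list of class labels per client.
    Returns the group index chosen for each client, in order.
    """
    group_counts = [Counter() for _ in range(num_groups)]
    group_sizes = [0] * num_groups
    membership = []
    for labels in client_labels:
        c = Counter(labels)
        # Prefer the group that holds the fewest of this client's labels;
        # break ties toward the smaller group.
        def score(g):
            overlap = sum(min(c[k], group_counts[g][k]) for k in c)
            return (overlap, group_sizes[g])
        g = min(range(num_groups), key=score)
        group_counts[g].update(c)
        group_sizes[g] += len(labels)
        membership.append(g)
    return membership

# Two clients holding only class 0 and two holding only class 1
# end up split across the two groups, so each group sees both classes.
membership = form_groups([[0, 0], [0, 0], [1, 1], [1, 1]], num_groups=2)
```

Group sampling would then draw groups per round (e.g., with probabilities proportional to data size); the abstract's convergence analysis motivates making each group individually as close to the global distribution as possible.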
Pub Date : 2024-10-16 DOI: 10.1109/TCC.2024.3482178
HEXO: Offloading Long-Running Compute- and Memory-Intensive Workloads on Low-Cost, Low-Power Embedded Systems
Pierre Olivier;A K M Fazla Mehrab;Sandeep Errabelly;Stefan Lankes;Mohamed Lamine Karaoui;Robert Lyerly;Sang-Hoon Kim;Antonio Barbalace;Binoy Ravindran
IEEE Transactions on Cloud Computing, vol. 12, no. 4, pp. 1415-1432
OS-capable embedded systems with very low power consumption are available at an extremely low price point, which makes them highly compelling in a datacenter context. We show that sharing long-running, compute-intensive datacenter workloads between a server machine and one or a few connected embedded boards of negligible cost and power consumption can yield significant performance and energy benefits. Our approach, named Heterogeneous EXecution Offloading (HEXO), selectively offloads Virtual Machines (VMs) from server-class machines to embedded boards. Our design tackles several challenges. We address the Instruction Set Architecture (ISA) difference between typical servers (x86) and embedded systems (ARM) through hypervisor- and guest-OS-level support for heterogeneous-ISA runtime VM migration. We cope with the scarce resources of embedded systems by using lightweight VMs – unikernels – and by using the server's free RAM as remote memory for the embedded boards through Netswap, a transparent lightweight memory disaggregation mechanism for heterogeneous server-embedded clusters. VMs are offloaded based on an estimate of the slowdown expected from running on a given board. We build a prototype of HEXO and demonstrate significant increases in throughput (up to 67%) and energy efficiency (up to 56%) using benchmarks representative of compute-intensive long-running workloads.
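The slowdown-gated offloading decision can be sketched as below. The estimator itself (a CPU-speed ratio inflated by a remote-memory paging penalty) and every threshold are assumptions for illustration; HEXO's actual model is not given in the abstract.

```python
def should_offload(server_ips, board_ips, remote_mem_fraction,
                   paging_penalty=3.0, max_slowdown=4.0):
    """Offload a VM only if the estimated slowdown is acceptable (illustrative).

    server_ips / board_ips: instructions-per-second of server and board.
    remote_mem_fraction: share of the VM's working set served over a
    Netswap-like remote-memory path rather than board-local RAM.
    """
    cpu_slowdown = server_ips / board_ips
    # Remote-memory accesses cost extra; model that as a linear inflation.
    mem_factor = 1.0 + remote_mem_fraction * paging_penalty
    return cpu_slowdown * mem_factor <= max_slowdown

# A board 3x slower than the server, with 10% of the working set remote:
ok = should_offload(3e9, 1e9, remote_mem_fraction=0.1)
```

The point of the gate is that long-running, compute-bound VMs tolerate a bounded slowdown in exchange for the board's negligible power draw, while memory-thrashing VMs stay on the server.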
Pub Date : 2024-10-15 DOI: 10.1109/TCC.2024.3481039
Zihan Gao;Peixiao Zheng;Wanming Hao;Shouyi Yang
Collaborative cloud computing (CCC) has emerged as a promising paradigm to support computation-intensive and delay-sensitive applications by leveraging mobile edge computing (MEC) and mobile cloud computing (MCC) technologies. However, the coupling between multiple variables and subtask dependencies within an application poses significant challenges to the computation offloading mechanism. To address this, we investigate the computation offloading problem for CCC by jointly optimizing offloading decisions, resource allocation, and subtask scheduling across a multi-core edge server. First, we exploit latency to design a subtask dependency model within the application. Next, we formulate a System Energy-Time Cost ( $SETC$