
Latest publications from 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)

Joint Caching and Routing in Cache Networks with Arbitrary Topology
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00015
Tian Xie, Sanchal Thakkar, Ting He, P. Mcdaniel, Quinn K. Burke
In-network caching and flexible routing are two of the most celebrated advantages of next-generation network infrastructures. Yet few solutions are available for jointly optimizing caching and routing that provide performance guarantees for an arbitrary topology. We take a holistic approach to this fundamental problem by analyzing its complexity in all cases and developing polynomial-time algorithms with approximation guarantees in important special cases. We also reveal the fundamental challenge in achieving guaranteed approximation in the general case and propose an alternating optimization algorithm with good performance and fast convergence. Our algorithms demonstrate superior performance in both routing cost and congestion compared to state-of-the-art solutions in evaluations based on a real topology and request traces.
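The alternating pattern the abstract describes can be sketched as follows: fix the cache placement and route every request to its cheapest replica, then fix the routes and re-place the most-demanded items, repeating until the cost settles. This is a minimal illustration of the alternation idea only; the topology representation, cost model, and greedy caching step are all assumptions, not the paper's actual algorithm.

```python
import heapq

def shortest_path_cost(adj, src, dst):
    """Dijkstra over a dict-of-dicts adjacency map; returns the path cost."""
    dist, pq = {src: 0}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

def alternate(adj, requests, cache_capacity, origin, rounds=3):
    """requests: list of (client_node, item); origin holds every item."""
    caches = {n: set() for n in adj}        # node -> items cached there
    cost = 0
    for _ in range(rounds):
        # Routing step: serve each request from its cheapest current holder.
        cost, demand = 0, {}
        for client, item in requests:
            holders = [n for n, c in caches.items() if item in c] + [origin]
            cost += min(shortest_path_cost(adj, client, h) for h in holders)
            demand[(client, item)] = demand.get((client, item), 0) + 1
        # Caching step: each node caches its own most-demanded items.
        caches = {n: set() for n in adj}
        for (node, item), _ in sorted(demand.items(), key=lambda kv: -kv[1]):
            if len(caches[node]) < cache_capacity:
                caches[node].add(item)
    return cost
```

On a toy path topology a-b-c with the origin at c, repeated requests from a converge after one alternation: a caches the hot item and the routing cost drops to zero.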
Citations: 1
Organizing committee
Pub Date : 2022-07-01 DOI: 10.1109/cgo.2013.6494974
S. Nirenburg, T. Oates
Provides a listing of current committee members.
Citations: 0
Neurotrie: Deep Reinforcement Learning-based Fast Software IPv6 Lookup
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00093
Hao Chen, Yuan Yang, Mingwei Xu, Yuxuan Zhang, Chenyi Liu
IPv6 has shown notable growth in recent years, imposing the need for high-speed IPv6 lookup. As the forwarding rate of virtual switches continues increasing, software-based IPv6 lookup without using special hardware such as TCAM, GPU, and FPGA is of academic interest and industrial importance. Existing studies achieve fast software IPv4 lookup by reducing the operation number, as well as reducing the memory footprint so as to benefit from CPU cache. However, in the situation of 128-bit IPv6 addresses, it is challenging to keep both operation numbers and memory footprints small. To address the issue, we propose the Neurotrie data structure, which supports fast lookup and arbitrary strides. Thus, a good balance can be made between trie depth and memory footprint by computing the proper stride for each Neurotrie node. We model the optimal Neurotrie problem which minimizes the depth with limited memory footprint and develop a pseudo-polynomial time baseline algorithm to construct Neurotrie using dynamic programming. To improve the performance and reduce the computation complexity, we develop a deep reinforcement learning-based approach, which leverages a deep neural network to construct Neurotrie efficiently, based on characteristics captured from real IPv6 prefixes. We further refine the data structure and develop an efficient mechanism for routing updates. Experiments on real routing tables show that Neurotrie achieves a lookup rate 34% higher than that of state-of-the-art approaches.
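A variable-stride trie of the kind Neurotrie builds on can be sketched as below: each node consumes a configurable number of address bits per step, trading depth for memory. The fixed child stride and the assumption that prefix lengths align with stride boundaries are simplifications for illustration; Neurotrie itself chooses strides per node through its dynamic-programming and DRL construction.

```python
class TrieNode:
    """A node that consumes `stride` bits of the address per step."""
    def __init__(self, stride):
        self.stride = stride
        self.children = {}     # bit chunk (str) -> TrieNode
        self.nexthop = None    # next hop if a prefix ends here

def insert(root, prefix_bits, nexthop):
    node, i = root, 0
    while i < len(prefix_bits):
        chunk = prefix_bits[i:i + node.stride]
        i += node.stride
        if chunk not in node.children:
            # Assumed policy: children inherit the parent's stride.
            node.children[chunk] = TrieNode(node.stride)
        node = node.children[chunk]
    node.nexthop = nexthop

def lookup(root, addr_bits):
    """Longest-prefix match: remember the deepest next hop seen."""
    node, i, best = root, 0, None
    while node is not None:
        if node.nexthop is not None:
            best = node.nexthop
        chunk = addr_bits[i:i + node.stride]
        i += node.stride
        node = node.children.get(chunk)
    return best
```

A larger stride shortens the walk (fewer memory accesses per lookup) at the cost of wider, sparser child tables, which is exactly the depth/footprint trade-off the paper optimizes.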
Citations: 1
Designing Robust Deep Learning Classifiers for Image-based Malware Analysis
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00126
Giacomo Iadarola, F. Mercaldo, Fabio Martinelli, A. Santone
Deep Learning models have demonstrated high accuracy in malware classification, but they still lack "explainability" to ensure robustness and reliability in the generated predictions. In this short contribution, we summarize the research we have conducted in recent years in the Malware Analysis field.
Citations: 1
AIACC-Training: Optimizing Distributed Deep Learning Training through Multi-streamed and Concurrent Gradient Communications
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00087
Lixiang Lin, Shenghao Qiu, Ziqi Yu, Liang You, Long Xin, Xiaoyang Sun, J. Xu, Zheng Wang
There is growing interest in training deep neural networks (DNNs) in a GPU cloud environment. This is typically achieved by running parallel training workers on multiple GPUs across computing nodes. Under such a setup, the communication overhead is often responsible for long training times and poor scalability. This paper presents AIACC-Training, a unified communication framework designed for the distributed training of DNNs in a GPU cloud environment. AIACC-Training permits a training worker to participate in multiple gradient communication operations simultaneously to improve network bandwidth utilization and reduce communication latency. It employs auto-tuning techniques to dynamically determine the right communication parameters based on the input DNN workloads and the underlying network infrastructure. AIACC-Training has been deployed to production at Alibaba GPU Cloud, with 3000+ GPUs executing AIACC-Training-optimized code at any time. Experiments performed on representative DNN workloads show that AIACC-Training outperforms existing solutions, improving training throughput and scalability by a large margin.
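The overlap of concurrent gradient communications with computation can be illustrated with a thread pool standing in for multiple communication streams: each gradient bucket is submitted as soon as it becomes ready, and the trainer blocks only when it actually needs the reduced result. The `allreduce` placeholder and all names here are assumptions for illustration, not AIACC-Training's API.

```python
from concurrent.futures import ThreadPoolExecutor

def allreduce(bucket):
    # Placeholder for a real collective (e.g. a ring all-reduce over NCCL):
    # element-wise sum of per-worker gradient shards in this bucket.
    return [sum(col) for col in zip(*bucket)]

def backward_with_overlap(grad_buckets, num_streams=4):
    """Submit each bucket to a 'stream' as it becomes ready; wait lazily."""
    results = [None] * len(grad_buckets)
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        futures = []
        for i, bucket in enumerate(grad_buckets):  # buckets ready in order
            futures.append((i, pool.submit(allreduce, bucket)))
        for i, f in futures:
            results[i] = f.result()                # block only when needed
    return results
```

With a single stream, buckets would be reduced strictly one after another; multiple streams let several reductions share the network, which is the bandwidth-utilization point the abstract makes.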
Citations: 3
Mobility-aware Seamless Virtual Function Migration in Deviceless Edge Computing Environments
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00050
Yaodong Huang, Zelin Lin, Tingting Yao, Xiaojun Shang, Laizhong Cui, J. Huang
Serverless Computing and Function-as-a-Service (FaaS) offer convenient and transparent services to developers and users. The deployment and resource allocation of services are managed by the cloud service providers. Meanwhile, the development of smart mobile devices and network technology enables the collection and transmission of huge amounts of data, driving mobile edge computing to shift tasks to the network edge for mobile users. In this paper, we propose a deviceless edge computing system targeting the mobility of end users. We focus on the migration of virtual functions to provide uninterrupted services to mobile users. We introduce the deviceless edge computing model and propose a seamless migration scheme for virtual functions with limited involvement of function developers. We formulate the migration decision problem as an integer linear program and use receding horizon control (RHC) for online solutions. We implement the migration system and algorithm to support delay-sensitive scenarios on real edge devices and develop a streaming game as the virtual function to test performance. Extensive experiments in real scenarios show that the system can support high-mobility, delay-sensitive application scenarios. Extensive simulation results also show its applicability to large-scale networks.
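The receding-horizon loop can be sketched as follows: at each step, plan a placement over a short window of predicted user positions, commit only the first decision, then re-plan at the next step. The greedy search here stands in for the paper's ILP solver, and the predictor and cost functions are purely illustrative assumptions.

```python
def rhc_migrate(current_edge, predict_positions, edges, horizon, steps,
                access_cost, migrate_cost):
    """Receding horizon control: plan over `horizon`, commit one move."""
    placements = []
    for t in range(steps):
        future = predict_positions(t, horizon)       # predicted user positions
        best, best_cost = current_edge, float("inf")
        for e in edges:                              # evaluate each candidate
            cost = migrate_cost(current_edge, e)
            cost += sum(access_cost(e, pos) for pos in future)
            if cost < best_cost:
                best, best_cost = e, cost
        current_edge = best                          # commit only the first step
        placements.append(current_edge)
    return placements
```

With edge nodes on a line and a user moving one position per step, the function tracks the user one hop at a time because migrating early against the predicted trajectory is cheaper than serving the user remotely.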
Citations: 2
Stabilizer: Geo-Replication with User-defined Consistency
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00042
Pengze Li, Lichen Pan, Xinzhe Yang, Weijia Song, Zhen Xiao, K. Birman
Geo-replication is essential in reliable large-scale cloud applications. We argue that existing replication solutions are too rigid to support today’s diversity of data consistency and performance requirements. Stabilizer is a flexible geo-replication library, supporting user-defined consistency models. The library achieves high performance using control-plane / data-plane separation: control events do not disrupt data flow. Our API offers simple control-plane operators that allow an application to define its desired consistency model: a stability frontier predicate. We build a wide-area K/V store with Stabilizer, a Dropbox-like application, and a prototype pub/sub system to show its versatility and evaluate its performance. When compared with a Paxos-based consistency protocol in an emulated Amazon EC2 wide-area network, experiments show that for a scenario requiring a more accurate consistency model, Stabilizer achieves a 24.75% latency performance improvement. Compared to Apache Pulsar in a real WAN environment, Stabilizer’s dynamic reconfiguration mechanism improves the pub/sub system performance significantly according to our experiment results.
Citations: 0
Escra: Event-driven, Sub-second Container Resource Allocation
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00038
Greg Cusack, Maziyar Nazari, Sepideh Goodarzy, Erika Hunhoff, Prerit Oberai, Eric Keller, Eric Rozner, Richard Han
This paper pushes the limits of automated resource allocation in container environments. Recent works set container CPU and memory limits by automatically scaling containers based on past resource usage. However, these systems are heavyweight and run on coarse-grained time scales, resulting in poor performance when predictions are incorrect. We propose Escra, a container orchestrator that enables fine-grained, event-based resource allocation for a single container and distributed resource allocation to manage a collection of containers. Escra performs resource allocation on sub-second intervals within and across hosts, allowing operators to cost-effectively scale resources without performance penalty. We evaluate Escra on two types of containerized applications: microservices and serverless functions. In microservice environments, fine-grained and event-based resource allocation can reduce application latency by up to 96.9% and increase throughput by up to 3.2x when compared against the current state-of-the-art. Escra can increase performance while simultaneously reducing 50th- and 99th-percentile CPU waste by over 10x and 3.2x, respectively. In serverless environments, Escra can reduce CPU reservations by over 2.1x and memory reservations by more than 2x while maintaining similar end-to-end performance.
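The event-driven control idea can be sketched as a handler that resizes a container's CPU limit on every throttle or usage event as it arrives, rather than on a coarse periodic timer. The event names and scaling factors below are illustrative assumptions, not Escra's actual policy.

```python
def handle_event(limits, event):
    """React to a single container event immediately.

    limits: dict mapping container name -> CPU limit (millicores).
    event:  {'type': 'throttled'|'usage', 'container': str, 'cpu': int?}
    """
    c = event["container"]
    if event["type"] == "throttled":
        # Contention observed: grow the limit quickly to avoid stalls.
        limits[c] = int(limits[c] * 1.2)
    elif event["type"] == "usage" and event["cpu"] < 0.5 * limits[c]:
        # Sustained low usage: shrink toward actual demand, with a floor.
        limits[c] = max(event["cpu"] * 2, 100)
    return limits
```

Because each event is handled in isolation, adjustments happen on sub-second timescales and a wrong prediction is corrected at the very next event rather than at the next timer tick.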
Citations: 2
Explainable Deep Learning Methodologies for Biomedical Images Classification
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00125
Marcello Di Giammarco, F. Mercaldo, Fabio Martinelli, A. Santone
Often, even when we have a lot of data available, we cannot give them the interpretability and explainability needed to extract answers, let alone diagnoses in the medical field. The aim of this contribution is to introduce a way to provide explainability for data and features that could escape even medical doctors, and that can be categorized and "explained" with the use of Machine Learning models.
Citations: 1
ContextFL: Context-aware Federated Learning by Estimating the Training and Reporting Phases of Mobile Clients
Pub Date : 2022-07-01 DOI: 10.1109/ICDCS54860.2022.00061
Huawei Huang, Ruixin Li, Jialiang Liu, Sicong Zhou, Kangying Lin, Zibin Zheng
Federated Learning (FL) suffers from low-quality model training in mobile edge computing due to the dynamic environment of mobile clients. To the best of our knowledge, most FL frameworks follow reactive client scheduling, in which the FL parameter server selects participants according to the currently observed state of clients. Thus, the participants selected by reactive-manner methods are very likely to fail while training a round of FL. To this end, we propose a proactive Context-aware Federated Learning (ContextFL) mechanism, which consists of two primary modules. First, the state prediction module enables each client device to locally predict the conditions of both the training and reporting phases of FL. Second, the decision-making module is devised using the contextual Multi-Armed Bandit (cMAB) framework, which helps the parameter server select the most appropriate group of mobile clients. Finally, we carried out trace-driven FL experiments using real-world mobility datasets collected from volunteers. The evaluation results demonstrate that the proposed ContextFL mechanism outperforms other baselines in terms of the convergence stability of the global FL model and the ratio of valid participants.
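The cMAB selection step can be illustrated with a disjoint LinUCB sketch over two-dimensional client context features (for instance, predicted training time and predicted reporting delay): clients are ranked by an upper confidence bound on expected round success. The feature choice, reward definition, and fixed dimensionality are illustrative assumptions, not the paper's estimator.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

class LinUCBSelector:
    """Disjoint LinUCB: one ridge-regression model per client."""
    def __init__(self, alpha=1.0):
        self.A = {}        # client -> 2x2 design matrix (starts at identity)
        self.b = {}        # client -> accumulated reward vector
        self.alpha = alpha

    def _init(self, c):
        if c not in self.A:
            self.A[c] = [[1.0, 0.0], [0.0, 1.0]]
            self.b[c] = [0.0, 0.0]

    def score(self, c, x):
        """Predicted reward plus an exploration bonus (the UCB)."""
        self._init(c)
        ainv = inv2(self.A[c])
        theta = matvec(ainv, self.b[c])
        return dot(theta, x) + self.alpha * math.sqrt(dot(x, matvec(ainv, x)))

    def select(self, contexts, k):
        """contexts: dict client -> feature vector; pick the top-k by UCB."""
        ranked = sorted(contexts, key=lambda c: self.score(c, contexts[c]),
                        reverse=True)
        return ranked[:k]

    def update(self, c, x, reward):
        """reward: e.g. 1.0 if the client finished the round in time."""
        self._init(c)
        for i in range(2):
            for j in range(2):
                self.A[c][i][j] += x[i] * x[j]
            self.b[c][i] += reward * x[i]
```

A client that repeatedly completes rounds accumulates reward and is preferred; an unexplored client still gets a chance through the confidence bonus, which is the exploration/exploitation balance cMAB provides.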
Citations: 5