
Latest Publications from the 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)

Extreme-Scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores
Ying Cai, Chao Yang, Wenjing Ma, Yulong Ao
Stencil computation arises in a large variety of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Due to its memory-bound nature, it is a challenging task to optimize stencil computation kernels on many leadership supercomputers, such as Sunway TaihuLight, which has relatively high computing throughput but relatively low data-moving capability. In this paper, we show the efforts we have made during the past two years in developing end-to-end implementation and optimization techniques for extreme-scale stencil computations on Sunway TaihuLight. We started with work on optimizing the 3-D 2nd-order 13-point stencil for nonhydrostatic atmospheric dynamics simulation, an important part of the 2016 ACM Gordon Bell Prize winning work, and extended it to handle a broader range of realistic and challenging problems, such as the HPGMG benchmark, which consists of memory-hungry stencils, and the gaseous wave detonation simulation, which relies on complex high-order stencils. The presented stencil computation paradigm on Sunway TaihuLight includes not only multilevel parallelization to exploit the parallelism available at different hardware levels, but also systematic performance optimization techniques for communication, memory access, and computation. We show through extreme-scale tests that the proposed paradigm delivers remarkable performance on Sunway TaihuLight with ten million heterogeneous cores. In particular, we achieve an aggregate performance of 23.12 Pflops for the 3-D 5th-order WENO stencil computation in the gaseous wave detonation simulation, which is, to the best of our knowledge, the highest performance result for high-order stencil computations, and an aggregate rate of over one trillion unknowns solved per second in the HPGMG benchmark, which ranked first in the HPGMG list of November 2017.
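To make the kind of kernel concrete, below is a minimal, illustrative 3-D 13-point stencil sweep in NumPy (center point plus first and second neighbors along each axis). The shape and coefficients are assumptions for illustration only; this is not the paper's optimized Sunway kernel, which is hand-tuned for the heterogeneous cores.

```python
# Illustrative 3-D 13-point stencil sweep; coefficients are made-up.
import numpy as np

def stencil_13pt(u, c0=0.5, c1=0.1, c2=-0.05):
    """One sweep of a 13-point stencil over the interior of a 3-D grid."""
    v = u.copy()
    core = u[2:-2, 2:-2, 2:-2]
    v[2:-2, 2:-2, 2:-2] = (
        c0 * core
        + c1 * (u[1:-3, 2:-2, 2:-2] + u[3:-1, 2:-2, 2:-2]   # +/-1 along x
              + u[2:-2, 1:-3, 2:-2] + u[2:-2, 3:-1, 2:-2]   # +/-1 along y
              + u[2:-2, 2:-2, 1:-3] + u[2:-2, 2:-2, 3:-1])  # +/-1 along z
        + c2 * (u[:-4, 2:-2, 2:-2] + u[4:, 2:-2, 2:-2]      # +/-2 along x
              + u[2:-2, :-4, 2:-2] + u[2:-2, 4:, 2:-2]      # +/-2 along y
              + u[2:-2, 2:-2, :-4] + u[2:-2, 2:-2, 4:])     # +/-2 along z
    )
    return v

u = np.random.rand(64, 64, 64)
u = stencil_13pt(u)
```

Each update reads 13 values but performs only a handful of flops, which is why data-moving capability rather than raw throughput dominates performance on such kernels.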
Citations: 2
Building Blocks for Workflow System Middleware
M. Turilli, André Merzky, Vivek Balasubramanian, S. Jha
We suggest there is a need for a fresh perspective on the design and development of middleware for high-performance workflows and workflow systems. We argue for a building blocks approach, outline the approach, and define the properties of its building blocks. We discuss RADICAL-Cybertools as one implementation of the building blocks concept, showing how they have been designed and developed in accordance with this approach. We discuss three case studies in which RADICAL-Cybertools have been used to develop new workflow system capabilities and have been integrated to enhance existing ones, illustrating the potential and promise of the building blocks approach.
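As a concrete illustration of the building blocks idea (self-sufficient components with a uniform interface, coordinating only through the data they exchange), here is a minimal sketch; the class and method names are assumptions for illustration and do not reflect the RADICAL-Cybertools API.

```python
# Minimal sketch of composable middleware building blocks; names are illustrative.
from abc import ABC, abstractmethod

class BuildingBlock(ABC):
    """A self-sufficient component with a uniform execute interface."""
    @abstractmethod
    def execute(self, payload):
        """Consume a payload from the previous block, return one for the next."""

class TaskDescriber(BuildingBlock):
    def execute(self, payload):
        # Turn abstract task names into concrete task descriptions.
        return [{"task": t, "cores": 1} for t in payload]

class TaskExecutor(BuildingBlock):
    def execute(self, payload):
        # Stand-in for dispatching descriptions to a runtime system.
        return [f"done:{d['task']}" for d in payload]

def compose(blocks, payload):
    for block in blocks:          # blocks coordinate only via payloads
        payload = block.execute(payload)
    return payload

print(compose([TaskDescriber(), TaskExecutor()], ["t1", "t2"]))
```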
Citations: 8
h-Fair: Asymptotic Scheduling of Heavy Workloads in Heterogeneous Data Centers
A. Postoaca, Florin Pop, R. Prodan
Large-scale computing solutions are increasingly used in the context of Big Data platforms, where efficient scheduling algorithms play an important role in providing optimized cluster resource utilization, throughput, and fairness. This paper deals with the problem of scheduling a set of jobs across a cluster of machines, handling the specific use case of fair scheduling for jobs and machines with heterogeneous characteristics. Although job and cluster diversity is unprecedented, most schedulers do not provide implementations that handle multiple-resource-type fairness in a heterogeneous system. We propose a new scheduler called h-Fair that selects jobs for scheduling based on a global dominant resource fairness policy for heterogeneous resources, and dispatches them to the machines whose characteristics best match their resource demands, using cosine similarity. We implemented h-Fair in Apache Hadoop YARN and compare it with the existing Fair Scheduler, which uses the dominant resource fairness policy, on the Google workload trace. We show that our implementation provides better cluster resource utilization and allocates more containers when jobs and machines have heterogeneous characteristics.
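The dispatch step lends itself to a short sketch: compute the cosine similarity between a job's resource-demand vector and each machine's capacity vector, and dispatch to the most similar machine. The two-resource (vcores, memory) setup and all numbers below are illustrative assumptions, not YARN code.

```python
# Cosine-similarity dispatch sketch; machine names and numbers are made-up.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def dispatch(job_demand, machines):
    """Pick the machine whose capacity profile best matches the demand profile."""
    return max(machines, key=lambda m: cosine(job_demand, machines[m]))

machines = {"cpu-heavy": (64, 128), "mem-heavy": (16, 512)}  # (vcores, GB RAM)
print(dispatch((8, 4), machines))    # CPU-skewed job -> "cpu-heavy"
print(dispatch((2, 64), machines))   # memory-skewed job -> "mem-heavy"
```

Matching the shape of the demand vector, rather than its magnitude, is what lets a memory-hungry job avoid crowding onto CPU-rich machines.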
Citations: 2
First Hop Mobile Offloading of DAG Computations
Vincenzo De Maio, I. Brandić
In recent years, Mobile Cloud Computing (MCC) has been proposed to increase the battery lifetime of mobile devices. However, offloading to Cloud infrastructures may be infeasible for latency-critical applications, because the geographical distribution of Cloud data centers increases offloading time. In this paper, we investigate Mobile Edge Cloud Offloading (MECO), namely offloading to a heterogeneous computing infrastructure featuring both Cloud and Edge nodes, where Edge nodes are geographically closer to the mobile device. We evaluate the improvements of MECO over MCC for objectives such as application runtime, mobile-device battery lifetime, and user cost. We then propose the Edge Cloud Heuristic Offloading (ECHO) approach to find a trade-off solution between the aforementioned objectives according to the user's preferences. We evaluate our approach with Monte-Carlo simulations of offloading Directed Acyclic Graphs (DAGs) that represent mobile applications. The results show that (1) MECO can reduce application runtime by up to 70.7% and cost by up to 70.6% in comparison to MCC, and (2) ECHO allows the user to select a trade-off solution, according to their preferences, with at most 18% MAPE for runtime, 16% for cost, and 0.5% for battery lifetime.
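A minimal sketch of the trade-off selection idea: score each offloading target for a task by a preference-weighted sum of normalized runtime, energy, and cost. The targets, numbers, and scoring rule below are illustrative assumptions; ECHO's actual heuristic operates on whole DAGs rather than single tasks.

```python
# Preference-weighted target selection sketch; all figures are made-up.
targets = {
    # runtime (s), battery energy (J), monetary cost ($) per task
    "local": {"runtime": 9.0, "energy": 12.0, "cost": 0.0},
    "edge":  {"runtime": 4.0, "energy": 3.0,  "cost": 0.02},
    "cloud": {"runtime": 6.0, "energy": 2.5,  "cost": 0.01},
}

def choose_target(prefs, targets):
    """prefs: weights summing to 1, e.g. {'runtime': .6, 'energy': .3, 'cost': .1}"""
    # Normalize each objective by its worst case so the weights are comparable.
    maxima = {k: max(t[k] for t in targets.values()) or 1.0 for k in prefs}
    score = lambda t: sum(prefs[k] * targets[t][k] / maxima[k] for k in prefs)
    return min(targets, key=score)  # lower weighted normalized score is better

print(choose_target({"runtime": 0.6, "energy": 0.3, "cost": 0.1}, targets))
```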
Citations: 52
RaaS: Resilience as a Service
Jorge Villamayor, Dolores Rexachs, E. Luque, D. Lugones
Cloud computing continues to grow in popularity as key features such as scalability, pay-per-use, and availability evolve. It is also becoming a competitive platform for running high-performance computing (HPC) and parallel applications, due to the increasing performance of virtualized, highly available instances. However, migrating HPC applications to the cloud still requires native fault-tolerant solutions to fully leverage cloud features and maximize resource utilization at the best cost, particularly for long-running parallel applications, where faults can cause invalid states or data loss and force applications to be re-executed, increasing completion time and cost. We propose Resilience as a Service (RaaS), a fault-tolerant framework for HPC applications running in the cloud. In this paper, the RADIC architecture (Redundant Array of Distributed Independent Fault Tolerance Controllers) is used to provide clouds with a highly available, distributed, and scalable fault-tolerant service. The paper explores how traditional HPC protection and recovery mechanisms must be redesigned to natively leverage cloud properties, and examines multiple alternatives for implementing rollback-recovery protocols using virtual machines, containers, object and block storage, or database services. Results show that RaaS restores and completes the application execution using available resources, with at most 8% overhead across different fault-tolerant configuration alternatives.
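The rollback-recovery mechanism at the core of such a service can be sketched as periodic checkpointing plus restart from the latest checkpoint. The file-based persistence and state layout below are stand-ins for the virtual machine, container, object/block storage, and database alternatives the paper explores.

```python
# Checkpoint/restart sketch; the file path and state layout are illustrative.
import os, pickle

CKPT = "app.ckpt"

def checkpoint(state):
    with open(CKPT, "wb") as f:      # in a cloud setting: PUT to object storage
        pickle.dump(state, f)

def recover():
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            return pickle.load(f)    # resume from the last protected state
    return {"iteration": 0, "result": 0}   # cold start

state = recover()
for i in range(state["iteration"], 10):
    state = {"iteration": i + 1, "result": state["result"] + i}
    if i % 3 == 0:
        checkpoint(state)            # protect progress every few iterations
print(state)
```

The point of making this a service is that the checkpoint interval, the storage backend, and the restart logic become policy choices managed outside the application.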
Citations: 4
bQueue: A Coarse-Grained Bucket QoS Scheduler
Yuhan Peng, P. Varman
We consider the problem of providing QoS guarantees in a clustered storage system whose data is distributed over multiple server nodes. Storage objects are encapsulated in a single logical bucket and QoS is provided at the level of buckets. The service that a single bucket receives is the aggregate of the service it receives at the nodes holding its constituent objects. The service depends on individual time-varying service demands and congestion at the physical servers. In this paper, we present bQueue, a coarse-grained scheduling algorithm that provides reservation and limit QoS for buckets in a distributed storage system, using tokens to control the amount of service received at individual storage servers. bQueue uses the max-flow algorithm to periodically determine the optimal token distribution based on the demands of the buckets at different servers and the QoS parameters of the buckets. Our experimental results show that bQueue provides accurate QoS among the buckets with different access patterns, and handles runtime demand changes in a reasonable way.
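The token-distribution step can be sketched as the max-flow formulation the abstract describes: source-to-bucket edges carry each bucket's token reservation, bucket-to-server edges carry its per-server demand, and server-to-sink edges carry server token capacity. The topology follows that description; all numbers below are illustrative assumptions.

```python
# Max-flow token distribution sketch (requires networkx); figures are made-up.
import networkx as nx

G = nx.DiGraph()
reservations = {"bucketA": 100, "bucketB": 60}                        # tokens/period
demand = {("bucketA", "s1"): 70, ("bucketA", "s2"): 50, ("bucketB", "s2"): 60}
server_capacity = {"s1": 80, "s2": 90}

for b, r in reservations.items():
    G.add_edge("src", b, capacity=r)      # a bucket draws at most its reservation
for (b, s), d in demand.items():
    G.add_edge(b, s, capacity=d)          # and at most its demand at each server
for s, c in server_capacity.items():
    G.add_edge(s, "sink", capacity=c)     # a server grants at most its capacity

flow_value, flow = nx.maximum_flow(G, "src", "sink")
print(flow_value)                  # total tokens granted this period
print(flow["bucketA"])             # per-server token grant for bucketA
```

Re-solving this flow problem each period is what lets the scheduler track time-varying demands and congestion.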
Citations: 7
Performance Optimization of Budget-Constrained MapReduce Workflows in Multi-Clouds
Huiyan Cao, C. Wu
With the rapid deployment of cloud infrastructures around the globe and the economic benefit of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted, or are in active transition, to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multiple clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze the intra- and inter-cloud execution dynamics of scientific workflows, and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe a discrepancy of about 15% between our theoretical estimates and real-world experimental measurements, which validates the correctness of our cost models and ensures accurate workflow mapping in real systems.
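One plausible shape for such a heuristic is a greedy budget-constrained upgrade pass, sketched below: start every module on the cheapest VM type, then spend the remaining budget upgrading the heaviest modules to faster types. The VM table, cost model, and upgrade rule are illustrative assumptions, not the paper's algorithm.

```python
# Greedy budget-constrained mapping sketch; VM names and prices are made-up.
vm_types = [  # (name, relative speed, $ per hour)
    ("small", 1.0, 0.10), ("medium", 2.0, 0.22), ("large", 4.0, 0.50),
]

def cost_of(vm, work_hours):
    _, speed, price = vm
    return (work_hours / speed) * price   # faster VM: shorter run, higher rate

def map_modules(modules, budget):
    """modules: list of (name, work_hours_on_small); returns ({name: vm}, spent)."""
    plan = {m: vm_types[0] for m, _ in modules}            # baseline: all cheapest
    spent = sum(cost_of(vm_types[0], w) for _, w in modules)
    for m, w in sorted(modules, key=lambda x: -x[1]):      # heaviest modules first
        for vm in reversed(vm_types):                      # fastest affordable type
            delta = cost_of(vm, w) - cost_of(plan[m], w)
            if delta <= budget - spent:
                spent += delta
                plan[m] = vm
                break
    return {m: vm[0] for m, vm in plan.items()}, spent

plan, spent = map_modules([("map", 10.0), ("shuffle", 4.0), ("reduce", 6.0)], budget=2.5)
print(plan, round(spent, 2))
```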
Citations: 4
Programmable Caches with a Data Management Language and Policy Engine
Michael Sevilla, C. Maltzahn, P. Alvaro, Reza Nasirigerdeh, B. Settlemyer, D. Perez, D. Rich, G. Shipman
Our analysis of the key-value activity generated by the ParSplice molecular dynamics simulation demonstrates the need for more complex cache management strategies. Baseline measurements show clear key access patterns and hot spots that offer significant opportunity for optimization. We use the data management language and policy engine from the Mantle system to dynamically explore a variety of techniques, ranging from basic algorithms and heuristics to statistical models, calculus, and machine learning. While Mantle was originally designed for distributed file systems, we show how the collection of abstractions effectively decomposes the problem into manageable policies for a different application and storage system. Our exploration of this space results in a dynamically sized cache policy that does not sacrifice any performance while using 32-66% less memory than the default ParSplice configuration.
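A dynamically sized cache policy of the kind the study converges on can be sketched as an LRU cache that periodically grows or shrinks its capacity based on the observed hit rate. The resizing rule and thresholds below are illustrative assumptions, not the Mantle policy language.

```python
# Adaptive-capacity LRU sketch; thresholds and sizes are illustrative.
from collections import OrderedDict

class AdaptiveLRU:
    def __init__(self, capacity=1024, lo=0.60, hi=0.95):
        self.data, self.capacity = OrderedDict(), capacity
        self.hits = self.misses = 0
        self.lo, self.hi = lo, hi

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)          # refresh recency
            self.hits += 1
            return self.data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        while len(self.data) > self.capacity:
            self.data.popitem(last=False)       # evict least recently used

    def maybe_resize(self):
        """Call periodically: grow when the hit rate is poor, shrink when ample."""
        total = self.hits + self.misses
        if total == 0:
            return
        rate = self.hits / total
        if rate < self.lo:
            self.capacity *= 2                  # hot spots overflowing: grow
        elif rate > self.hi and self.capacity > 64:
            self.capacity //= 2                 # plenty of headroom: shrink
            while len(self.data) > self.capacity:
                self.data.popitem(last=False)
        self.hits = self.misses = 0
```

Expressing the `maybe_resize` rule as data-managed policy, rather than hard-coding it, is the kind of decomposition the Mantle abstractions enable.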
Citations: 5
Implementation of Unsupervised k-Means Clustering Algorithm Within Amazon Web Services Lambda
A. Deese
This work demonstrates how an unsupervised learning algorithm based on k-Means Clustering with Kaufman Initialization can be implemented effectively as an Amazon Web Services Lambda Function within Amazon's serverless cloud computing service. It emphasizes the need to employ a lean and modular design philosophy, to transfer data efficiently between Lambda and DynamoDB, and to invoke Lambda Functions from mobile applications seamlessly and with negligible latency. This work presents a novel application of serverless cloud computing and provides specific examples that will allow readers to develop similar algorithms. The author compares the computation speed and cost of machine learning implementations on traditional PC and mobile hardware (running locally) with implementations that employ Lambda.
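A minimal sketch of the overall shape: a Lambda-style handler that runs k-means (Lloyd's iterations) on a small JSON payload. For brevity the initialization below is a plain "first k points" placeholder rather than Kaufman Initialization, and the event/response format is an assumed convention, not the paper's interface.

```python
# Lambda-style k-means sketch; payload format and init are simplifications.
import json

def kmeans(points, k, iters=20):
    centers = points[:k]                      # placeholder init (not Kaufman)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step: nearest center
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        centers = [                           # update step: cluster means
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

def lambda_handler(event, context):
    body = json.loads(event["body"])          # assumed: {"points": [...], "k": 2}
    centers = kmeans([tuple(p) for p in body["points"]], body["k"])
    return {"statusCode": 200, "body": json.dumps({"centers": centers})}
```

Keeping the handler stateless, with inputs and outputs in the event payload (or DynamoDB for larger data), is what makes the function cheap to invoke on demand.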
Citations: 4
One Size Does Not Fit All: The Case for Chunking Configuration in Backup Deduplication
Huijun Wu, Chen Wang, Kai Lu, Yinjin Fu, Liming Zhu
Data backup is regularly performed by both enterprise and individual users to protect their data from unexpected loss, and various commercial data deduplication systems and software packages help users eliminate duplicates in their backup data to save storage space. In data deduplication systems, the data chunking process splits data into small chunks, and duplicate data is identified by comparing the fingerprints of the chunks. The chunk-size setting has a significant impact on deduplication performance, and a variety of chunking algorithms have been proposed in recent studies. In practice, existing systems often set the chunking configuration empirically: a chunk size of 4KB or 8KB is regarded as the sweet spot for good deduplication performance. However, the data storage and access patterns of users vary and change over time, so an empirical chunk-size setting may not yield a good deduplication ratio and can complicate storage capacity planning. Moreover, it is difficult to change chunking settings once they are in use, because duplicates across data stored with different chunk-size settings cannot be eliminated directly. In this paper, we propose a sampling-based chunking method and develop a tool named SmartChunker to estimate the optimal chunking configuration for deduplication systems. Our evaluations on real-world datasets demonstrate the efficacy and efficiency of SmartChunker.
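The sampling idea can be sketched as follows: chunk a sample of the backup data at a candidate chunk size, fingerprint each chunk, and estimate the deduplication ratio from the duplicate fingerprints. Fixed-size chunking and SHA-1 fingerprints below are simplifying assumptions standing in for a production content-defined chunker.

```python
# Chunk-size evaluation sketch on a synthetic sample; sizes are illustrative.
import hashlib

def estimated_dedup_ratio(sample: bytes, chunk_size: int) -> float:
    """Logical bytes / unique bytes over the sample at this chunk size."""
    seen, unique_bytes, total_bytes = set(), 0, 0
    for off in range(0, len(sample), chunk_size):
        chunk = sample[off:off + chunk_size]
        total_bytes += len(chunk)
        fp = hashlib.sha1(chunk).digest()     # chunk fingerprint
        if fp not in seen:
            seen.add(fp)
            unique_bytes += len(chunk)        # first occurrence must be stored
    return total_bytes / unique_bytes if unique_bytes else 1.0

sample = (b"A" * 8192 + b"B" * 4096) * 16     # synthetic, highly redundant data
for size in (4096, 8192, 16384):
    print(size, round(estimated_dedup_ratio(sample, size), 2))
```

Running this over a sample at several candidate sizes makes the trade-off visible before any setting is locked in, which is exactly the decision the paper argues should not default to 4KB or 8KB.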
Citations: 2