Title: Extreme-Scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores
Authors: Ying Cai, Chao Yang, Wenjing Ma, Yulong Ao
Venue: 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00086
Abstract: Stencil computation arises in a large variety of scientific and engineering applications and often plays a critical role in the performance of extreme-scale simulations. Due to its memory-bound nature, optimizing stencil kernels is a challenging task on many leadership supercomputers, such as Sunway TaihuLight, which has relatively high computing throughput but relatively low data-moving capability. In this paper, we present the efforts we have made over the past two years in developing end-to-end implementation and optimization techniques for extreme-scale stencil computations on Sunway TaihuLight. We started by optimizing the 3-D 2nd-order 13-point stencil for nonhydrostatic atmospheric dynamics simulation, an important part of the 2016 ACM Gordon Bell Prize winning work, and extended it to handle a broader range of realistic and challenging problems, such as the HPGMG benchmark, which consists of memory-hungry stencils, and the gaseous wave detonation simulation, which relies on complex high-order stencils. The presented stencil computation paradigm on Sunway TaihuLight includes not only multilevel parallelization to exploit parallelism at different hardware levels, but also systematic performance optimization techniques for communication, memory access, and computation. Extreme-scale tests show that the proposed paradigm delivers remarkable performance on Sunway TaihuLight with ten million heterogeneous cores. In particular, we achieve an aggregate performance of 23.12 Pflops for the 3-D 5th-order WENO stencil computation in gaseous wave detonation simulation, which is, as far as we know, the highest performance result for high-order stencil computations, and an aggregate rate of over one trillion unknowns solved per second in the HPGMG benchmark, which ranked first on the HPGMG list of November 2017.
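The memory-bound nature of stencils is easy to see in code: each output point is a small weighted sum of neighbours, so very few arithmetic operations are performed per byte moved. A minimal 1-D 3-point sketch in Python (a toy illustration of the access pattern, not the paper's 13-point or WENO kernels):

```python
def stencil_step(u, c=0.1):
    """One Jacobi-style update of a 1-D 3-point (2nd-order) stencil.

    Each output point reads three inputs and does ~4 flops, so the
    kernel streams memory far faster than it computes -- the
    memory-bound behaviour the paper optimises for. Boundary points
    are held fixed (Dirichlet).
    """
    v = list(u)
    for i in range(1, len(u) - 1):
        v[i] = u[i] + c * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return v

# Diffuse a unit spike from the centre of a 9-point grid.
u = [0.0] * 9
u[4] = 1.0
for _ in range(10):
    u = stencil_step(u)
```

The 3-D kernels in the paper follow the same pattern with larger neighbourhoods (13 or more points), which raises the pressure on memory bandwidth further.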
Title: Building Blocks for Workflow System Middleware
Authors: M. Turilli, André Merzky, Vivek Balasubramanian, S. Jha
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00051
Abstract: We suggest there is a need for a fresh perspective on the design and development of middleware for high-performance workflows and workflow systems. We argue for a building-blocks approach, outline a description of this approach, and define the properties of such building blocks. We discuss RADICAL-Cybertools as one implementation of the building-blocks concept, showing how they have been designed and developed in accordance with this approach. We discuss three case studies in which RADICAL-Cybertools have been used to develop new workflow system capabilities and integrated to enhance existing ones, illustrating the potential and promise of the building-blocks approach.
Title: h-Fair: Asymptotic Scheduling of Heavy Workloads in Heterogeneous Data Centers
Authors: A. Postoaca, Florin Pop, R. Prodan
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00058
Abstract: Large-scale computing solutions are increasingly used in the context of Big Data platforms, where efficient scheduling algorithms play an important role in providing optimized cluster resource utilization, throughput, and fairness. This paper addresses the problem of scheduling a set of jobs across a cluster of machines, handling the specific use case of fair scheduling for jobs and machines with heterogeneous characteristics. Although job and cluster diversity is unprecedented, most schedulers do not provide implementations that handle multi-resource fairness in a heterogeneous system. We propose a new scheduler called h-Fair that selects jobs for scheduling based on a global dominant-resource-fairness policy for heterogeneous clusters, and dispatches them to machines whose characteristics are similar to the jobs' resource demands, using cosine similarity. We implemented h-Fair in Apache Hadoop YARN and compare it with the existing Fair Scheduler, which uses the dominant resource fairness policy, on the Google workload trace. We show that our implementation provides better cluster resource utilization and allocates more containers when jobs and machines have heterogeneous characteristics.
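The dispatch step can be illustrated with a toy cosine-similarity match between a job's resource-demand vector and each machine's capacity vector. The machine names and two-resource (CPU, memory) numbers below are invented for illustration and are not from the paper:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two resource vectors: 1.0 means the
    demand has exactly the same 'shape' as the capacity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def best_machine(demand, machines):
    """Pick the machine whose capacity shape best matches the demand."""
    return max(machines, key=lambda m: cosine_similarity(demand, machines[m]))

machines = {
    "cpu-heavy": (32, 64),    # 32 cores, 64 GB
    "mem-heavy": (8, 256),    # 8 cores, 256 GB
}
job = (2, 32)                 # memory-dominant demand
choice = best_machine(job, machines)
```

Matching on vector shape rather than absolute capacity is what lets a memory-dominant job land on a memory-rich machine even when a CPU-rich machine has more total resources free.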
Title: First Hop Mobile Offloading of DAG Computations
Authors: Vincenzo De Maio, I. Brandić
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00023
Abstract: In recent years, Mobile Cloud Computing (MCC) has been proposed to increase the battery lifetime of mobile devices. However, offloading to Cloud infrastructures may be infeasible for latency-critical applications, because the geographical distribution of Cloud data centers increases offloading time. In this paper, we investigate Mobile Edge Cloud Offloading (MECO), namely offloading to a heterogeneous computing infrastructure featuring both Cloud and Edge nodes, where Edge nodes are geographically closer to the mobile device. We evaluate the improvements of MECO over MCC for objectives such as application runtime, mobile device battery lifetime, and cost to the user. We then propose the Edge Cloud Heuristic Offloading (ECHO) approach to find a trade-off solution between the aforementioned objectives, according to the user's preferences. We evaluate our approach through Monte-Carlo simulations of offloading Directed Acyclic Graphs (DAGs) representing mobile applications. The results show that (1) MECO can reduce application runtime by up to 70.7% and cost by up to 70.6% compared to MCC, and (2) ECHO allows the user to select a trade-off solution, according to the user's preferences, with at most 18% MAPE for runtime, 16% for cost, and 0.5% for battery lifetime.
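A preference-driven trade-off between runtime, cost, and battery can be sketched as a weighted scalarisation over candidate placements. This is only a toy stand-in for ECHO (the paper's heuristic operates on whole DAGs); the normalised objective values and weights below are made up:

```python
def placement_score(objectives, weights):
    """Lower is better: preference-weighted sum of normalised objectives
    (each objective already scaled to [0, 1], where 0 is best)."""
    return sum(weights[k] * objectives[k] for k in weights)

# Hypothetical normalised outcomes of running one task at each location.
candidates = {
    "cloud": {"runtime": 1.0, "cost": 0.3, "battery": 0.2},
    "edge":  {"runtime": 0.4, "cost": 0.6, "battery": 0.3},
    "local": {"runtime": 0.7, "cost": 0.0, "battery": 1.0},
}
prefs = {"runtime": 0.6, "cost": 0.2, "battery": 0.2}  # latency-sensitive user
choice = min(candidates, key=lambda name: placement_score(candidates[name], prefs))
```

With latency weighted heavily, the nearby Edge node wins despite its higher cost; shifting the weights toward cost would favour local execution instead.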
Title: RaaS: Resilience as a Service
Authors: Jorge Villamayor, Dolores Rexachs, E. Luque, D. Lugones
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00055
Abstract: Cloud computing continues to grow in popularity as key features such as scalability, pay-per-use, and availability evolve. It is also becoming a competitive platform for running high-performance computing (HPC) and parallel applications, due to the increasing performance of virtualized, highly available instances. However, migrating HPC applications to the cloud still requires native fault-tolerance solutions to fully leverage cloud features and maximize resource utilization at the best cost, particularly for long-running parallel applications, where faults can cause invalid states or data loss; re-executing such applications increases completion time and cost. We propose Resilience as a Service (RaaS), a fault-tolerance framework for HPC applications running in the cloud. In this paper, the RADIC architecture (Redundant Array of Distributed Independent Fault Tolerance Controllers) is used to provide clouds with a highly available, distributed, and scalable fault-tolerance service. The paper explores how traditional HPC protection and recovery mechanisms must be redesigned to natively leverage cloud properties, and examines multiple alternatives for implementing rollback-recovery protocols using virtual machines, containers, object and block storage, or database services. Results show that RaaS restores and completes the application execution using available resources while reducing overhead by up to 8% across different fault-tolerance configuration alternatives.
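The core of any rollback-recovery protocol is the checkpoint/restore cycle: snapshot application state periodically, and resume from the last snapshot after a fault instead of re-executing from the start. A minimal sketch, with none of RADIC's distributed controllers or message logging modelled, and the state layout invented for illustration:

```python
import pickle

class Checkpointer:
    """Toy checkpoint/rollback: serialise an application's state object,
    then restore it after a simulated fault. Real protocols also
    coordinate processes and may log in-flight messages."""
    def __init__(self):
        self._snapshot = None

    def checkpoint(self, state):
        # Serialise as if writing to object storage or a database service.
        self._snapshot = pickle.dumps(state)

    def restore(self):
        return pickle.loads(self._snapshot)

state = {"iteration": 0, "partial_sum": 0.0}
ckpt = Checkpointer()
for it in range(1, 11):
    state["iteration"] = it
    state["partial_sum"] += it
    if it % 5 == 0:
        ckpt.checkpoint(state)   # periodic checkpoint every 5 iterations
# A fault strikes after iteration 10: roll back instead of re-executing.
state = ckpt.restore()
```

The checkpoint interval is the usual trade-off: frequent snapshots raise steady-state overhead, while sparse snapshots lengthen recovery, which is why the paper evaluates overhead across configuration alternatives.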
Title: bQueue: A Coarse-Grained Bucket QoS Scheduler
Authors: Yuhan Peng, P. Varman
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00024
Abstract: We consider the problem of providing QoS guarantees in a clustered storage system whose data is distributed over multiple server nodes. Storage objects are encapsulated in a single logical bucket, and QoS is provided at the level of buckets. The service that a single bucket receives is the aggregate of the service it receives at the nodes holding its constituent objects; this service depends on the individual time-varying service demands and on congestion at the physical servers. In this paper, we present bQueue, a coarse-grained scheduling algorithm that provides reservation and limit QoS for buckets in a distributed storage system, using tokens to control the amount of service received at individual storage servers. bQueue uses the max-flow algorithm to periodically determine the optimal token distribution based on the demands of the buckets at different servers and the QoS parameters of the buckets. Our experimental results show that bQueue provides accurate QoS among buckets with different access patterns and handles runtime demand changes in a reasonable way.
Title: Performance Optimization of Budget-Constrained MapReduce Workflows in Multi-Clouds
Authors: Huiyan Cao, C. Wu
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00039
Abstract: With the rapid deployment of cloud infrastructures around the globe and the economic benefits of cloud-based computing and storage services, an increasing number of scientific workflows have been shifted, or are in active transition, to clouds. As the scale of scientific applications continues to grow, it is now common to deploy data- and network-intensive computing workflows across multiple clouds, where inter-cloud data transfer has a significant impact on both workflow performance and financial cost. We construct rigorous mathematical models to analyze the intra- and inter-cloud execution dynamics of scientific workflows, and formulate a budget-constrained workflow mapping problem to optimize the network performance of MapReduce-based scientific workflows in Hadoop systems in multi-cloud environments. We show this problem to be NP-complete and design a heuristic solution that takes into consideration module execution, data transfer, and I/O operations. The performance superiority of the proposed mapping solution over existing methods is illustrated through extensive simulations and further verified by real-life workflow experiments deployed in public clouds. We observe about 15% discrepancy between our theoretical estimates and real-world experimental measurements, which validates the correctness of our cost models and supports accurate workflow mapping in real systems.
Title: Programmable Caches with a Data Management Language and Policy Engine
Authors: Michael Sevilla, C. Maltzahn, P. Alvaro, Reza Nasirigerdeh, B. Settlemyer, D. Perez, D. Rich, G. Shipman
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00035
Abstract: Our analysis of the key-value activity generated by the ParSplice molecular dynamics simulation demonstrates the need for more complex cache management strategies. Baseline measurements show clear key access patterns and hot spots that offer significant opportunity for optimization. We use the data management language and policy engine from the Mantle system to dynamically explore a variety of techniques, ranging from basic algorithms and heuristics to statistical models, calculus, and machine learning. While Mantle was originally designed for distributed file systems, we show how its collection of abstractions effectively decomposes the problem into manageable policies for a different application and storage system. Our exploration of this space results in a dynamically sized cache policy that sacrifices no performance while using 32-66% less memory than the default ParSplice configuration.
Title: Implementation of Unsupervised k-Means Clustering Algorithm Within Amazon Web Services Lambda
Authors: A. Deese
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00093
Abstract: This work demonstrates how an unsupervised learning algorithm based on k-Means clustering with Kaufman initialization can be implemented effectively as an Amazon Web Services Lambda function within Amazon's serverless cloud computing service. It emphasizes the need to employ a lean and modular design philosophy, to transfer data efficiently between Lambda and DynamoDB, and to invoke Lambda functions from mobile applications seamlessly and with negligible latency. This work presents a novel application of serverless cloud computing and provides specific examples that will allow readers to develop similar algorithms. The author compares the computation speed and cost of machine learning implementations running locally on traditional PC and mobile hardware with implementations that employ Lambda.
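The clustering step itself is compact enough to sketch in pure Python. This is plain Lloyd's iteration with a naive deterministic seeding, not the Kaufman initialization the paper uses, and none of the Lambda/DynamoDB plumbing is shown:

```python
def kmeans(points, k, iters=20):
    """Lloyd's k-means on 2-D points. Initial centroids are evenly
    spaced input points -- a simple stand-in for Kaufman initialisation."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            j = min(range(k),
                    key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                  (p[1] - centroids[i][1]) ** 2)
            clusters[j].append(p)
        centroids = [                         # update step: cluster means
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)]
cents = sorted(kmeans(pts, 2))
```

In a Lambda deployment, a function like this would be the handler body, with `points` fetched from DynamoDB per invocation; keeping the function this lean is exactly the modular design philosophy the abstract emphasizes.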
Title: One Size Does Not Fit All: The Case for Chunking Configuration in Backup Deduplication
Authors: Huijun Wu, Chen Wang, Kai Lu, Yinjin Fu, Liming Zhu
Pub Date: 2018-05-01 | DOI: 10.1109/CCGRID.2018.00036
Abstract: Data backup is regularly performed by both enterprise and individual users to protect their data from unexpected loss, and various commercial data deduplication systems and software packages help users eliminate duplicates in their backup data to save storage space. In data deduplication systems, the chunking process splits data into small chunks, and duplicate data is identified by comparing the fingerprints of the chunks. The chunk size setting has a significant impact on deduplication performance, and a variety of chunking algorithms have been proposed in recent studies. In practice, existing systems often set the chunking configuration empirically: a chunk size of 4 KB or 8 KB is regarded as the sweet spot for good deduplication performance. However, users' data storage and access patterns vary and change over time; as a result, an empirical chunk size setting may not yield a good deduplication ratio and sometimes complicates storage capacity planning. Moreover, it is difficult to change the chunking settings once they are in use, because duplicates across data chunked with different settings cannot be eliminated directly. In this paper, we propose a sampling-based chunking method and develop a tool named SmartChunker to estimate the optimal chunking configuration for deduplication systems. Our evaluations on real-world datasets demonstrate the efficacy and efficiency of SmartChunker.
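The chunk-then-fingerprint pipeline the abstract describes can be sketched with content-defined chunking: a boundary is declared wherever a hash of the recent bytes matches a mask, so the expected chunk size follows from the mask width, and min/max bounds limit the variance. The hash and sizes below are toys, far smaller than the 4-8 KB settings discussed, and this is not SmartChunker itself:

```python
import hashlib

def chunk(data, mask=0x3F, min_size=8, max_size=256):
    """Content-defined chunking with a toy running hash: declare a
    boundary when the low bits of the hash are zero, giving an expected
    chunk size of mask+1 bytes; min/max sizes bound the variance."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) + b) & 0xFFFFFFFF      # toy hash, reset per chunk
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing remainder
    return chunks

data = bytes(range(256)) * 4                 # repetitive backup-like stream
chunks = chunk(data)
fingerprints = [hashlib.sha256(c).hexdigest() for c in chunks]  # dedup keys
```

Because boundaries depend on content rather than fixed offsets, inserting a byte only perturbs chunking locally; the mask width (here 6 bits, i.e. ~64-byte chunks) is exactly the kind of configuration knob whose best value SmartChunker aims to estimate per workload.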