
Latest Publications: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs
Mengyao Jin, H. Fu, Zihong Lv, Guangwen Yang
Stencil kernels play an important role in many scientific and engineering disciplines. With the development of numerical algorithms and increasing accuracy requirements, register-limited stencils containing massive numbers of variables and operations are widely used. However, these register-limited stencils consume vast resources when executing on GPUs: the excessive use of registers dramatically reduces the number of active threads, and consequently leads to a serious performance decline. To improve the performance of these register-limited stencils, we propose a DDG (data-dependency-graph) oriented code transformation approach in this paper. By analyzing, deleting, and transforming the original stencil program on GPUs, our graph-oriented code transformation approach explores the best trade-off between the amount of calculation and the degree of parallelism, and thereby achieves better performance. The approach is evaluated using the Weighted Nearly Analytic Discrete stencil, and the experimental results show that a speedup of 2.16x can be achieved compared with the original, fairly optimized implementation. To the best of our knowledge, our study takes the first step towards balancing the calculation amount and parallelism degree of extremely register-limited stencils on GPUs.
{"title":"Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs","authors":"Mengyao Jin, H. Fu, Zihong Lv, Guangwen Yang","doi":"10.1109/CCGrid.2016.13","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.13","url":null,"abstract":"Stencil kernels play an important role in many scientific and engineering disciplines. With the development of numerical algorithms and the increasing requirements of accuracy, register-limited stencils containing massive variables and operations are widely used. However, these register-limited stencils consume vast resources when executing on GPUs. The excessive use of registers reduces the number of active threads dramatically, and consequently leads to a serious performance decline. To improve the performance of these register-limited stencils, we propose a DDG (data-dependency-graph) oriented code transformation approach in this paper. By analyzing, deleting and transforming the original stencil program on GPUs, our graph-oriented code transformation approach explores for the best trade-off between the calculation amount and the parallelism degree, and further achieves better performance. The graph-oriented code transformation approach is evaluated using the Weighted Nearly Analytic Discrete stencil, and the experimental result shows that a speedup of 2.16X can be achieved when compared with the original fairly-optimized implementation. To the best of our knowledge, our study takes the first step towards balancing the calculation amount and parallelism degree of the extremely register-limited stencils on GPUs.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127449263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
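The abstract describes the DDG transformation only at a high level. As a rough illustration of the underlying idea, here is a minimal Python sketch, entirely our own and with hypothetical names, of building a data-dependency graph over a stencil's intermediate values and applying a simple use-count heuristic to decide which values to recompute inline (saving a register at the cost of extra arithmetic) and which to keep live:

```python
# Sketch only: a toy data-dependency graph (DDG) for a stencil's
# intermediate values. Each node lists an assumed op cost and the values
# it consumes; leaves such as u_left are inputs loaded from memory.
from collections import defaultdict

ddg = {
    "dx":  (1, ["u_left", "u_right"]),
    "dy":  (1, ["u_up", "u_down"]),
    "lap": (2, ["dx", "dy"]),
    "out": (3, ["lap", "u_center", "lap"]),  # 'lap' is consumed twice
}

# Count how many times each value is consumed by other nodes.
use_count = defaultdict(int)
for _, inputs in ddg.values():
    for name in inputs:
        use_count[name] += 1

# Heuristic trade-off: recompute single-use values inline to free a
# register; keep multi-use values live, since recomputing them would
# multiply the calculation amount.
for name, (cost, _) in ddg.items():
    if use_count[name] == 0:
        continue  # final output, always materialized
    action = "inline/recompute" if use_count[name] == 1 else "keep in register"
    print(f"{name}: used {use_count[name]}x (op cost {cost}) -> {action}")
```

A real transformation pass would weigh the actual recomputation cost against the occupancy gained from freed registers on the target GPU; the sketch only shows the graph-driven nature of that trade-off.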
sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access
Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu
The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques for inner-rack cross-node memory access have drawn attention recently. However, existing proposals exhibit inefficiency in remote memory access among server nodes due to inter-protocol conversions and non-transparent coarse-grained accesses. In this study, we propose the high-performance and efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is to directly extend the on-chip AMBA AXI-4.0 interconnect of the SoC in a local server node to the outside, reaching remote server nodes via high-speed serial lanes. As a result, remote memory in adjacent nodes can be accessed natively, in the same manner as local memory, purely with existing software. Experimental results show that, using the sAXI data path, the performance of remote memory access in a user-level micro-benchmark is very promising (minimum latency: 1.16μs, maximum bandwidth: 1.52GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, the performance of an in-memory key-value framework, Redis, can be improved by up to 1.72x, and the large latency overhead of database queries can be effectively hidden.
{"title":"sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access","authors":"Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu","doi":"10.1109/CCGrid.2016.66","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.66","url":null,"abstract":"The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques of inner-rack cross-node memory access have drawn attention recently. However, existing proposals exhibit inefficiency in remote memory access among server nodes due to inter-protocol conversions and non-transparent coarse-grained accesses. In this study, we propose the high-performance and efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is directly extending the on-chip AMBA AXI-4.0 interconnection of the SoC in a local server node to the outside, and then bringing into remote server nodes via high-speed serial lanes. As a result, natively accessing remote memory in adjacent nodes in the same manner of local assets is supported by purely using existing software. Experimental results show that, using the sAXI data-path, performance of remote memory access in the user-level micro-benchmark is very promising (min. latency: 1.16μs, max. bandwidth: 1.52GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, performance of an in-memory key-value framework, Redis, can be improved up to 1.72x and large latency overhead of database query can be effectively hidden.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126112323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
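Using only the two figures reported in the abstract (1.16μs minimum latency, 1.52GB/s maximum bandwidth), one can sketch a simple latency-plus-bandwidth cost model for remote accesses over such a link. This is our own back-of-envelope simplification, not a model from the paper:

```python
# Back-of-envelope cost model (our simplification, not from the paper):
# one remote access costs a fixed link latency plus size over bandwidth,
# using the figures reported for the in-house FPGA prototype.
LATENCY_S = 1.16e-6        # reported minimum latency: 1.16 microseconds
BANDWIDTH_BPS = 1.52e9     # reported maximum bandwidth: 1.52 GB/s

def remote_access_time(size_bytes: int) -> float:
    """Estimated wall time for one remote transfer of size_bytes."""
    return LATENCY_S + size_bytes / BANDWIDTH_BPS

for size in (64, 4096, 1 << 20):  # cache line, page, 1 MiB
    t = remote_access_time(size)
    print(f"{size:>8} B -> {t * 1e6:8.2f} us, "
          f"effective {size / t / 1e9:.2f} GB/s")
```

The model makes the abstract's point visible: small, cache-line-sized transfers are latency-bound, so avoiding protocol conversions on that path is what pays off.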
Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study
G. P. R. Álvarez, Per-Olov Östberg, E. Elmroth, K. Antypas, R. Gerber, L. Ramakrishnan
The high performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented towards tightly coupled MPI jobs. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three current National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, with the observation that heterogeneity might reduce the predictability of jobs' wait times.
{"title":"Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study","authors":"G. P. R. Álvarez, Per-Olov Östberg, E. Elmroth, K. Antypas, R. Gerber, L. Ramakrishnan","doi":"10.1109/CCGrid.2016.32","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.32","url":null,"abstract":"The high performance computing (HPC) scheduling landscape is changing. Increasingly, there are large scientific computations that include high-throughput, data-intensive, and stream-processing compute models. These jobs increase the workload heterogeneity, which presents challenges for classical tightly coupled MPI job oriented HPC schedulers. Thus, it is important to define new analyses methods to understand the heterogeneity of the workload, and its possible effect on the performance of current systems. In this paper, we present a methodology to assess the job heterogeneity in workloads and scheduling queues. We apply the method on the workloads of three current National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of such analysis, with an observation that heterogeneity might reduce predictability in the jobs' wait time.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125315583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
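The paper defines its own analysis; as a hedged stand-in, the sketch below computes one common heterogeneity measure, the coefficient of variation, over a small hypothetical job trace of (cores, runtime) pairs:

```python
# Illustrative stand-in for the paper's methodology: quantify workload
# heterogeneity as the coefficient of variation (CV) of job dimensions
# over a trace. The trace below is synthetic, not NERSC data.
import statistics

# Hypothetical jobs: (cores requested, runtime in seconds)
jobs = [(1, 120), (1, 300), (24, 3600), (4096, 86400), (2, 45), (512, 7200)]

def cv(values):
    """Coefficient of variation: stdev / mean (higher means more heterogeneous)."""
    return statistics.pstdev(values) / statistics.mean(values)

print(f"CV of core counts: {cv([c for c, _ in jobs]):.2f}")
print(f"CV of runtimes:    {cv([r for _, r in jobs]):.2f}")
```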
The Latin American Giant Observatory: A Successful Collaboration in Latin America Based on Cosmic Rays and Computer Science Domains
Hernán Asorey, L. Núñez, M. Suárez-Durán, L. Torres-Niño, M. Pascual, A. J. Rubio-Montero, R. Mayo-García
This work presents the strategy of the Latin American Giant Observatory (LAGO) for building a Latin American collaboration. With cosmic-ray detectors installed all around the continent, from Mexico to Antarctica, this collaboration is forming a community that embraces both high-energy physicists and computer scientists. This is necessary because the measured data must be analytically processed, and because a priori and a posteriori simulations representing the effects of the radiation must be performed. To perform these calculations, customized codes have been implemented by the collaboration. Given the huge amount of data emerging from this network of sensors and from the computational simulations performed on a diversity of computing architectures and e-infrastructures, an effort is being carried out to catalog and preserve the vast amount of data produced by the water-Cherenkov detector network and the complete LAGO simulation workflow that characterizes each site. Metadata, permanent identifiers, and the facilities of the LAGO Data Repository are described in this work, together with the simulation codes used. These initiatives allow researchers to produce and find data and to use them directly in running code by means of a Science Gateway that provides access to different clusters, Grid, and Cloud infrastructures worldwide.
{"title":"The Latin American Giant Observatory: A Successful Collaboration in Latin America Based on Cosmic Rays and Computer Science Domains","authors":"Hernán Asorey, L. Núñez, M. Suárez-Durán, L. Torres-Niño, M. Pascual, A. J. Rubio-Montero, R. Mayo-García","doi":"10.1109/CCGrid.2016.110","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.110","url":null,"abstract":"In this work the strategy of the Latin American Giant Observatory (LAGO) to build a Latin American collaboration is presented. Installing Cosmic Rays detectors settled all around the Continent, from Mexico to the Antarctica, this collaboration is forming a community that embraces both high energy physicist and computer scientists. This is so because the data that are measured must be analytical processed and due to the fact that a priori and a posteriori simulations representing the effects of the radiation must be performed. To perform the calculi, customized codes have been implemented by the collaboration. With regard to the huge amount of data emerging from this network of sensors and from the computational simulations performed in a diversity of computing architectures and e-infrastructures, an effort is being carried out to catalog and preserve a vast amount of data produced by the water-Cherenkov Detector network and the complete LAGO simulation workflow that characterize each site. Metadata, Permanent Identifiers and the facilities from the LAGO Data Repository are described in this work jointly with the simulation codes used. These initiatives allow researchers to produce and find data and to directly use them in a code running by means of a Science Gateway that provides access to different clusters, Grid and Cloud infrastructures worldwide.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129517002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Generalized GPU Acceleration for Applications Employing Finite-Volume Methods
Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Shizhen Xu, Wenlai Zhao, Xinliang Wang, Bingwei Chen, Guangwen Yang
Scientific HPC applications are increasingly ported to GPUs to benefit from both high throughput and powerful computing capacity. Many of these applications, such as atmospheric modeling and hydraulic erosion simulation, adopt the finite volume method (FVM) as the solver algorithm. However, the communication components inside these applications generally lead to a low flop-to-byte ratio and inefficient utilization of GPU resources. This paper aims at optimizing FVM solvers based on structured meshes. Besides a high-level overview of the finite-volume method and its basic optimizations on modern GPU platforms, we present two generalized tuning techniques: an explicit cache mechanism, and an inner-thread rescheduling method that seeks a suitable mapping between the algorithm's features and the platform architecture. Finally, we demonstrate the impact of our generalized optimization methods on two typical atmospheric dynamic kernels (Euler and SWE) across four mainstream GPU platforms. On a Tesla K80, speedups of 24.4x for SWE and 31.5x for Euler are achieved over a 12-core Intel E5-2697 CPU, a substantial improvement over the original speedups (18x and 15.47x) obtained without these two methods.
{"title":"Generalized GPU Acceleration for Applications Employing Finite-Volume Methods","authors":"Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Shizhen Xu, Wenlai Zhao, Xinliang Wang, Bingwei Chen, Guangwen Yang","doi":"10.1109/CCGrid.2016.30","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.30","url":null,"abstract":"Scientific HPC applications are increasingly ported to GPUs to benefit from both the high throughput and the powerful computing capacity. Many of these applications, such as atmospheric modeling and hydraulic erosion simulation, are adopting the finite volume method (FVM) as the solver algorithm. However, the communication components inside these applications generally lead to a low flop-to-byte ratio and an inefficient utilization of GPU resources. This paper aims at optimizing FVM solver based on the structured mesh. Besides a high-level overview of the finite-volume method as well as its basic optimizations on modern GPU platforms, we further present two generalized tuning techniques including an explicit cache mechanism as well as an inner-thread rescheduling method that tries to achieve a suitable mapping between the algorithm feature and the platform architecture. To the end, we demonstrate the impact of our generalized optimization methods in two typical atmospheric dynamic kernels (Euler and SWE) based on four mainstream GPU platforms. According to the experimental results of Tesla K80, speedups of 24.4x for SWE and 31.5x for Euler could be achieved over a 12-core Intel E5-2697 CPU, which is a great promotion compared with its original speedup (18x and 15.47x) without applying these two methods.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129653533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
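To make the "low flop-to-byte ratio" claim concrete, the following sketch estimates the arithmetic intensity of a toy finite-volume update; the kernel numbers are made up, and the hardware figures in the comments are rough public values for the Tesla K80 used in the paper's experiments, not numbers taken from the paper:

```python
# Made-up kernel numbers, purely to illustrate the flop-to-byte argument:
# a 2D finite-volume update reading 5 cells and writing 1, in double
# precision, with an assumed 10 floating-point operations per cell.
FLOPS_PER_CELL = 10
BYTES_PER_VALUE = 8
READS, WRITES = 5, 1

intensity = FLOPS_PER_CELL / ((READS + WRITES) * BYTES_PER_VALUE)
print(f"arithmetic intensity: {intensity:.3f} flop/byte")

# Rough public figures for a Tesla K80 put its machine balance at several
# flop/byte (Tflop/s-class FP64 against a few hundred GB/s of memory
# bandwidth), so a kernel near 0.2 flop/byte is firmly memory-bound:
# the regime that explicit caching and inner-thread rescheduling target.
```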
Exploiting Sample Diversity in Distributed Machine Learning Systems
Zhiqiang Liu, Xuanhua Shi, Hai Jin
With the increase of machine learning scalability, there is a growing need for distributed systems that can execute machine learning algorithms on large clusters. Currently, most distributed machine learning systems are built on iterative optimization algorithms and the parameter server framework. However, most systems compute on all samples in every iteration, which consumes excessive computing resources since the number of samples is typically very large. In this paper, we study sample diversity and find that most samples contribute little to model updating during most iterations. Based on these findings, we propose a new iterative optimization algorithm that reduces the computation load by reusing iterative computing results. Experiments demonstrate that, compared with current methods, the proposed algorithm can reduce the whole computation load by about 23% without increasing communication.
{"title":"Exploiting Sample Diversity in Distributed Machine Learning Systems","authors":"Zhiqiang Liu, Xuanhua Shi, Hai Jin","doi":"10.1109/CCGrid.2016.75","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.75","url":null,"abstract":"With the increase of machine learning scalability, there is a growing need for distributed systems which can execute machine learning algorithms on large clusters. Currently, most distributed machine learning systems are developed based on iterative optimization algorithm and parameter server framework. However, most systems compute on all samples in every iteration and this method consumes too much computing resources since the amount of samples is always too large. In this paper, we study on the sample diversity and find that most samples ontribute little to model updating during most iterations. Based on these findings, we propose a new iterative optimization algorithm to reduce the computation load by reusing the iterative computing results. The experiment demonstrates that, compared to the current methods, the algorithm proposed in this paper can reduce about 23% of the whole computation load without increasing of communications.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125756900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
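The paper's actual algorithm is not reproduced here; the sketch below only illustrates the general idea of caching per-sample results and skipping samples that contribute little, on a toy one-parameter regression trained by SGD:

```python
# Toy illustration (not the authors' algorithm): one-parameter linear
# regression trained by SGD, where a sample whose last-computed gradient
# fell below a threshold is skipped from then on and its (negligible)
# cached contribution reused instead of recomputed.
import random

random.seed(0)
data = [(x, 3.0 * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(200))]

w, lr, threshold = 0.0, 0.1, 1e-3
cached_grad = {}

for epoch in range(10):
    skipped = 0
    for i, (x, y) in enumerate(data):
        if abs(cached_grad.get(i, float("inf"))) < threshold:
            skipped += 1               # low-impact sample: no recomputation
            continue
        g = 2 * (w * x - y) * x        # gradient of (w*x - y)^2 w.r.t. w
        cached_grad[i] = g
        w -= lr * g
    print(f"epoch {epoch}: w = {w:.4f}, skipped {skipped}/{len(data)}")
```

As the model converges, more samples fall below the threshold and the per-epoch computation shrinks, which is the effect the abstract's 23% figure quantifies at scale.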
Service Level and Performance Aware Dynamic Resource Allocation in Overbooked Data Centers
Luis Tomás, Ewnetu Bayuh Lakew, E. Elmroth
Many cloud computing providers use overbooking to increase their low utilization ratios. This, however, increases the risk of performance degradation due to interference among co-located VMs. To address this problem, we present a service level and performance aware controller that: (1) provides performance isolation for high-QoS VMs, and (2) reduces interference between low-QoS VMs by dynamically mapping virtual cores to physical cores, thus limiting the amount of resources each VM can access depending on its performance. Our evaluation, based on real cloud applications under stress, synthetic, and realistic workloads, demonstrates that a more efficient use of resources is achieved by dynamically allocating the available capacity to the applications that need it most, which in turn leads to more stable and predictable performance over time.
{"title":"Service Level and Performance Aware Dynamic Resource Allocation in Overbooked Data Centers","authors":"Luis Tomás, Ewnetu Bayuh Lakew, E. Elmroth","doi":"10.1109/CCGrid.2016.29","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.29","url":null,"abstract":"Many cloud computing providers use overbooking to increase their low utilization ratios. This however increases the risk of performance degradation due to interference among co-located VMs. To address this problem we present a service level and performance aware controller that: (1) provides performance isolation for high QoS VMs, and (2) reduces the VM interference between low QoS VMs by dynamically mapping virtual cores to physical cores, thus limiting the amount of resources that each VM can access depending on their performance. Our evaluation based on real cloud applications and both stress, synthetic and realistic workloads demonstrates that a more efficient use of the resources is achieved, dynamically allocating the available capacity to the applications that need it more, which in turn lead to a more stable and predictable performance over time.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115576025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
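A minimal sketch of the mapping policy the abstract describes, under our own simplifying assumptions (two QoS classes, a static VM list): high-QoS VMs receive dedicated physical cores, and all low-QoS VMs are packed onto the remaining pool:

```python
# Minimal sketch, assumptions ours: high-QoS VMs get dedicated physical
# cores for performance isolation; all low-QoS VMs time-share the leftover
# pool, where a controller may further shrink their effective share.
def map_cores(vms, n_pcores):
    """vms: list of (name, qos, vcores). Returns {vm_name: pcore ids}."""
    mapping, free = {}, list(range(n_pcores))
    for name, qos, vcores in vms:          # 1) isolate high-QoS VMs
        if qos == "high":
            mapping[name], free = free[:vcores], free[vcores:]
    for name, qos, _ in vms:               # 2) overbook the rest
        if qos == "low":
            mapping[name] = list(free)     # low-QoS VMs share this pool
    return mapping

vms = [("db", "high", 2), ("web", "high", 1),
       ("batch1", "low", 4), ("batch2", "low", 2)]
print(map_cores(vms, 8))
```

The dynamic part of the paper's controller would re-run such a mapping as measured performance changes; the sketch shows only the static split between isolation and overbooking.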
KVLight: A Lightweight Key-Value Store for Distributed Access in Cloud
Jiaan Zeng, Beth Plale
Key-value stores (KVS) are finding use in Big Data applications because they offer a flexible data model, scalability in the number of distributed nodes, and high availability. In a cloud environment, a distributed KVS is often deployed over the local file systems of the nodes in a cluster of virtual machines (VMs). A parallel file system (PFS) offers an alternate approach to disk storage; however, a distributed key-value store running over a parallel file system can incur overheads because it is unaware of the PFS. Additionally, a distributed KVS requires persistently running services, which is not cost-effective under the pay-as-you-go model of cloud computing because resources must be held even during periods of no workload. We propose KVLight, a lightweight KVS that runs over a PFS. It is lightweight in the sense that it shifts the responsibility for reliable data storage to the PFS and focuses on performance. Specifically, KVLight is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability embedded KVSs are not currently designed for. Furthermore, it allows on-demand access without running persistent services in front of the file system. Empirical results show that KVLight outperforms Cassandra and Voldemort, two state-of-the-art KVSs, under both synthetic and realistic workloads.
{"title":"KVLight: A Lightweight Key-Value Store for Distributed Access in Cloud","authors":"Jiaan Zeng, Beth Plale","doi":"10.1109/CCGrid.2016.55","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.55","url":null,"abstract":"Key-value stores (KVS) are finding use in Big Data applications as the store offers a flexible data model, scalability in number of distributed nodes, and high availability. In a cloud environment, a distributed KVS is often deployed over the local file system of the nodes in a cluster of virtual machines (VMs). Parallel file system (PFS) offers an alternate approach to disk storage, however a distributed key value store running over a parallel file system can experience overheads due to its unawareness of the PFS. Additionally, distributed KVS requires persistent running services which is not cost effective under the pay-as-you-go model of cloud computing because resources have to be held even under periods of no workload. We propose KVLight, a lightweight KVS that runs over PFS. It is lightweight in the sense that it shifts the responsibility of reliable data storage to the PFS and focuses on performance. Specifically, KVLight is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, giving capability that embedded KVSs are not currently designed for. Furthermore, it allows on-demand access without running persistent services in front of the file system. Empirical results show that KVLight outperforms Cassandra and Voldemort, two state-of-the-art KVSs, under both synthetic and realistic workloads.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"292 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117328932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
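KVLight's actual data structures are not described in this listing; the sketch below shows one generic way to support concurrent writers on a shared (parallel) file system without locks: per-writer append-only logs merged on read with last-write-wins by sequence number. The directory path is a stand-in for a PFS mount point, and the format is ours, not KVLight's:

```python
# Generic technique, not KVLight's actual on-disk format: each writer
# appends to its own log on the shared file system (so writers never
# contend), and reads merge all logs with last-write-wins by sequence
# number. DIR stands in for a PFS mount point.
import glob
import json
import os

DIR = "/tmp/kvlight_demo"
os.makedirs(DIR, exist_ok=True)

def put(writer_id: int, seq: int, key: str, value: str) -> None:
    with open(f"{DIR}/writer{writer_id}.log", "a") as f:
        f.write(json.dumps({"seq": seq, "key": key, "value": value}) + "\n")

def get(key: str):
    best = None
    for path in glob.glob(f"{DIR}/writer*.log"):
        with open(path) as f:
            for line in f:
                rec = json.loads(line)
                if rec["key"] == key and (best is None or rec["seq"] > best["seq"]):
                    best = rec
    return best["value"] if best else None

put(1, 1, "a", "old")
put(2, 2, "a", "new")   # a different writer updates the same key
print(get("a"))          # -> "new"
```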
Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce
A. Vulpe, M. Frîncu
Astrophysical applications are known to be data- and computationally intensive, with large numbers of images generated by telescopes on a daily basis. To analyze these images, data mining, statistical, and image processing techniques are applied to the raw data. Big data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to process loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy applications that are not optimized for cloud computing. While some work exists on exploiting Hadoop to store astrophysical data and process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work, we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2. We show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, seems to depend on the underlying cloud platform. For future work we plan to investigate scale-out on tens of instances with large input files of several GB.
{"title":"Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce","authors":"A. Vulpe, M. Frîncu","doi":"10.1109/CCGrid.2016.46","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.46","url":null,"abstract":"Astrophysical applications are known to be data and computationally intensive with large amounts of images being generated by telescopes on a daily basis. To analyze these images data mining, statistical, and image processing techniques are applied on the raw data. Big data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to process loosely coupled parallel tasks. These platforms are usually deployed in clouds, however, most astrophysical applications are legacy applications that are not optimized for cloud computing. While some work towards exploiting the benefits of Hadoop to store astrophysical data and to process the large datasets exists, not much research has been done to assess the scalability of cloud enabled astrophysical applications. In this work we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2. We show that while data scalability is achieved, resource scalability (scale up) is bounded and moreover seems to depend on the underlying cloud platform. For future work we also plan to investigate the scale out on tens of instances with large input files of several GB.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"224 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116399796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
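The bound on parallelism mentioned above follows directly from treating each (cluster, subcluster) pair as one independent map task; the toy sketch below, with a dummy pattern predicate of our own, makes that explicit:

```python
# Sketch (ours): each (cluster, subcluster) pair is one independent map
# task, so the worker pool can never be wider than the number of pairs.
from itertools import product
from multiprocessing import Pool

def check_pair(pair):
    """Stand-in for the spatial pattern test between two structures."""
    cluster, subcluster = pair
    return pair, (cluster + subcluster) % 3 == 0   # dummy predicate

if __name__ == "__main__":
    pairs = list(product(range(4), range(5)))      # at most 20 parallel tasks
    with Pool() as pool:
        matches = [p for p, ok in pool.map(check_pair, pairs) if ok]
    print(f"{len(pairs)} map tasks, {len(matches)} matching pairs")
```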
Cost-Efficient Elastic Stream Processing Using Application-Agnostic Performance Prediction
Shigeru Imai, S. Patterson, Carlos A. Varela
Cloud computing adds great on-demand scalability to stream processing systems with its pay-per-use cost model. However, promising service level agreements to users while keeping resource allocation cost low is a challenging task, due to uncertainties from various sources, such as the target application's scalability, future computational demand, and the performance variability of the target cloud infrastructure. To deal with these uncertainties, it is essential to create accurate application performance prediction models. In cloud computing, the current state of the art in performance modeling remains application-specific. We propose an application-agnostic performance model that is applicable to a wide range of applications. We also propose an extension to probabilistic performance prediction. This paper reports the progress we have made so far.
{"title":"Cost-Efficient Elastic Stream Processing Using Application-Agnostic Performance Prediction","authors":"Shigeru Imai, S. Patterson, Carlos A. Varela","doi":"10.1109/CCGrid.2016.89","DOIUrl":"https://doi.org/10.1109/CCGrid.2016.89","url":null,"abstract":"Cloud computing adds great on-demand scalability to stream processing systems with its pay-per-use cost model. However, to promise service level agreements to users while keeping resource allocation cost low is a challenging task due to uncertainties coming from various sources, such as the target application's scalability, future computational demand, and the target cloud infrastructure's performance variability. To deal with these uncertainties, it is essential to create accurate application performance prediction models. In cloud computing, the current state of the art in performance modelling remains application-specific. We propose an application-agnostic performance modeling that is applicable to a wide range of applications. We also propose an extension to probabilistic performance prediction. This paper reports the progress we have made so far.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"66 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117085112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
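As a hedged illustration of what an application-agnostic predictor can look like (this is not the paper's model), the sketch below fits observed throughput against VM count with a hand-rolled least-squares line and inverts it to size the cluster for a hypothetical SLA target:

```python
# Not the paper's model: a deliberately simple "application-agnostic"
# predictor that fits observed throughput against VM count with ordinary
# least squares and inverts the line to size the cluster for an SLA
# target. Measurements and target below are hypothetical.
import math

samples = [(1, 980), (2, 1900), (4, 3650), (8, 6800)]  # (VMs, tuples/s)

n = len(samples)
sx = sum(x for x, _ in samples)
sy = sum(y for _, y in samples)
sxx = sum(x * x for x, _ in samples)
sxy = sum(x * y for x, y in samples)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept

target = 5000.0  # tuples/s promised in the SLA
vms_needed = (target - b) / a
print(f"fit: throughput ~ {a:.1f}*VMs + {b:.1f}")
print(f"provision {math.ceil(vms_needed)} VMs for {target:.0f} tuples/s")
```

Extending such a point estimate to a probabilistic one, as the abstract proposes, would mean predicting a distribution over throughput rather than a single fitted line.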