
Latest publications from the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing

An Adaptive Data Prefetcher for High-Performance Processors
Yong Chen, Huaiyu Zhu, Xian-He Sun
While computing speed continues to increase rapidly, data-access technology lags behind. Data-access delay, not processor speed, has become the leading performance bottleneck of high-end/high-performance computing. Prefetching is an effective solution for masking the gap between computing speed and data-access speed. Existing prefetching work, however, is generally very conservative, due to past concerns about computing power consumption. It suffers in effectiveness especially when applications' access patterns change. In this study, we propose an Algorithm-level Feedback-controlled Adaptive (AFA) data prefetcher to address these issues. The AFA prefetcher is based on the Data-Access History Cache, a hardware structure specifically designed for data prefetching. It provides algorithm-level adaptation and is capable of dynamically switching to an appropriate prefetching algorithm at runtime. We have conducted extensive simulation testing with the SimpleScalar simulator to validate the design and to illustrate the performance gain. The simulation results show that the AFA prefetcher is effective and achieves a considerable IPC (Instructions Per Cycle) improvement on average.
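The abstract's algorithm-level feedback idea can be sketched in miniature. The toy below is an illustrative sketch, not the AFA design: it scores two candidate prefetch algorithms against the recent access stream and periodically switches to whichever predicted best. All names, the window size, and the two candidate algorithms are invented for illustration.

```python
# Illustrative sketch only (not the paper's hardware design): algorithm-level
# feedback control -- score candidate prefetch algorithms on recent accesses
# and adopt whichever predicts best.

def stride_predict(history):
    # Predict the next address assuming a constant-stride pattern.
    if len(history) < 2:
        return None
    return history[-1] + (history[-1] - history[-2])

def sequential_predict(history):
    # Predict the next sequential address (stride of 1).
    return history[-1] + 1 if history else None

class FeedbackPrefetcher:
    """Scores each algorithm over a sliding window and adapts at runtime."""
    def __init__(self, algorithms, window=16):
        self.algorithms = algorithms
        self.window = window
        self.history = []
        self.hits = {name: 0 for name in algorithms}
        self.active = next(iter(algorithms))

    def access(self, addr):
        # Credit every algorithm whose prediction matched this access.
        for name, fn in self.algorithms.items():
            if fn(self.history) == addr:
                self.hits[name] += 1
        self.history.append(addr)
        if len(self.history) % self.window == 0:
            # Feedback step: adopt the best-scoring algorithm, reset counters.
            self.active = max(self.hits, key=self.hits.get)
            self.hits = {name: 0 for name in self.hits}

    def prefetch(self):
        return self.algorithms[self.active](self.history)

pf = FeedbackPrefetcher({"seq": sequential_predict, "stride": stride_predict})
for addr in range(0, 64, 4):   # a stride-4 access stream
    pf.access(addr)
```

After one feedback window of stride-4 accesses, the sketch has switched from the sequential predictor to the stride predictor.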
Citations: 15
A Realistic Integrated Model of Parallel System Workloads
T. Minh, L. Wolters, D. Epema
Performance evaluation is a significant step in the study of scheduling algorithms in large-scale parallel systems ranging from supercomputers to clusters and grids. One of the key factors with a strong effect on the evaluation results is the workloads (or traces) used in experiments. In practice, several researchers use unrealistic synthetic workloads in their scheduling evaluations because they lack models that can help generate realistic ones. In this paper we propose a full model to capture the following characteristics of real parallel system workloads: 1) long-range dependence in the job arrival process, 2) temporal and spatial burstiness, 3) bag-of-tasks behaviour, and 4) correlation between the runtime and the number of processors. Validation of our model with real traces shows that it not only captures the above characteristics but also fits the marginal distributions well. In addition, we present an approach to quantify burstiness both in a job arrival process (temporal) and in the load of a trace (spatial).
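To make "quantifying burstiness" concrete, the sketch below uses one well-known burstiness coefficient from the literature, B = (σ − μ)/(σ + μ) over interarrival gaps, which is −1 for perfectly periodic arrivals and approaches 1 for highly bursty ones. This is an assumed illustration, not necessarily the measure the authors use.

```python
# Hedged sketch (assumed measure, not necessarily the paper's): quantify
# temporal burstiness of a job-arrival process via the coefficient
# B = (sigma - mu) / (sigma + mu) of the interarrival gaps.
import statistics

def burstiness(arrival_times):
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    return (sigma - mu) / (sigma + mu)

periodic = list(range(0, 100, 5))           # perfectly regular arrivals
bursty = [0, 1, 2, 3, 50, 51, 52, 99, 100]  # clustered arrivals
```

`burstiness(periodic)` is exactly −1 (zero variance), while the clustered trace scores strictly higher.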
Citations: 33
Designing Accelerator-Based Distributed Systems for High Performance
M. M. Rafique, A. Butt, Dimitrios S. Nikolopoulos
Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in some detail, the design and management of clusters based on these processors have not received the same focus. In this paper, we explore four design and resource management alternatives that can be used on large-scale asymmetric clusters with accelerators. Moreover, we adapt the popular MapReduce programming model to our proposed configurations. We enhance MapReduce with new dynamic data streaming and workload scheduling capabilities, which enable application writers to use asymmetric accelerator-based clusters without being concerned with the capabilities of individual components. We evaluate the presented designs in a physical setting and show that they can provide significant performance advantages. Compared to a standard static MapReduce design, we achieve 62.5%, 73.1%, and 82.2% performance improvement using accelerators with limited general-purpose resources, well-provisioned shared general-purpose resources, and well-provisioned dedicated general-purpose resources, respectively.
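The contrast between static splits and dynamic scheduling on asymmetric workers can be illustrated with a toy work queue. This is a sketch of the general idea only, not the paper's runtime: workers pull small chunks on demand, so faster accelerator-backed workers naturally end up processing more data. Worker names and speeds are invented.

```python
# Illustrative sketch (not the paper's system): dynamic work-queue
# scheduling on asymmetric workers. Each round, a worker drains as many
# chunks as its relative speed allows.
from queue import Queue

def dynamic_schedule(num_items, worker_speeds, chunk=1):
    # worker_speeds: chunks each worker can finish per scheduling round.
    todo = Queue()
    for _ in range(0, num_items, chunk):
        todo.put(chunk)
    done = {w: 0 for w in worker_speeds}
    while not todo.empty():
        for w, speed in worker_speeds.items():
            for _ in range(speed):
                if todo.empty():
                    break
                done[w] += todo.get()
    return done

# A 3x-faster accelerator worker vs. one host core.
work = dynamic_schedule(120, {"accelerator": 3, "host_core": 1})
```

With a static even split each worker would get 60 items; here the faster worker takes three quarters of the load.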
Citations: 15
A Heuristic Query Optimization Approach for Heterogeneous Environments
P. Beran, W. Mach, R. Vigne, Juergen Mangler, E. Schikuta
In a rapidly growing digital world, the ability to discover, query and access data efficiently is one of the major challenges we struggle with today. Google has done a tremendous job of enabling casual users to easily and efficiently search for Web documents of interest. However, a comparable mechanism to query data stocks located in distributed databases is not yet available. Our research therefore focuses on the optimization of distributed database queries, considering a huge variety of different infrastructures and algorithms. This paper introduces a novel heuristic query optimization approach based on a multi-layered blackboard mechanism. Moreover, a short evaluation scenario confirms our finding that even small changes in the structure of a query execution tree (QET) can lead to significant performance improvements.
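Why small QET changes matter can be seen with a toy cost model: for a left-deep join tree, reordering the joins changes intermediate-result sizes and hence estimated cost. The model below (uniform selectivity, cost = sum of intermediate sizes) is an assumption for illustration, not the paper's blackboard optimizer.

```python
# Illustrative toy cost model (not the paper's optimizer): estimated cost
# of a left-deep join order as the sum of intermediate result sizes.

def join_cost(table_sizes, order, selectivity=0.01):
    """Sum of intermediate result sizes for a left-deep join order."""
    result = table_sizes[order[0]]
    cost = 0
    for name in order[1:]:
        result = result * table_sizes[name] * selectivity
        cost += result
    return cost

sizes = {"orders": 100_000, "customers": 10_000, "regions": 100}
expensive = join_cost(sizes, ["orders", "customers", "regions"])
cheap = join_cost(sizes, ["regions", "customers", "orders"])
```

Joining the small table first roughly halves the estimated cost in this example, even though both trees compute the same result.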
Citations: 3
Integration of Heterogeneous and Non-dedicated Environments for R
Gonzalo Vera, R. Suppi
Parallel computing is becoming essential for today's data analysis in several disciplines. To profit from parallel processing of experimental data, specialized skills, software tools and suitable computing resources are required. Desktop grids and volunteer-based systems have proved themselves powerful options in which distributed idle resources from heterogeneous computers are aggregated to build powerful metacomputers. Software solutions are required to automate and assist the transformation and adaptation of current and new applications to run in these environments. Finally, it is desirable for the same tool to provide an efficient solution to orchestrate the execution of these programs using a diversity of dynamic environments. In this paper we describe an implementation of an integrated solution for the R language that allows the transformation and execution of parallel loops in heterogeneous and non-dedicated environments. The results obtained allow us to prove the feasibility of our proposal. Furthermore, we describe several issues that tools like this must consider to improve their performance when integrating heterogeneous systems.
Citations: 0
ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations
R. Graham, Steve Poole, Pavel Shamis, Gil Bloch, N. Bloch, H. Chapman, Michael Kagan, Ariel Shahar, Ishai Rabinovitz, G. Shainer
This paper introduces the newly developed InfiniBand (IB) Management Queue capability, used by the Host Channel Adapter (HCA) to manage network task data-flow dependencies and to progress the communications associated with such flows. These tasks include sends, receives, and the newly supported wait task, and are scheduled by the HCA based on a data dependency description provided by the user. This functionality is supported by the ConnectX-2 HCA, and provides the means for delegating collective communication management and progress to the HCA, also known as collective communication offload. This provides a means for overlapping collective communications managed by the HCA with computation on the Central Processing Unit (CPU), thus making it possible to reduce the impact of system noise on parallel applications using collective operations. This paper further describes how this new capability can be used to implement scalable Message Passing Interface (MPI) collective operations, giving the high-level details of how it is used to implement the MPI Barrier collective operation and focusing on the latency-sensitive performance aspects of this new capability. The paper concludes with small-scale benchmark experiments comparing implementations of the barrier collective operation using the new network offload capabilities with established point-to-point based implementations of the same algorithms, which manage the data flow using the central processing unit. These early results demonstrate the promise this new capability provides for improving the scalability of high-performance applications using collective communications. The latency of the HCA-based implementation of the barrier is similar to that of the best-performing point-to-point based implementation managed by the central processing unit, and it starts to outperform them as the number of processes involved in the collective operation increases.
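The send/receive/wait task chaining can be made concrete with a small sketch. The code below only builds the per-process task schedule for a recursive-doubling barrier; in the real system such a list would be posted to the HCA for autonomous progress, not executed in Python. The function name and task encoding are invented for illustration.

```python
# Hedged sketch: a recursive-doubling barrier expressed as a chain of
# send and wait tasks, the kind of dependency list a management queue
# could progress without CPU involvement. Illustration only.
import math

def barrier_task_list(rank, nprocs):
    """Send/wait task schedule for one process (power-of-two nprocs)."""
    tasks = []
    rounds = int(math.log2(nprocs))
    for k in range(rounds):
        peer = rank ^ (1 << k)             # partner at distance 2^k
        tasks.append(("send", peer))
        tasks.append(("wait_recv", peer))  # wait task gates the next round
    return tasks

tasks = barrier_task_list(rank=0, nprocs=8)
```

For 8 processes the schedule has log2(8) = 3 rounds of one send plus one gating wait each; the wait tasks encode exactly the data-flow dependencies the abstract describes.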
Citations: 43
Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems
J. Brandt, Frank Chen, Vincent De Sapio, A. Gentile, J. Mayo, P. Pébay, D. Roe, D. Thompson, M. Wong
Accurate failure prediction in conjunction with efficient process migration facilities including some Cloud constructs can enable failure avoidance in large-scale high performance computing (HPC) platforms. In this work we demonstrate a prototype system that incorporates our probabilistic failure prediction system with virtualization mechanisms and techniques to provide a whole system approach to failure avoidance. This work utilizes a failure scenario based on a real-world HPC case study.
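The prediction-then-migration loop can be reduced to a toy decision rule. The snippet is a deliberately minimal sketch, not the paper's probabilistic predictor: it flags a node for pre-failure migration when a health metric trends past a threshold. The metric, threshold, and window are all assumptions for illustration.

```python
# Minimal illustrative sketch (not the paper's system): trigger pre-failure
# migration when the recent average of a per-interval health metric
# (e.g. correctable-error counts) exceeds a threshold.

def should_migrate(error_counts, threshold=5, window=3):
    """True if the mean of the last `window` samples exceeds `threshold`."""
    recent = error_counts[-window:]
    return sum(recent) / len(recent) > threshold

healthy = [0, 1, 0, 2, 1]
degrading = [0, 1, 4, 7, 9]
```

A real predictor would model failure probability rather than a fixed threshold, but the control flow -- monitor, predict, migrate before failure -- is the same.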
Citations: 7
Methodology for Efficient Execution of SPMD Applications on Multicore Environments
Ronal Muresano, Dolores Rexachs, E. Luque
The need to efficiently execute applications in heterogeneous environments is a current challenge for parallel computing programmers. The communication heterogeneities found in multicore clusters need to be addressed to improve efficiency and speedup. This work presents a methodology developed for SPMD applications, centered on managing communication heterogeneities and improving system efficiency on multicore clusters. The methodology is composed of three phases: characterization, mapping strategy, and scheduling policy. We focus on SPMD applications that are designed with a message-passing library for communication and selected according to their synchronicity and communication volume. The novel contribution of this methodology is that it determines the approximate number of cores necessary to achieve a suitable solution with a good execution time, while the efficiency level is maintained above a threshold defined by users. Applying this methodology gave results showing a maximum efficiency improvement of around 43% in the SPMD applications tested.
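The "largest core count that keeps efficiency above a user threshold" idea can be sketched with an assumed cost model. The model below (compute time shrinking with cores, communication time growing with them) is an illustration, not the authors' characterization phase.

```python
# Illustrative sketch with an assumed cost model (not the authors'):
# efficiency = compute time / (compute time + communication time),
# and we pick the largest core count that keeps it above a threshold.

def efficiency(cores, compute_total=1000.0, comm_per_core=2.0):
    t_compute = compute_total / cores
    t_comm = comm_per_core * cores      # communication grows with cores
    return t_compute / (t_compute + t_comm)

def cores_for_threshold(threshold, max_cores=256):
    best = 1
    for n in range(1, max_cores + 1):
        if efficiency(n) >= threshold:
            best = n
    return best

n = cores_for_threshold(0.85)
```

Under this model, efficiency is monotonically decreasing in the core count, so the search returns the break-even point: more cores would speed the run up but drop efficiency below the user's threshold.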
Citations: 10
Energy Efficient Resource Management in Virtualized Cloud Data Centers
A. Beloglazov, R. Buyya
Rapid growth of the demand for computational power by scientific, business and web applications has led to the creation of large-scale data centers consuming enormous amounts of electrical power. We propose an energy efficient resource management system for virtualized Cloud data centers that reduces operational costs and provides the required Quality of Service (QoS). Energy savings are achieved by continuous consolidation of VMs according to the current utilization of resources, the virtual network topologies established between VMs, and the thermal state of computing nodes. We present the first results of a simulation-driven evaluation of heuristics for dynamic reallocation of VMs using live migration according to current requirements for CPU performance. The results show that the proposed technique brings substantial energy savings while ensuring reliable QoS. This justifies further investigation and development of the proposed resource management system.
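The core of VM consolidation is a bin-packing decision. The sketch below uses best-fit decreasing, a common heuristic in this literature but only an assumed stand-in for the authors' policy: pack VMs onto as few hosts as possible so the remaining hosts can be switched to a low-power state.

```python
# Hedged sketch of the consolidation idea (not the paper's exact policy):
# best-fit-decreasing packing of VM CPU loads onto hosts, minimizing the
# number of hosts that must stay powered on.

def consolidate(vm_loads, host_capacity=1.0):
    hosts = []  # each entry is the summed CPU utilization placed on a host
    for load in sorted(vm_loads, reverse=True):
        # Best fit: among hosts with room, pick the one this VM fills most.
        fits = [i for i, used in enumerate(hosts)
                if used + load <= host_capacity]
        if fits:
            best = max(fits, key=lambda i: hosts[i])
            hosts[best] += load
        else:
            hosts.append(load)  # power on a new host
    return hosts

placement = consolidate([0.6, 0.5, 0.4, 0.3, 0.2])
```

Five VMs totaling 2.0 units of load fit on two fully utilized hosts; a naive one-VM-per-host placement would keep five hosts powered on. A real system would also respect QoS headroom and migration cost.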
Citations: 839
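The consolidation policy described in the abstract (continuous VM consolidation driven by current resource utilization, with live migration for dynamic reallocation) can be sketched as a simple double-threshold heuristic: shed load from hosts above an upper utilization threshold, and try to empty hosts below a lower threshold so they can be switched off. This is a minimal illustration, not the authors' evaluated algorithm; the `UPPER`/`LOWER` thresholds, the `Host` model and the best-fit placement policy are all assumptions made for the sketch.

```python
# Illustrative double-threshold VM consolidation heuristic.
# UPPER/LOWER thresholds, Host model and placement policy are assumptions,
# not the algorithm evaluated in the paper.
from dataclasses import dataclass, field

UPPER = 0.8   # assumed upper CPU-utilization threshold
LOWER = 0.2   # assumed lower threshold: host is a switch-off candidate

@dataclass
class Host:
    cpu_capacity: float                        # e.g. MIPS
    vms: list = field(default_factory=list)    # each VM modeled as its CPU demand

    @property
    def utilization(self) -> float:
        return sum(self.vms) / self.cpu_capacity

def place(vm, hosts, exclude):
    """Best-fit placement: pick the host that ends up most utilized
    while staying at or below UPPER; return False if none fits."""
    best = None
    for h in hosts:
        if h is exclude:
            continue
        if (sum(h.vms) + vm) / h.cpu_capacity <= UPPER:
            if best is None or h.utilization > best.utilization:
                best = h
    if best is None:
        return False
    best.vms.append(vm)
    return True

def consolidate(hosts):
    """One consolidation pass; returns the number of live migrations."""
    migrations = 0
    # Over-loaded hosts: migrate the smallest VMs away until below UPPER.
    for src in hosts:
        while src.utilization > UPPER and src.vms:
            vm = min(src.vms)
            if not place(vm, hosts, exclude=src):
                break
            src.vms.remove(vm)
            migrations += 1
    # Under-loaded hosts: try to empty them so they can be switched off.
    for src in hosts:
        if 0 < src.utilization < LOWER:
            for vm in list(src.vms):
                if place(vm, hosts, exclude=src):
                    src.vms.remove(vm)
                    migrations += 1
    return migrations
```

Running `consolidate` on three hosts of capacity 1000 loaded at 0.1, 0.9 and 0.4 empties the under-loaded host (making it a switch-off candidate) and relieves the over-loaded one with two migrations, leaving every remaining host at or below the upper threshold.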
Sky Computing: When Multiple Clouds Become One 天空计算:当多云合二为一
J. Fortes
Summary form only given. The growing number of announced commercial and scientific clouds strongly suggests that in the near future these providers will be differentiated according to the types of their services, their cost, availability and quality. Users will be able to use these and other criteria to determine which clouds best suit their needs, a plausible scenario being the case when users need to aggregate capabilities provided by different clouds. In such scenarios it will be essential to provide virtual networking technologies that enable providers to support cross-cloud communication and users to deploy cross-cloud applications. This talk will describe one such technology, its salient features and remaining challenges. It will also put forward the idea of virtual clouds, i.e. providers of computing services overlaid on more than one cloud. A virtual cloud spans across multiple cloud providers and presents the view of a single logical cloud. Virtual clouds would enable high-level computing services to be provided by third parties who do not own physical resources, could be short or long-lived and highly dynamic. Enabling technologies, challenges and examples of sky computing will be presented.
{"title":"Sky Computing: When Multiple Clouds Become One","authors":"J. Fortes","doi":"10.1109/CCGRID.2010.136","DOIUrl":"https://doi.org/10.1109/CCGRID.2010.136","url":null,"abstract":"Summary form only given. The growing number of announced commercial and scientific clouds strongly suggests that in the near future these providers will be differentiated according to the types of their services, their cost, availability and quality. Users will be able to use these and other criteria to determine which clouds best suit their needs, a plausible scenario being the case when users need to aggregate capabilities provided by different clouds. In such scenarios it will be essential to provide virtual networking technologies that enable providers to support cross-cloud communication and users to deploy cross-cloud applications. This talk will describe one such technology, its salient features and remaining challenges. It will also put forward the idea of virtual clouds, i.e. providers of computing services overlaid on more than one cloud. A virtual cloud spans across multiple cloud providers and presents the view of a single logical cloud. Virtual clouds would enable high-level computing services to be provided by third parties who do not own physical resources, could be short or long-lived and highly dynamic. 
Enabling technologies, challenges and examples of sky computing will be presented.","PeriodicalId":444485,"journal":{"name":"2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117301182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Journal
2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing