
Latest Publications: 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Coordinated Resource Management for Large Scale Interactive Data Query Systems
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.149
Wei Yan, Yuan Xue
Interactive ad hoc querying over massive datasets has recently gained significant traction. Massively parallel data query and analysis frameworks (e.g., Dremel, Impala) are built and deployed to support SQL-like queries over distributed, partitioned data in a cluster environment. In these frameworks, the execution of each query is converted into a set of coordinated tasks, including data retrieval, intermediate result computation and transfer, and result aggregation. To sustain a high request rate of concurrent interactive queries, coordinated management of the cluster's multiple resources (e.g., bandwidth, CPU, memory) is critical. In this paper, we investigate this resource management problem using a utility-based optimization framework. Our goal is to optimize resource utilization while maintaining fairness among different types of queries. We present a price-based algorithm that achieves this optimization objective. We implement our algorithm in the open-source Impala system and conduct a set of experiments in a cluster environment using the TPC-DS workload. Experimental results show that our coordinated resource management solution can increase the aggregate utility by at least 15.4% compared with a simple fair resource-sharing mechanism and 63.5% compared with the FIFO resource management mechanism.
Pages: 677-686
Citations: 1
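The abstract does not spell out the price-based algorithm. As a hedged illustration of the general idea only, the sketch below uses textbook dual decomposition: each resource carries a price, each query class picks the rate that maximizes a weighted log utility minus its resource cost, and a resource's price rises while it is over-subscribed. The log utilities, per-class demand vectors, weights, and step size are all assumptions, not the authors' formulation.

```python
def price_based_allocation(weights, demand, capacity, step=0.1, iters=2000, floor=1e-6):
    """Allocate rates to query classes by iteratively adjusting per-resource prices.

    weights[i]   -- weight of query class i; its utility is weights[i] * log(rate)
    demand[i][r] -- units of resource r used per unit rate of class i (hypothetical)
    capacity[r]  -- capacity of resource r (e.g. CPU, network bandwidth)
    Returns (rates, prices).
    """
    n_classes, n_res = len(weights), len(capacity)
    prices = [1.0] * n_res
    rates = [0.0] * n_classes
    for _ in range(iters):
        # Each class maximizes w*log(x) - x*cost, whose solution is x = w / cost.
        for i in range(n_classes):
            cost = sum(demand[i][r] * prices[r] for r in range(n_res))
            rates[i] = weights[i] / cost
        # Each resource raises its price when over-subscribed and lowers it otherwise.
        for r in range(n_res):
            load = sum(rates[i] * demand[i][r] for i in range(n_classes))
            prices[r] = max(floor, prices[r] + step * (load - capacity[r]))
    return rates, prices


# Two hypothetical query classes: one CPU-heavy (weight 2), one bandwidth-heavy (weight 1).
rates, prices = price_based_allocation(
    weights=[2.0, 1.0],
    demand=[[1.0, 0.2], [0.2, 1.0]],  # [CPU, bandwidth] per unit rate
    capacity=[1.0, 1.0],
)
print([round(x, 3) for x in rates], [round(p, 3) for p in prices])
```

With these made-up inputs both resources end up saturated: each class settles at a rate of about 0.833, and the CPU price (about 2.25) exceeds the bandwidth price (about 0.75) because the heavier-weighted class leans on CPU. Log utilities are chosen here because they yield weighted proportional fairness, one common way to encode "fairness among query types".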
Service Clustering for Autonomic Clouds Using Random Forest
Rafael Brundo Uriarte, S. Tsaftaris, F. Tiezzi
Managing and optimising cloud services is one of the main challenges faced by industry and academia. A possible solution is to resort to self-management, as fostered by autonomic computing. However, the abstraction layer provided by cloud computing obscures several details of the provided services, which in turn hinders the effectiveness of autonomic managers. Data-driven approaches, particularly those relying on service clustering based on machine learning techniques, can assist autonomic management and support decisions concerning, for example, the scheduling and deployment of services. One aspect that complicates this approach is that monitoring data contain both continuous (e.g., CPU load) and categorical (e.g., VM instance type) attributes. Current approaches treat this problem heuristically. This paper instead proposes an approach that uses all kinds of data and learns, in a data-driven fashion, the similarities and resource usage patterns among services. In particular, we use an unsupervised formulation of the Random Forest algorithm to calculate similarities and provide them as input to a clustering algorithm. For the sake of efficiency and to meet the dynamism requirements of autonomic clouds, our methodology consists of two steps: (i) off-line clustering and (ii) on-line prediction. Using datasets from real-world clouds, we demonstrate the superiority of our solution over other approaches and validate the accuracy of the on-line prediction. Moreover, to show the applicability of our approach, we devise a service scheduler that uses the notion of similarity among services and evaluate it in a cloud test-bed.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.41 Pages: 515-524
Citations: 26
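In the usual unsupervised Random Forest formulation (Breiman and Cutler), a forest is trained to separate real records from a synthetic copy drawn from the product of the column marginals, and two records count as similar when they often land in the same leaf. Because trees split on continuous and categorical attributes alike, this suits mixed monitoring data. The sketch below assumes the forest is already trained and starts from its leaf-index matrix (for instance, what scikit-learn's `RandomForestClassifier.apply` returns, one column per tree). The leaf values are hypothetical, and average-linkage clustering is a stand-in, since the abstract does not name the clustering algorithm.

```python
from itertools import combinations


def rf_proximity(leaves):
    """Proximity = fraction of trees in which two samples land in the same leaf.

    leaves[i][t] is the leaf index reached by sample i in tree t.
    """
    n, n_trees = len(leaves), len(leaves[0])
    prox = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        same = sum(leaves[i][t] == leaves[j][t] for t in range(n_trees))
        prox[i][j] = prox[j][i] = same / n_trees
    return prox


def average_linkage(dist, k):
    """Agglomerative clustering: merge the closest pair of clusters (mean distance) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


# Hypothetical leaf indices for 4 services in a 3-tree forest.
leaves = [[0, 2, 4], [0, 2, 5], [1, 3, 6], [1, 3, 6]]
prox = rf_proximity(leaves)
dist = [[1.0 - p for p in row] for row in prox]
print(average_linkage(dist, k=2))  # prints [[0, 1], [2, 3]]
```

The off-line step corresponds to building `prox` and clustering it; an on-line prediction step could, for example, route a new service's leaf vector to the cluster whose members it shares the most leaves with.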
Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS
A. Amer, Huiwei Lu, P. Balaji, S. Matsuoka
With the increasing prominence of many-core architectures and decreasing per-core resources on large supercomputers, a number of application developers are investigating hybrid MPI+threads programming to utilize computational units while sharing memory. An MPI-only model that uses one MPI process per system core can effectively utilize the processing units, but it fails to fully utilize the memory hierarchy and relies on fine-grained internode communication. Hybrid MPI+threads models, on the other hand, can handle internode parallelism more effectively and alleviate some of the overheads associated with internode communication by allowing more coarse-grained data movement between address spaces. The hybrid model, however, can suffer from locking and memory-consistency overheads associated with data sharing. In this paper, we use a distributed implementation of the breadth-first search (BFS) algorithm to understand the performance characteristics of the MPI-only and MPI+threads models at scale. We start with a baseline MPI-only implementation and propose MPI+threads extensions in which threads communicate independently with remote processes while cooperating on local computation. We demonstrate how the coarse-grained communication of MPI+threads considerably reduces time and space overheads that grow with the number of processes. At large scale, however, these overheads constitute performance barriers for both models and require fixing their root causes, such as excessive polling for communication progress and inefficient global synchronization. To this end, we demonstrate various techniques to reduce such overheads and show performance improvements on up to 512K cores of a Blue Gene/Q system.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.93 Pages: 1075-1083
Citations: 17
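To make "coarse-grained communication" concrete, here is a hedged, single-process sketch of level-synchronous distributed BFS. Vertices are assumed to be owned by rank `v % nprocs` (a hypothetical partitioning), and each rank batches the vertices it discovers per destination rank, so the message count grows with levels times rank pairs rather than with edges. No MPI is used; a real code would perform the exchange step with MPI messages or a collective such as `MPI_Alltoallv`.

```python
from collections import defaultdict


def distributed_bfs(adj, source, nprocs):
    """Level-synchronous BFS over vertices owned by rank v % nprocs, simulated in one process.

    Returns (level_of_each_vertex, inter_rank_messages), where each message is one
    batch of vertices sent from one rank to another during one level.
    """
    def owner(v):
        return v % nprocs

    level = {source: 0}
    frontier = {owner(source): [source]}
    depth, messages = 0, 0
    while frontier:
        # Local expansion: each rank buckets neighbours by the rank that owns them.
        outbox = defaultdict(lambda: defaultdict(list))  # outbox[src][dst] -> vertices
        for rank, verts in frontier.items():
            for u in verts:
                for v in adj.get(u, ()):
                    outbox[rank][owner(v)].append(v)
        # Exchange: one batched message per src -> dst pair for this level.
        next_frontier = defaultdict(list)
        for src, per_dst in outbox.items():
            for dst, verts in per_dst.items():
                if src != dst:
                    messages += 1
                for v in verts:
                    if v not in level:  # the owner marks v visited exactly once
                        level[v] = depth + 1
                        next_frontier[dst].append(v)
        frontier = dict(next_frontier)
        depth += 1
    return level, messages


# Undirected path 0-1-2-3 spread over 2 hypothetical ranks.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(distributed_bfs(adj, source=0, nprocs=2))  # prints ({0: 0, 1: 1, 2: 2, 3: 3}, 4)
```

The "owner marks visited" step is where a hybrid MPI+threads version would need shared-state synchronization, which is the locking overhead the abstract mentions.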
A Framework to Accelerate Protein Structure Comparison Tools
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.136
Ahmad Salah, Kenli Li, Tarek F. Gharib
Protein structure comparison is a key problem at the center of computational structural biology. The steady increase in the number of protein structures encourages the development of massively parallel tools. While research has focused on proposing data-analytical methods to tackle this problem, little research has proposed generic tools to run these methods in parallel environments. Herein, we propose a scalable framework to handle this steady increase. The proposed framework runs existing sequential tools in parallel environments. It is GUI-based and requires no scripting or installation procedures. The framework optimally distributes the protein structure database over the available computing resources, tracks the execution of the remote processes, and merges their results to form the final output. The first stage formulates the distribution of the biological database as an optimization problem in order to maximize cluster resource utilization and minimize execution time. The experimental results show linear, nearly optimal speedups with no loss in accuracy. The framework is available at http://biocloud.hnu.edu.cn/ppsc/.
Pages: 705-708
Citations: 0
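The abstract frames database distribution as an optimization problem but does not give the formulation. As a hedged stand-in, the sketch below uses the classic longest-processing-time-first (LPT) greedy heuristic for heterogeneous nodes. It assumes each structure's comparison cost can be estimated up front (e.g., from its size); the structure ids, costs, and node speeds are hypothetical.

```python
def distribute(costs, speeds):
    """Assign structures to nodes, largest estimated cost first (LPT-style greedy).

    costs  -- {structure_id: estimated comparison work}, e.g. proportional to size
    speeds -- relative processing speed of each node
    Returns (per-node lists of structure ids, estimated makespan).
    """
    finish = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    for sid, cost in sorted(costs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Place the item on the node that would finish it earliest.
        node = min(range(len(speeds)), key=lambda n: finish[n] + cost / speeds[n])
        finish[node] += cost / speeds[node]
        assignment[node].append(sid)
    return assignment, max(finish)


# Hypothetical workload: five structures, one node twice as fast as the other.
print(distribute({"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}, speeds=[2, 1]))
# prints ([['a', 'c', 'd'], ['b', 'e']], 11.0)
```

LPT is fast and usually close to optimal, but it is a heuristic; an exact formulation (e.g., an integer program) could shave the remaining imbalance at a higher planning cost.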
Implementation and Evaluation of MPI Nonblocking Collective I/O
Sangmin Seo, R. Latham, Junchao Zhang, P. Balaji
The well-known gap between CPU speeds and storage bandwidth calls for new strategies for managing I/O demands. In large-scale MPI applications, collective I/O has long been an effective way to achieve higher I/O rates, but it poses two constraints. First, although overlapping collective I/O with computation is the next logical step toward a faster time to solution, MPI's existing collective I/O API provides only limited support for doing so. Second, collective routines (for both I/O and communication) impose a synchronization cost in addition to a communication cost. The upcoming MPI 3.1 standard will provide a new set of nonblocking collective I/O operations to satisfy the needs of applications. We present here initial work on implementing MPI nonblocking collective I/O operations in the MPICH MPI library. Our implementation begins with the extended two-phase algorithm used in ROMIO's collective I/O implementation. We then use a state machine and the extended generalized request interface to maintain the progress of nonblocking collective I/O operations. The evaluation results indicate that our implementation performs as well as blocking collective I/O in terms of I/O bandwidth and is capable of overlapping I/O with other operations. We believe that our implementation can help users try nonblocking collective I/O operations in their applications.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.81 Pages: 1084-1091
Citations: 5
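The combination of a two-phase algorithm with a state machine can be pictured with a toy model. This is a hedged sketch, not MPICH or ROMIO code: a request object advances one phase (data exchange to aggregators, then contiguous writes) each time `test()` is called, which is what lets the caller overlap computation between calls. In real MPI the user-facing calls would be `MPI_File_iwrite_all` followed by `MPI_Test`/`MPI_Wait`. The file image, request layout, and aggregator count here are hypothetical.

```python
class NonblockingCollectiveWrite:
    """Toy two-phase collective write driven as a state machine by test().

    requests[rank] is a list of (file_offset, bytes) pieces that rank wants to write;
    aggregator k owns the k-th contiguous file domain.
    """

    def __init__(self, requests, n_aggregators, file_image):
        self.requests = requests
        self.n_aggr = n_aggregators
        self.file = file_image          # bytearray standing in for the shared file
        self.buffers = []               # per-aggregator {offset: byte}
        self.state = "EXCHANGE"
        self.writes = 0                 # contiguous writes issued to "storage"

    def test(self):
        """Advance at most one phase, then report whether the operation is complete."""
        if self.state == "EXCHANGE":
            # Phase 1: route every byte to the aggregator that owns its file domain.
            domain = -(-len(self.file) // self.n_aggr)  # ceiling division
            self.buffers = [{} for _ in range(self.n_aggr)]
            for pieces in self.requests:
                for offset, data in pieces:
                    for k, byte in enumerate(data):
                        self.buffers[(offset + k) // domain][offset + k] = byte
            self.state = "WRITE"
        elif self.state == "WRITE":
            # Phase 2: each aggregator issues one large contiguous write.
            for buf in self.buffers:
                if buf:
                    for pos in range(min(buf), max(buf) + 1):
                        self.file[pos] = buf.get(pos, self.file[pos])
                    self.writes += 1
            self.state = "DONE"
        return self.state == "DONE"


# Two hypothetical ranks write interleaved 2-byte blocks of an 8-byte file.
f = bytearray(8)
req = NonblockingCollectiveWrite([[(0, b"AA"), (4, b"CC")], [(2, b"BB"), (6, b"DD")]], 2, f)
while not req.test():
    pass  # computation would overlap with the collective I/O here
print(bytes(f), req.writes)  # prints b'AABBCCDD' 2
```

The four small interleaved pieces become two contiguous aggregator writes, which is the benefit two-phase I/O trades the extra exchange step for.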
Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.169
R. Rajachandrasekar, Akshay Venkatesh, Khaled Hamidouche, D. Panda
Checkpoint-restart is the predominant reactive fault-tolerance mechanism for applications running on HPC systems. While numerous studies in the literature have analyzed and optimized the performance and scalability of a variety of checkpointing protocols, little research has been done from an energy or power perspective. The few studies along this line have primarily analyzed and modeled power and energy usage during checkpointing phases. Applications running on future exascale machines will be constrained by a power envelope, so it is important not only to understand the behavior of checkpointing systems under such an envelope but also to adopt techniques that leverage the power-capping capabilities exposed by the OS to save energy without sacrificing performance. In this paper, we address the problem that naively applying power capping around checkpointing phases yields only marginal energy benefits at the cost of significant performance degradation, by proposing a novel power-aware checkpointing framework, Power-Check. By using data-funnelling mechanisms and selective core power capping, Power-Check makes efficient use of the I/O and CPU subsystems. Evaluations with application kernels show that Power-Check can reduce the energy consumed during a checkpoint by as much as 48%, while improving checkpointing performance by 14%.
Pages: 261-270
Citations: 28
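The intuition behind combining data funnelling with selective power capping is simple energy arithmetic (energy = power x time). The hedged toy model below is not Power-Check's model. It assumes the checkpoint is I/O-bound, that only `io_cores` cores push data, and that all other cores sit at a capped power. Every parameter value is hypothetical.

```python
def checkpoint_energy(n_cores, io_cores, data_gb, storage_gb_s, core_gb_s,
                      active_w, capped_w):
    """Return (energy_joules, seconds) for one checkpoint in a toy I/O-bound model.

    Assumptions: only io_cores push checkpoint data, each at up to core_gb_s, and the
    total rate cannot exceed storage_gb_s; the remaining cores are power-capped.
    """
    bandwidth = min(storage_gb_s, io_cores * core_gb_s)
    seconds = data_gb / bandwidth
    watts = io_cores * active_w + (n_cores - io_cores) * capped_w
    return watts * seconds, seconds


# Hypothetical node: 16 cores, 64 GB checkpoint, 4 GB/s storage, 1 GB/s per core,
# 10 W per active core, 4 W per capped core.
for io_cores in (16, 4, 2):
    print(io_cores, checkpoint_energy(16, io_cores, 64, 4, 1, 10, 4))
```

In this made-up setting, funnelling through 4 cores and capping the other 12 keeps the checkpoint at 16 s while cutting energy from 2560 J to 1408 J. Funnelling through only 2 cores halves the bandwidth, doubles the time to 32 s, and erodes most of the savings (2432 J). The numbers illustrate the trade-off only and do not reproduce the paper's results.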
Confuga: Scalable Data Intensive Computing for POSIX Workflows
P. Donnelly, Nicholas L. Hazekamp, D. Thain
Today's big-data analysis systems achieve performance and scalability by requiring end users to embrace a novel programming model. This approach is highly effective when the objective is to compute relatively simple functions on colossal amounts of data, but it is a poor match for a scientific computing environment that depends on complex applications written for the conventional POSIX environment. To address this gap, we introduce Confuga, a scalable data-intensive computing system that is largely compatible with the POSIX environment. Confuga brings together the workflow model of scientific computing and the storage architecture of other big data systems. Confuga accepts large workflows of standard POSIX applications arranged into graphs, and then executes them in a cluster, exploiting both parallelism and data locality. By making use of the workload structure, Confuga avoids the long-standing problems of metadata scalability and load instability found in many large-scale computing and storage systems. We show that Confuga's approach to load control offers improvements of up to 228% in cluster network utilization and 23% reductions in workflow execution time.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.95 Pages: 392-401
Citations: 9
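"Exploiting data locality" for a workflow graph can be illustrated with a simple greedy placement rule. The abstract does not describe Confuga's scheduler, so the sketch below is a hypothetical heuristic, not Confuga's algorithm: tasks are visited in dependency order, each is placed on the node already holding the most bytes of its inputs, and its outputs are recorded as living on that node for downstream tasks. Node names, file names, and sizes are made up.

```python
def schedule_by_locality(tasks, file_location, file_size, nodes):
    """Place each task on the node already holding the most bytes of its inputs.

    tasks -- [(name, input_files, {output_file: size})] in dependency order
    Returns (placement, bytes_transferred). Input dicts are copied, not modified.
    """
    location, size = dict(file_location), dict(file_size)
    placement, transferred = {}, 0
    for name, inputs, outputs in tasks:
        local = {n: sum(size[f] for f in inputs if location[f] == n) for n in nodes}
        node = max(nodes, key=lambda n: local[n])  # ties go to the first node listed
        transferred += sum(size[f] for f in inputs) - local[node]
        placement[name] = node
        for f, nbytes in outputs.items():
            location[f], size[f] = node, nbytes  # outputs stay where they were produced
    return placement, transferred


# Hypothetical two-node cluster and a three-task workflow graph.
tasks = [("t1", ["a"], {"x": 10}), ("t2", ["b"], {"y": 20}), ("t3", ["x", "y"], {"z": 5})]
print(schedule_by_locality(tasks, {"a": "n1", "b": "n2"}, {"a": 100, "b": 50}, ["n1", "n2"]))
# prints ({'t1': 'n1', 't2': 'n2', 't3': 'n2'}, 10)
```

Only the 10-byte intermediate `x` crosses the network. A production system would also need to bound concurrent transfers and per-node load, which is where the abstract's load-control results come in.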
Astrophysics Simulation on RSC Massively Parallel Architecture
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.102
I. Kulikov, I. Chernykh, B. Glinsky, D. Weins, A. Shmelev
The AstroPhi code is designed to simulate the dynamics of astrophysical objects on hybrid supercomputers equipped with Intel Xeon Phi computation accelerators. The new RSC PetaStream massively parallel architecture is used for the simulations. This paper presents AstroPhi acceleration results for the Intel Xeon Phi native and offload execution modes. The RSC PetaStream architecture makes it possible to simulate astrophysical problems at high resolution. The AGNES simulation tool was used to simulate the scalability of the AstroPhi code. Several gravitational collapse problems are presented as demonstrations of the AstroPhi code.
Pages: 1131-1134
Citations: 7
Scheduling Workloads of Workflows with Unknown Task Runtimes
A. Ilyushkin, Bogdan Ghit, D. Epema
Workflows are important computational tools in many branches of science, and because of the dependencies among their tasks and their widely varying characteristics, scheduling them is a difficult problem. Most research on workflow scheduling has focused on the offline problem of minimizing the makespan of single workflows with known task runtimes. The problem of scheduling multiple workflows has been addressed either offline or, again, under the assumption of known task runtimes. In this paper, we study the problem of scheduling workloads consisting of an arrival stream of workflows without task runtime estimates. The resource requirements of a workflow can fluctuate significantly during its execution. We therefore present four scheduling policies for workloads of workflows, whose main distinguishing feature is the extent to which they reserve processors for workflows to deal with these fluctuations. We perform simulations with realistic synthetic workloads and show that any form of processor reservation only decreases overall system performance, and that a greedy backfilling-like policy performs best.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.27 Pages: 606-616
Citations: 22
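The abstract does not spell out the winning policy, so the following is only a minimal Python sketch of what a greedy, reservation-free ("backfilling-like") dispatcher for a stream of workflows could look like; it is not the authors' policy. Assumptions (all hypothetical): each workflow is a DAG given as a task-to-parents map, ready tasks are served first-come-first-served by workflow arrival time, and whenever a processor is idle it goes to the next ready task of any workflow, so no processors are ever reserved. The policy never reads task runtimes; the `runtime` map is used only by the toy simulator to decide when a task finishes. The names `Workflow` and `simulate_greedy` are illustrative.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass
class Workflow:
    """A workflow DAG. `runtime` is hidden from the policy; only the simulator uses it."""
    wf_id: int
    arrival: float
    runtime: dict[str, float]      # task -> true runtime (simulator only)
    parents: dict[str, set[str]]   # task -> parent tasks; missing/empty = entry task


ARRIVAL, COMPLETION = 1, 0  # completions sort before arrivals at equal times


def simulate_greedy(workflows: list[Workflow], num_procs: int) -> dict[int, float]:
    """Greedy, reservation-free dispatch (hypothetical sketch).

    Whenever a processor is idle it is given to the ready task whose workflow
    arrived earliest; no processors are ever held back for a workflow.
    Returns the response time (completion - arrival) of each workflow.
    """
    wfs = {w.wf_id: w for w in workflows}
    events = [(w.arrival, ARRIVAL, w.wf_id, "") for w in workflows]
    heapq.heapify(events)
    ready: list[tuple[float, int, str]] = []  # (workflow arrival, wf_id, task)
    done: dict[int, set[str]] = {w.wf_id: set() for w in workflows}
    finish: dict[int, float] = {}
    free = num_procs

    while events:
        now, kind, wf_id, task = heapq.heappop(events)
        wf = wfs[wf_id]
        if kind == ARRIVAL:
            # Entry tasks (no parents) become ready as soon as the workflow arrives.
            for t in wf.runtime:
                if not wf.parents.get(t):
                    heapq.heappush(ready, (wf.arrival, wf_id, t))
        else:
            free += 1
            done[wf_id].add(task)
            # A child becomes ready when its last parent completes.
            for t, ps in wf.parents.items():
                if task in ps and ps <= done[wf_id]:
                    heapq.heappush(ready, (wf.arrival, wf_id, t))
            if len(done[wf_id]) == len(wf.runtime):
                finish[wf_id] = now

        # Greedy step: fill every idle processor with any ready task.
        while free > 0 and ready:
            _, rid, t = heapq.heappop(ready)
            free -= 1
            # The true runtime is used only to schedule the completion event.
            heapq.heappush(events, (now + wfs[rid].runtime[t], COMPLETION, rid, t))

    return {i: finish[i] - wfs[i].arrival for i in finish}


if __name__ == "__main__":
    diamond = Workflow(0, 0, {"a": 2, "b": 3, "c": 1, "d": 1},
                       {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}})
    single = Workflow(1, 1, {"x": 4}, {})
    print(simulate_greedy([diamond, single], num_procs=2))  # {1: 4, 0: 7}
```

In the example, the single-task workflow arriving at t = 1 takes the idle processor immediately while the first workflow is still in its sequential phase; a policy that reserved processors for the first workflow's upcoming parallel phase would hold that processor back instead.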
Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures
Min Si, P. Balaji, Y. Ishikawa
Many-core architectures provide a massively parallel environment with dozens of cores and hundreds of hardware threads. Scientific application programmers are increasingly looking for ways to utilize such large numbers of lightweight cores under various programming models. However, efficiently executing these models in massively parallel many-core environments is not easy, and performance may degrade in various ways. The first author's doctoral research focuses on exploiting the capabilities of many-core architectures in widely used MPI implementations. While application programmers have studied several approaches to achieve better parallelism and resource sharing, many of these approaches still face communication problems that degrade performance. In the thesis, we investigate the characteristics of MPI on such massively threaded architectures and propose two efficient strategies, a multi-threaded MPI approach and a process-based asynchronous model, to optimize MPI communication for modern scientific applications.
Pub Date : 2015-05-04 DOI: 10.1109/CCGrid.2015.68 Pages: 697-700
Citations: 0