
2015 IEEE International Parallel and Distributed Processing Symposium Workshop: Latest Publications

The Active classroom: Students and Instructors Parallel Programming in Parallel
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.24
Nasser Giacaman, Simar Kalra, O. Sinnen
The biggest difficulty that students face when learning programming is developing the cognitive skills that allow them to apply what they have learnt. It is generally accepted that programming can only be learnt by doing and by actively engaging with it. Parallel programming is a prime example of a programming area that students commonly struggle with. A major obstacle is its abstract concepts, which make it difficult to gain a true understanding of the underlying principles in a traditional classroom setting. This paper discusses the principles that motivated the development of Active Classroom Programmer (ACP), a tool for students to learn effective programming strategies with the guidance of their instructor. ACP aims to increase students' skills in applying programming topics by engaging them immediately with newly introduced material. This is especially important in parallel programming, as the topics quickly progress to the many parallelisation caveats (such as thread safety and race conditions). While laboratory or homework exercises give students valuable hands-on experience in applying newly taught concepts, this opportunity generally arrives too long after the material is presented in the lesson. To address this, a collection of parallel programming exercises is being developed with the help of ACP for the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing (as an Early Adopter award). Instructors are welcome to use any of the developed exercises, or to request a private ACP account for programming with the students in their own courses.
Citations: 6
PCO Keynote
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.178
A. Pothen
Computing a matching in a graph is one of "the hardest simple problems" in discrete mathematics and computer science. It is simple because most variants of matching can be solved in polynomial time, yet hard because the running times are high and the algorithms are complex. Designing parallel matching algorithms is even more challenging, since many algorithms rely on searching for long paths in a graph, or implicitly communicate information along long paths, and thus have little concurrency. However, in the last fifteen years there has been much work on developing parallel matching algorithms via approximation: we do not seek optimal matchings, but matchings that are guaranteed to be within a constant factor of optimal. There has been a flurry of activity in designing and implementing such algorithms, and we now have efficient algorithms for computing matchings on multicore shared-memory computers. This talk surveys this body of work on matching algorithms.
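For readers new to approximate matching, a minimal serial sketch of the classic greedy 1/2-approximation for maximum-weight matching may help: visit edges from heaviest to lightest and keep an edge whenever both endpoints are still unmatched. This is a textbook baseline, not one of the parallel algorithms surveyed in the talk; the function name and edge representation are illustrative assumptions.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)


def greedy_half_approx_matching(edges: List[Edge]) -> List[Edge]:
    """Greedy matching: visit edges from heaviest to lightest and keep an edge
    whenever both endpoints are still free. The total weight is at least half
    that of a maximum-weight matching."""
    matched: Set[int] = set()
    matching: List[Edge] = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        # Skip self-loops and edges touching an already matched vertex.
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v, w))
    return matching


if __name__ == "__main__":
    path = [(0, 1, 1.0), (1, 2, 1.5), (2, 3, 1.0)]
    print(greedy_half_approx_matching(path))  # [(1, 2, 1.5)]
```

On the path 0-1-2-3 with weights 1.0, 1.5, and 1.0, the sketch keeps only edge (1, 2) with weight 1.5, while the optimum (edges (0, 1) and (2, 3)) has weight 2.0; the ratio 0.75 respects the 1/2 guarantee.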
Citations: 0
Considerations on Distributed Load Balancing for Fully Heterogeneous Machines: Two Particular Cases
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.36
Nathanaël Cheriere, Erik Saule
When the size of parallel systems increases, centralized algorithms for scheduling tasks on the system can induce significant overhead. This is why decentralized scheduling algorithms have been developed. The most popular is certainly work-stealing, because of its interesting theoretical guarantees. Parallel systems have evolved from homogeneous clusters to fully heterogeneous ones such as GPU-accelerated clusters. In this paper we investigate decentralized scheduling algorithms for heterogeneous systems. The guarantees of work-stealing no longer hold on such systems because it is an a posteriori algorithm that depends heavily on the initial distribution of work. We focus on a priori decentralized scheduling algorithms for heterogeneous systems and propose two distributed algorithms that balance the load on unrelated machines in two particular cases. The first exploits low heterogeneity in the task set and achieves an approximation ratio linear in the number of task types. The second addresses the case where the system uses only two different types of machines, and we show that it is a 2-approximation if the system converges. When it does not converge, we study the dynamic equilibrium of the system. In the homogeneous case, we numerically compute the probability density function of the load imbalance and show that the imbalance is low on average. Using simulation, we show that the heterogeneous case behaves similarly to the homogeneous case and that the imbalance is low in both cases.
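To make the setting of unrelated machines concrete, the following hypothetical sketch shows one local step that a pair of machines of two different types could perform: pool their tasks, order them by the ratio of processing times (time on A over time on B), and pick the split point that minimizes the larger of the two loads. It assumes positive processing times and is not the algorithm proposed in the paper; it only illustrates how a pairwise, decentralized rebalancing step can exploit heterogeneity.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (time on a type-A machine, time on a type-B machine)


def pairwise_balance(tasks: List[Task]) -> Tuple[List[int], List[int], float]:
    """Hypothetical local step for one type-A and one type-B machine.

    Tasks are ordered by their a/b ratio, so those relatively cheapest on A
    come first; A takes a prefix of that order and B takes the rest. Every
    split point is tried and the one with the smallest max(load_A, load_B)
    is kept. Processing times are assumed to be positive.
    """
    order = sorted(range(len(tasks)), key=lambda i: tasks[i][0] / tasks[i][1])
    load_a, load_b = 0.0, float(sum(b for _, b in tasks))
    best_split, best_makespan = 0, load_b  # split 0: every task on B
    for k, i in enumerate(order, start=1):
        load_a += tasks[i][0]  # task i moves from B ...
        load_b -= tasks[i][1]  # ... to A
        makespan = max(load_a, load_b)
        if makespan < best_makespan:
            best_split, best_makespan = k, makespan
    return order[:best_split], order[best_split:], best_makespan
```

With tasks (1, 4), (2, 2), and (4, 1), the sketch assigns task 0 to machine A and tasks 1 and 2 to machine B, for a makespan of 3. Repeating such steps between pairs of machines is one possible decentralized scheme; its convergence is not analysed here.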
Citations: 7
Trapezoid Quorum Protocol Dedicated to Erasure Resilient Coding Based Schemes
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.108
T. J. R. Relaza, J. Jorda, A. Mzoughi
In distributed storage systems such as parallel file systems or storage virtualization middleware, data replication is the main solution used to provide data availability. The more replicas are distributed among nodes, the more robust the storage system is. However, the price of this dependability becomes significant, due to both direct costs (the price of disks) and indirect costs (the energy consumption of the large number of disks needed). To lower the disk space needed for a given availability, Erasure Resilient Codes (hereafter ERC) are of interest and are beginning to be implemented in this context. However, the use of such codes raises new problems in data management. While some constraints, such as data concurrency, can be handled in classical ways, others, such as coherency protocols, need adaptation to fit this context. In this paper, we present an adaptation of the trapezoid protocol to ERC schemes (instead of full replication). This new quorum protocol increases storage space efficiency while maintaining a high level of availability for read and write operations.
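The storage argument and the quorum requirement can be illustrated with a small sketch, assuming an (n, k) erasure code in which any k of the n fragments suffice to rebuild the data. A read quorum of r nodes and a write quorum of w nodes overlap in at least r + w - n nodes, so a read is guaranteed to see k fragments of the latest write only if r + w - n >= k; with k = 1 this reduces to the familiar replication rule r + w > n. The sketch ignores the trapezoid layout and versioning that the paper's protocol adds; the function names are hypothetical.

```python
def replication_overhead(copies: int) -> float:
    """Stored bytes per byte of user data with full replication."""
    return float(copies)


def erc_overhead(n: int, k: int) -> float:
    """Stored bytes per byte of user data for an (n, k) erasure code:
    n fragments, each 1/k of the data size."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return n / k


def quorums_are_safe(n: int, k: int, read_q: int, write_q: int) -> bool:
    """True if any read quorum of size read_q shares at least k fragment
    holders with any write quorum of size write_q (pigeonhole: the overlap
    is at least read_q + write_q - n)."""
    if not (0 < read_q <= n and 0 < write_q <= n):
        raise ValueError("quorum sizes must lie in 1..n")
    return read_q + write_q - n >= k
```

For example, a (6, 4) code stores 1.5 times the data size, compared with 3 times for triple replication, and read and write quorums of 5 nodes each are safe, whereas quorums of 4 and 5 are not.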
Citations: 1
Declarative Patterns for Imperative Distributed Graph Algorithms
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.78
Marcin Zalewski, N. Edmonds, A. Lumsdaine
We provide an abstraction for expressing graph algorithms in which the vertices and edges of the graph provide locality and communication structure, and graph data are represented by property maps that associate vertices and edges with arbitrary user-defined data. Operations on the graph are expressed as patterns, which allow limited traversal of the graph and modification of property maps for the traversed fragments of the graph. Traversal is implicit and is computed automatically from the pattern's accesses to property map values. Patterns are declarative, but they can be used in imperative algorithms through strategies that run in epochs. Strategies are user-defined programs that apply patterns in a certain way, including chaining patterns arbitrarily (e.g., we provide fixed-point, once, and Δ-stepping strategies). Patterns are applied in epochs, which provide synchronization across a distributed system, guaranteeing that all patterns have been applied by the end of an epoch.
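A rough single-process analogue of a pattern driven by a fixed-point strategy is sketched below, assuming one address space in place of a distributed one and non-negative edge weights. The property map `dist` holds a value per vertex, the hypothetical `relax_pattern` reads values from the start of an epoch and proposes updates, and `fixed_point_strategy` applies the pattern epoch after epoch, committing updates at each epoch boundary, until nothing changes. The names and structure are illustrative only and are not the authors' interface.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, float]
PropertyMap = Dict[str, float]
Pattern = Callable[[List[Edge], PropertyMap], PropertyMap]
INF = float("inf")


def relax_pattern(edges: List[Edge], dist: PropertyMap) -> PropertyMap:
    """Hypothetical pattern: for each edge (u, v, w), propose dist[u] + w for v.
    It only reads `dist` (values from the start of the epoch) and returns the
    proposed updates instead of writing them."""
    updates: PropertyMap = {}
    for u, v, w in edges:
        candidate = dist[u] + w
        if candidate < min(dist[v], updates.get(v, INF)):
            updates[v] = candidate
    return updates


def fixed_point_strategy(pattern: Pattern, edges: List[Edge], dist: PropertyMap) -> int:
    """Apply `pattern` once per epoch until an epoch yields no updates.
    Returns the number of epochs run, including the final quiet one.
    Assumes non-negative weights, otherwise it may never terminate."""
    epochs = 0
    while True:
        updates = pattern(edges, dist)
        epochs += 1
        if not updates:
            return epochs
        dist.update(updates)  # epoch boundary: commit all updates at once


if __name__ == "__main__":
    graph = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
    distances = {"a": 0.0, "b": INF, "c": INF}
    print(fixed_point_strategy(relax_pattern, graph, distances), distances)
```

On edges a→b (1), b→c (2), and a→c (5) starting from a, the strategy stops after three epochs with distances a = 0, b = 1, and c = 3.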
Citations: 0
Adaptive Resource and Job Management for Limited Power Consumption
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.118
Yiannis Georgiou, David Glesser, D. Trystram
The last decades have been characterized by an ever-growing demand for computing and storage resources. This tendency has recently put pressure on the ability to efficiently manage the power required to operate the huge number of electrical components in state-of-the-art high performance computing systems. The power consumption of a supercomputer needs to be adjusted according to varying power budgets or electricity availability. As a consequence, Resource and Job Management Systems have to be adapted to schedule jobs efficiently with optimized performance while limiting power usage whenever needed. In this paper we introduce a new scheduling strategy that adapts the executed workload to a limited power budget. The originality of this approach lies in combining speed scaling and node shutdown techniques to reduce power. It is implemented in the widely used resource and job management system SLURM. Finally, it is validated through large-scale emulations using real production workload traces from the Curie supercomputer.
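As a toy illustration of combining node shutdown with speed scaling under a power cap, the sketch below assumes a fixed per-node power for each available frequency, a fixed idle power, and a small residual power for switched-off nodes. It prefers keeping the highest frequency and switches off idle nodes first, lowering the frequency only when shutdowns alone cannot meet the budget. All parameters and the policy itself are assumptions for illustration; this is not the strategy the authors implemented in SLURM.

```python
from typing import Dict, Optional, Tuple


def plan_power_cap(
    budget_w: float,
    nodes_total: int,
    nodes_busy: int,
    idle_w: float,
    off_w: float,
    busy_w_at_freq: Dict[float, float],
) -> Optional[Tuple[int, float]]:
    """Hypothetical policy: return (idle nodes to switch off, frequency) that
    fits the budget, or None if even the slowest frequency with every idle
    node off exceeds it."""
    if not 0 <= nodes_busy <= nodes_total:
        raise ValueError("need 0 <= nodes_busy <= nodes_total")
    idle_nodes = nodes_total - nodes_busy
    # Fastest frequency first; for each, switch off as few idle nodes as possible.
    for freq in sorted(busy_w_at_freq, reverse=True):
        busy_power = nodes_busy * busy_w_at_freq[freq]
        for off in range(idle_nodes + 1):
            total = busy_power + (idle_nodes - off) * idle_w + off * off_w
            if total <= budget_w:
                return off, freq
    return None
```

With 10 nodes, 4 of them busy, a 1000 W budget, 50 W idle power, 5 W for a switched-off node, and busy-node power of 200 W, 150 W, and 110 W at 2.0, 1.5, and 1.0 GHz, the sketch switches off 3 idle nodes and keeps 2.0 GHz.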
Citations: 17
Energy Prediction of OpenMP Applications Using Random Forest Modeling Approach
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.12
S. Benedict, R. Rejitha, P. Gschwandtner, R. Prodan, T. Fahringer
OpenMP, with its extended parallelism features and support for radically changing HPC architectures, has spurred a surge in parallel application development within the HPC developer community, leading to severe energy consumption issues. Consequently, interest in addressing the energy consumption of HPC applications automatically has grown among compiler developers, although the underlying optimization search space can grow tremendously. This paper proposes a Random Forest Modeling (RFM) approach for predicting the energy consumption of OpenMP applications in compilers. The approach was tested on OpenMP applications such as the NAS benchmarks, matrix multiplication, n-body simulations, and stencil applications, while tuning the applications for energy, problem size, and other performance concerns. The proposed RFM approach predicted the energy consumption of code variants with a Mean Square Error (MSE) below 0.699 and an R² value of 0.998 when the energy in the testing dataset varied between 0.024 joules and 150.23 joules. In addition, the paper discusses how energy variations, the number of independent variables used, and the proportion of the dataset used for testing influence the RFM modeling process.
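The abstract reports accuracy as MSE and R². For reference, a minimal standard-library sketch of the usual definitions follows: MSE is the mean of the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. Fitting the random forest itself is not shown; a library such as scikit-learn (`sklearn.ensemble.RandomForestRegressor`) is one possible choice, and the abstract does not say which tool the authors used.

```python
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared residuals (actual - predicted)."""
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(actual, predicted)
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

For measurements [1, 2, 3, 4] and predictions [1, 2, 3, 5], the MSE is 0.25 and R² is 0.8.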
Citations: 23
Performance Analysis for Target Devices with the OpenMP Tools Interface
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.27
Tim Cramer, R. Dietrich, C. Terboven, Matthias S. Müller, W. Nagel
The need for large compute capabilities has led to the widespread use of accelerated high performance computing systems. To lower the burden of programming these new architectures, user-friendly programming paradigms such as OpenACC and OpenMP have come into existence. They offer pragmas that shift effort from the programmer to the compiler and runtime system, particularly for data management. However, further improving usability also requires adequate tool support. In our work we present in detail a general extension of the upcoming OpenMP tools interface (OMPT) for the new OpenMP 4.0 target constructs. This extension aims to be a portable, vendor- and platform-independent interface that enables the use of performance analysis tools with OpenMP for accelerators. Finally, we evaluate the approach in a reference implementation, using an instrumented OpenMP runtime and the Score-P measurement infrastructure, to demonstrate its validity and usability.
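OMPT is callback based: a tool registers functions for runtime events, and the runtime invokes them as the events occur. The Python mock below sketches that idea for target regions and data transfers, aggregating time and bytes per device. The event names `target_begin`, `target_end`, and `data_op`, and both classes, are hypothetical and do not reproduce the actual OMPT API, which is a C interface.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List


class ToolInterface:
    """Toy stand-in for a runtime's tool interface: tools register callbacks
    by event name, and the runtime calls them when it emits that event."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callable[..., None]]] = defaultdict(list)

    def register(self, event: str, callback: Callable[..., None]) -> None:
        self._callbacks[event].append(callback)

    def emit(self, event: str, **payload) -> None:
        for callback in self._callbacks[event]:
            callback(**payload)


class TargetProfiler:
    """Hypothetical tool: sums time spent in target regions and bytes moved
    by data-mapping operations, per device."""

    def __init__(self, iface: ToolInterface) -> None:
        self.region_time: DefaultDict[int, float] = defaultdict(float)
        self.bytes_moved: DefaultDict[int, int] = defaultdict(int)
        self._open: Dict[int, float] = {}  # region_id -> start time
        iface.register("target_begin", self.on_begin)
        iface.register("target_end", self.on_end)
        iface.register("data_op", self.on_data_op)

    def on_begin(self, region_id: int, device: int, t: float) -> None:
        self._open[region_id] = t

    def on_end(self, region_id: int, device: int, t: float) -> None:
        self.region_time[device] += t - self._open.pop(region_id)

    def on_data_op(self, device: int, nbytes: int) -> None:
        self.bytes_moved[device] += nbytes
```

For example, after a 4096-byte `data_op` on device 0 and a target region that begins at t = 0.0 and ends at t = 2.5, the profiler reports 2.5 time units and 4096 bytes for device 0.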
Citations: 8
Communication Pattern-Based Distributed Snapshots in Large-Scale Systems
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.117
Salem Saker, A. Agbaria
Large-scale systems (LSSs) continue to attract attention from the scientific community as a means of addressing high-performance computing. Providing fault tolerance in distributed systems is a challenge, and the challenge becomes even more difficult in LSSs. Distributed snapshots are an important building block for distributed systems and, among other applications, are useful for providing fault tolerance. This paper motivates the need for fault tolerance in LSSs and focuses on the limitations of providing it. It then presents an innovative and scalable distributed snapshot approach for LSSs. In this approach, when a new snapshot is taken, a process coordinates only with the processes it has communicated with since the last snapshot. Our protocol improves on the Chandy and Lamport distributed snapshot protocol presented in 1985. This improvement may encourage system developers and planners to consider the protocol. Using stochastic models, we compare the performance of our new approach with that of other well-known distributed snapshot approaches. The results show that our approach achieves significantly lower overhead.
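The scoping idea can be sketched as a graph search, assuming each process records, in both directions, the peers it has exchanged messages with since the last snapshot, and that snapshot markers are forwarded only along those recorded channels. The participants are then the processes reachable from the initiator, whereas the Chandy and Lamport protocol sends markers on every outgoing channel. This is an interpretation for illustration, not the paper's protocol; `snapshot_participants` and `talked_to` are hypothetical names.

```python
from collections import deque
from typing import Dict, Set


def snapshot_participants(initiator: int, talked_to: Dict[int, Set[int]]) -> Set[int]:
    """Processes reached when markers travel only along channels used since
    the last snapshot, starting from the initiator (breadth-first search)."""
    participants = {initiator}
    frontier = deque([initiator])
    while frontier:
        p = frontier.popleft()
        for q in talked_to.get(p, set()):
            if q not in participants:  # forward the marker once per process
                participants.add(q)
                frontier.append(q)
    return participants
```

If processes 0, 1, and 2 have communicated with each other and processes 3 and 4 only with each other, a snapshot initiated by process 0 involves only {0, 1, 2}.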
Citations: 2
Mini-NOVA: A Lightweight ARM-based Virtualization Microkernel Supporting Dynamic Partial Reconfiguration
Pub Date : 2015-05-25 DOI: 10.1109/IPDPSW.2015.72
Tian Xia, Jean-Christophe Prévotet, F. Nouvel
Today, ARM is becoming the mainstream processor family in the high-performance embedded systems domain. In this context, integrating a run-time reconfigurable FPGA with an ARM processor on a single chip makes it possible to combine high performance with flexibility. In this paper, we propose a low-complexity system virtualization design running on the Zynq platform. Software and hardware resources are virtualized and managed by a custom microkernel. The dedicated features for efficiently managing dynamic partial reconfiguration (DPR) are described in detail. The performance of DPR management is evaluated and presented at the end of the paper.
Citations: 11