Latest Literature: High Performance Computing Technology

Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing
Pub Date : 2015-01-01 DOI: 10.1145/2830018
Citations: 0
PTG: An Abstraction for Unhindered Parallelism
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.8
Anthony Danalis, G. Bosilca, Aurélien Bouteiller, T. Hérault, J. Dongarra
Increased parallelism and use of heterogeneous computing resources is now an established trend in High Performance Computing (HPC), a trend that, looking forward to Exascale, seems bound to intensify. Despite the evolution of hardware over the past decade, the programming paradigm of choice was invariably derived from Coarse Grain Parallelism with explicit data movements. We argue that message passing has remained the de facto standard in HPC because, until now, the ever increasing challenges that application developers had to address to create efficient portable applications remained manageable for expert programmers. Data-flow based programming is an alternative approach with significant potential. In this paper, we discuss the Parameterized Task Graph (PTG) abstraction and present the specialized input language that we use to specify PTGs in our data-flow task-based runtime system, PaRSEC. This language and the corresponding execution model are in contrast with the execution model of explicit message passing as well as the model of alternative task based runtime systems. The Parameterized Task Graph language decouples the expression of the parallelism in the algorithm from the control-flow ordering, load balance, and data distribution. Thus, programs are more adaptable and map more efficiently on challenging hardware, as well as maintain portability across diverse architectures. To support these claims, we discuss the different challenges of HPC programming and how PaRSEC can address them, and we demonstrate that in today's large scale supercomputers, PaRSEC can significantly outperform state-of-the-art MPI applications and libraries, a trend that will increase with future architectural evolution.
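Editorial illustration (not taken from the paper): the sketch below hints at what it can mean to express parallelism through parameterized dependencies rather than control-flow order. It is plain Python, not PaRSEC's input language or runtime, and every name in it (`run_ptg`, `deps`, `body`, the tree-reduction example) is a hypothetical chosen for this listing. Unlike a real PTG runtime, it enumerates all tasks up front for simplicity.

```python
from collections import deque

def run_ptg(params, deps, body):
    """Run each task once all tasks returned by deps(task) have finished.

    params: iterable of task parameters (hypothetical task identifiers)
    deps:   function mapping a task parameter to the parameters it depends on
    body:   function (task, list_of_dependency_results) -> result
    """
    params = list(params)
    remaining = {p: len(deps(p)) for p in params}   # unmet dependency counts
    successors = {p: [] for p in params}
    for p in params:
        for d in deps(p):
            successors[d].append(p)
    ready = deque(p for p in params if remaining[p] == 0)
    results, order = {}, []
    while ready:
        # Any ready task may run; a real runtime could pick one per core.
        p = ready.popleft()
        results[p] = body(p, [results[d] for d in deps(p)])
        order.append(p)
        for s in successors[p]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return results, order

# Example: a binary tree reduction over 8 values, with tasks named (level, index).
data = [3, 1, 4, 1, 5, 9, 2, 6]
levels = 3
tasks = [(l, i) for l in range(levels + 1) for i in range(len(data) >> l)]

def deps(task):
    l, i = task
    # Level-0 tasks read input; higher levels depend on two children.
    return [] if l == 0 else [(l - 1, 2 * i), (l - 1, 2 * i + 1)]

def body(task, inputs):
    l, i = task
    return data[i] if l == 0 else sum(inputs)

results, order = run_ptg(tasks, deps, body)
```

The dependency function is the only place where ordering is stated; the executor is free to run ready tasks in any order, which is the property the abstract attributes to data-flow models.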
Citations: 52
HSLOT: The HERCULES Scriptable Loop Transformations Engine
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.10
Christos Kartsaklis, Eunjung Park, John Cavazos
HSLOT arms users with a rich set of configurable transformation directives, to be used as-they-are or to be specialized and combined into powerful custom transformations. We offer a plethora of loop transformations, which includes both the classic set (unroll, fuse, fission, tile, and so on) as well as unique ones (specialize, swap nest, split, fork, and so on) that are not found in other state-of-the-art systems. We show how HSLOT enables more transformations such as merging two loops that cannot be fused because of data dependencies and how HSLOT can be used in a simple and systematic fashion to improve memory accesses and expose better parallelism. To use our system, users simply annotate loops with the transformations sequence and compile with our Open64-based HSLOT-implementing Fortran compiler, HSLF90, which produces both object files and optionally source. We describe our experiment results using a set of scientific kernels written in Fortran with HSLOT directives on a 32-core AMD system.
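Editorial illustration (not taken from the paper): two of the classic transformations named above, fusion and tiling, can be sketched by hand as follows. This is plain Python written for this listing; it does not use HSLOT directives or HSLF90, and the function names are hypothetical. Each transformed loop computes the same result as its baseline.

```python
def two_loops(a, b):
    """Baseline: two separate loops over the same index range."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * 2.0
    return d

def fused(a, b):
    """Loop fusion: both statements run in one pass, so the
    intermediate array c shrinks to a scalar."""
    n = len(a)
    d = [0.0] * n
    for i in range(n):
        c_i = a[i] + b[i]
        d[i] = c_i * 2.0
    return d

def transpose(m):
    """Baseline: straightforward doubly nested loop."""
    rows, cols = len(m), len(m[0])
    out = [[0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            out[j][i] = m[i][j]
    return out

def transpose_tiled(m, tile=2):
    """Loop tiling: iterate over tile x tile blocks so that each block's
    reads and writes stay close together in memory."""
    rows, cols = len(m), len(m[0])
    out = [[0] * rows for _ in range(cols)]
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            # min() handles partial tiles at the matrix edges.
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    out[j][i] = m[i][j]
    return out
```

In a compiled language the payoff comes from cache reuse and reduced memory traffic; in Python the sketch only shows that the iteration order changes while the result does not.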
Citations: 3
A Data Flow Language to Develop High Performance Computing DSLs
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.6
Alejandro Fernández, Vicenç Beltran, Sergi Mateo, Tomasz Patejko, E. Ayguadé
Developing complex scientific applications on high performance systems requires both domain knowledge and expertise in parallel and distributed programming models. In addition, modern high performance systems are heterogeneous, composed of multicores and accelerators, which, despite being efficient and powerful, are harder to program. Domain-Specific Languages (DSLs) are a promising approach to hide the complexity of HPC systems and boost programmers' productivity. However, the huge cost and complexity of implementing efficient and scalable DSLs on HPC systems is hindering their adoption for most domains. Addressing such problems, we present Data Flow Language (DFL), a DSL designed to exploit distributed and heterogeneous HPC systems. DFL abstracts the key concepts of such systems as SMP tasks for multicores, kernels for accelerators and high-level operations for distributed computing. In addition, DFL leverages the hybrid MPI/OmpSs data-flow programming model to efficiently implement the previous concepts. All of these features make DFL suitable as the target language for other DSLs. However, it is also suitable as a fast prototyping language to develop distributed applications on heterogeneous systems.
Citations: 3
The OPS Domain Specific Abstraction for Multi-block Structured Grid Computations
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.7
I. Reguly, G. Mudalige, M. Giles, Dan Curran, Simon McIntosh-Smith
Code maintainability, performance portability and future proofing are some of the key challenges in this era of rapid change in High Performance Computing. Domain Specific Languages and Active Libraries address these challenges by focusing on a single application domain and providing a high-level programming approach, and then subsequently using domain knowledge to deliver high performance on various hardware. In this paper, we introduce the OPS high-level abstraction and active library aimed at multi-block structured grid computations, and discuss some of its key design points; we demonstrate how OPS can be embedded in C/C++ and the API made to look like a traditional library, and how through a combination of simple text manipulation and back-end logic we can enable execution on a diverse range of hardware using different parallel programming approaches. Relying on the access-execute description of the OPS abstraction, we introduce a number of automated execution techniques that enable distributed memory parallelization, optimization of communication patterns, checkpointing and cache-blocking. Using performance results from CloverLeaf from the Mantevo suite of benchmarks, we demonstrate the utility of OPS.
Citations: 59
GridPACK: A Framework for Developing Power Grid Simulations on High Performance Computing Platforms
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.12
B. Palmer, W. Perkins, Yousu Chen, Shuangshuang Jin, D. Callahan, Kevin A. Glass, R. Diao, M. Rice, S. Elbert, M. Vallem, Zhenyu Huang
This paper describes the GridPACK™ framework, which is designed to help power grid engineers develop modeling software capable of running on high performance computers. The framework makes extensive use of software templates to provide high level functionality while at the same time allowing developers the freedom to express whatever models and algorithms they are using. GridPACK™ contains modules for setting up distributed power grid networks, assigning buses and branches with arbitrary behaviors to the network, creating distributed matrices and vectors and using parallel linear and non-linear solvers to solve algebraic equations. It also provides mappers to create matrices and vectors based on properties of the network and functionality to support IO and to manage errors. The goal of GridPACK™ is to substantially reduce the complexity of writing software for parallel computers while still providing efficient and scalable software solutions. The use of GridPACK™ is illustrated for a simple powerflow example and performance results for powerflow and dynamic simulation are discussed.
Citations: 14
Exploring the Construction of a Domain-Aware Toolchain for High-Performance Computing
Pub Date : 2014-11-16 DOI: 10.1109/WOLFHPC.2014.9
P. McCormick, Christine Sweeney, Nicholas D. Moss, Dean Prichard, S. Gutierrez, K. Davis, J. Mohd-Yusof
The push towards exascale computing has sparked a new set of explorations for providing new productive programming environments. While many efforts are focusing on the design and development of domain-specific languages (DSLs), few have addressed the need for providing a fully domain-aware toolchain. Without such domain awareness, critical features for achieving acceptance and adoption, such as debugger support, pose a long-term risk to the overall success of the DSL approach. In this paper we explore the use of language extensions to design and implement the Scout DSL and a supporting toolchain infrastructure. We highlight how language features and the software design methodologies used within the toolchain play a significant role in providing a suitable environment for DSL development.
Citations: 8
Target-Specific Refinement of Multigrid Codes
Pub Date : 2014-11-01 DOI: 10.1109/WOLFHPC.2014.5
Richard Membarth, P. Slusallek, M. Köster, Roland Leißa, Sebastian Hack
This paper applies partial evaluation to stage a stencil code Domain-Specific Language (DSL) onto a functional and imperative programming language. Platform-specific primitives such as scheduling or vectorization, and algorithmic variants such as boundary handling are factored out into a library that makes up the elements of that DSL. We show how partial evaluation can eliminate all overhead of this separation of concerns and create code that resembles hand-crafted versions for a particular target platform. We evaluate our technique by implementing a DSL for the V-cycle multigrid iteration. Our approach generates code for AMD and NVIDIA GPUs (via SPIR and NVVM) as well as for CPUs using AVX/AVX2 alike from the same high-level DSL program. First results show that we achieve a speedup of up to 3x on the CPU by vectorizing multigrid components and a speedup of up to 2x on the GPU by merging the computation of multigrid components.
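Editorial illustration (not taken from the paper): for readers unfamiliar with the V-cycle multigrid iteration mentioned above, here is a textbook-style sketch for the 1D Poisson problem -u'' = f with fixed boundary values. It assumes weighted Jacobi smoothing, full-weighting restriction and linear interpolation, and a grid of 2^k + 1 points; these choices and all function names are assumptions made for this listing, not the DSL or the code generated in the paper.

```python
def smooth(u, f, h, sweeps=2, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps for -u'' = f; boundary entries stay fixed."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u

def residual(u, f, h):
    """r = f - A u, with A the standard 3-point Laplacian stencil."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(r):
    """Full weighting: fine grid (2m+1 points) to coarse grid (m+1 points)."""
    m = (len(r) - 1) // 2
    coarse = [0.0] * (m + 1)
    for j in range(1, m):
        coarse[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return coarse

def prolong(e):
    """Linear interpolation: coarse grid (m+1 points) to fine grid (2m+1)."""
    m = len(e) - 1
    fine = [0.0] * (2 * m + 1)
    for j in range(m + 1):
        fine[2 * j] = e[j]
    for j in range(m):
        fine[2 * j + 1] = 0.5 * (e[j] + e[j + 1])
    return fine

def v_cycle(u, f, h):
    """One V-cycle; updates u in place and returns it."""
    if len(u) <= 3:
        # Coarsest grid has a single interior point: solve it exactly.
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return u
    smooth(u, f, h)                                   # pre-smoothing
    coarse_rhs = restrict(residual(u, f, h))
    coarse_err = v_cycle([0.0] * len(coarse_rhs), coarse_rhs, 2.0 * h)
    correction = prolong(coarse_err)
    for i in range(1, len(u) - 1):
        u[i] += correction[i]                         # coarse-grid correction
    smooth(u, f, h)                                   # post-smoothing
    return u
```

A typical use is to start from a zero guess on a grid with 33 points and call `v_cycle` repeatedly until the residual is small enough. The smoother, restriction, and boundary handling are exactly the kind of components the abstract describes as being factored out into a library.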
Citations: 3
ExaSlang: A Domain-Specific Language for Highly Scalable Multigrid Solvers
Pub Date : 2014-11-01 DOI: 10.1109/WOLFHPC.2014.11
Christian Schmitt, S. Kuckuk, Frank Hannig, H. Köstler, Jürgen Teich
High-Performance Computing (HPC) systems are becoming increasingly parallel and heterogeneous. As a consequence, HPC applications, such as simulation software, need to be especially designed towards these systems to achieve optimal performance. This, in turn, leads to higher complexity, requiring software engineers and scientists to have a deep knowledge of the hardware and its technologies. As a remedy, domain-specific languages (DSLs) are a convenient technology for domain experts to describe settings and problems they want to solve using terms and models familiar to them. This specification is transformed into a target language, i.e., source code in another programming language or a binary executable, by a specialized compiler. We propose ExaSlang, a language for the specification of numerical solvers based on the multigrid method targeting distributed-memory systems. Furthermore, we present the transformation framework that drives the corresponding source-to-source compiler. It emits C++ code utilizing a hybrid OpenMP and MPI parallelization. Moreover, we substantiate our approach with scaling results of our code scaling up to the complete JUQUEEN cluster, consisting of 28,672 nodes, with a total of 458,752 cores.
Citations: 59
Session details: 2
Pub Date : 2011-11-13 DOI: 10.1145/3256284
S. Hammond
Citations: 0