
2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC): Latest Papers

Performance Portability of a Wilson Dslash Stencil Operator Mini-App Using Kokkos and SYCL
B. Joó, T. Kurth, M. A. Clark, Jeongnim Kim, C. Trott, Daniel Ibanez, Daniel Sunderland, J. Deslippe
We describe our experiences in creating mini-apps for the Wilson-Dslash stencil operator for Lattice Quantum Chromodynamics using the Kokkos and SYCL programming models. In particular we comment on the performance achieved on a variety of hardware architectures, limitations we have reached in both programming models and how these have been resolved by us, or may be resolved by the developers of these models.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00007
Published: 2019-11-01
Citations: 16
[Copyright notice]
DOI: https://doi.org/10.1109/p3hpc49587.2019.00002
Published: 2019-11-01
Citations: 0
On Applying Performance Portability Metrics
D. Daniel, J. Panetta
As we prepare for further technological advancement in supercomputing, the diversity of hardware architectures and parallel programming languages has increased to new levels. At the same time, extracting performance from so many architectures is even more difficult. In this context, the appearance of portable languages capable of generating executable code for multiple architectures has become a recurrent research target. We port a set of seven parallel benchmarks from the SPEC ACCEL suite and a wave propagation code to one such portable language: the Kokkos C++ programming library. Using the original OpenACC versions of the eight codes, we apply a known performance portability metric on the OpenACC and Kokkos versions of those codes across a variety of hardware platforms and problem sizes. We observe that the portability metric is sensitive to the problem size. To remedy this deficiency, we propose a novel metric for performance portability, apply the proposed metric to the eight codes and discuss the results.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00010
Published: 2019-11-01
Citations: 10
RAJA: Portable Performance for Large-Scale Scientific Applications
D. Beckingsale, T. Scogland, J. Burmark, R. Hornung, Holger E. Jones, W. Killian, A. Kunen, Olga Pearce, P. Robinson, B. Ryujin
Modern high-performance computing systems are diverse, with hardware designs ranging from homogeneous multi-core CPUs to GPU or FPGA accelerated systems. Achieving desirable application performance often requires choosing a programming model best suited to a particular platform. For large codes used daily in production that are under continual development, architecture-specific ports are untenable. Maintainability requires single-source application code that is performance portable across a range of architectures and programming models. In this paper we describe RAJA, a portability layer that enables C++ applications to leverage various programming models, and thus architectures, with a single-source codebase. We describe preliminary results using RAJA in three large production codes at Lawrence Livermore National Laboratory, observing 17×, 13× and 12× speedup on GPU-only over CPU-only nodes with single-source application code in each case.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00012
Published: 2019-11-01
Citations: 114
An Approach for Indirectly Adopting a Performance Portability Layer in Large Legacy Codes
John K. Holmen, B. Peterson, M. Berzins
Diversity among supported architectures in current and emerging high performance computing systems, including those for exascale, makes portable codebases desirable. Portability of a codebase can be improved using a performance portability layer to provide access to multiple underlying programming models through a single interface. Direct adoption of a performance portability layer, however, poses challenges for large pre-existing software frameworks that may need to preserve legacy code and/or adopt other programming models in the future. This paper describes an approach for indirect adoption that introduces a framework-specific portability layer between the application developer and the adopted performance portability layer to help improve legacy code support and long-term portability for future architectures and programming models. This intermediate layer uses loop-level, application-level, and build-level components to ease adoption of a performance portability layer in large legacy codebases. Results are shown for two challenging case studies using this approach to make portable use of OpenMP and CUDA via Kokkos in an asynchronous many-task runtime system, Uintah. These results show performance improvements of up to 2.7x when refactoring for portability and 2.6x when using a node more efficiently. Good strong scaling to 442,368 threads across 1,728 Knights Landing processors is also shown using MPI+Kokkos at scale.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00009
Published: 2019-11-01
Citations: 12
Performance Portability of Multi-Material Kernels
I. Reguly
Trying to improve performance, portability, and productivity of an application presents non-trivial trade-offs, which are often difficult to quantify. Recent work has developed metrics for performance portability, as well as some aspects of productivity. In this case study, we present a set of challenging computational kernels and their implementations from the domain of multi-material simulations, and evaluate them using these metrics. Three key kernels are implemented using OpenMP, OpenMP offload, OpenACC, CUDA, SYCL, and Kokkos, and tested on ARM ThunderX2, IBM Power 9, Intel KNL, Broadwell, and Skylake CPUs, as well as NVIDIA P100 and V100 GPUs. We also consider the choice of compilers, evaluating LLVM/Clang, GCC, PGI, Intel, IBM XL, and Cray compilers, where available. We present a detailed performance analysis, calculate performance portability and code divergence metrics, contrasting performance, portability, and productivity.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00008
Published: 2019-11-01
Citations: 9
mdspan in C++: A Case Study in the Integration of Performance Portable Features into International Language Standards
D. Hollman, B. Lelbach, H. Edwards, M. Hoemmen, Daniel Sunderland, C. Trott
Multi-dimensional arrays are ubiquitous in high-performance computing (HPC), but their absence from the C++ language standard is a long-standing and well-known limitation of their use for HPC. This paper describes the design and implementation of mdspan, a proposed C++ standard multidimensional array view (planned for inclusion in C++23). The proposal is largely inspired by work done in the Kokkos project, a C++ performance-portable programming model deployed by numerous HPC institutions to prepare their code base for exascale-class supercomputing systems. This paper describes the final design of mdspan after a five-year process to achieve consensus in the C++ community. In particular, we will lay out how the design addresses some of the core challenges of performance-portable programming, and how its customization points allow a seamless extension into areas not currently addressed by the C++ Standard but which are of critical importance in the heterogeneous computing world of today's systems. Finally, we have provided a production-quality implementation of the proposal in its current form. This work includes several benchmarks of this implementation aimed at demonstrating the zero-overhead nature of the modern design.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00011
Published: 2019-11-01
Citations: 8
Performance Portability across Diverse Computer Architectures
Tom Deakin, Simon McIntosh-Smith, J. Price, Andrei Poenaru, Patrick Atkinson, Codrin Popa, Justin Salmon
Previous studies into performance portability have typically analysed a single application (and its various implementations) in isolation. In this study we explore the wider landscape of performance portability by considering a number of applications from across the space of dwarfs, written in multiple parallel programming models, and across a diverse set of architectures. We apply rigorous performance portability metrics, as defined by Pennycook et al. [1]. We believe this is the broadest and most rigorous performance portability study to date, representing a far-reaching exploration of the state of performance portability that is achievable today. We will present a summary of the performance portability of each application and programming model across our diverse range of twelve computer architectures, including six different server CPUs from five different vendors, five different GPUs from two different vendors, and one vector architecture. We will conclude with an analysis of the performance portability of key programming models in general, across different application spaces as well as across differing architectures, allowing us to comment on more general performance portability principles.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00006
Published: 2019-09-25
Citations: 37
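Note on the metric: the abstract above applies the performance portability metric of Pennycook et al. A minimal sketch of that metric as it is commonly stated follows: the harmonic mean of an application's efficiency over a platform set H, or zero if the application does not run on every platform in H. The function name, the platform labels and the efficiency values are hypothetical and are not taken from the paper.

```python
from typing import Mapping, Optional


def performance_portability(efficiency: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies over the platform set H.

    `efficiency` maps a platform name to an efficiency in (0, 1], or to
    None when the application does not run on that platform. If any
    platform is unsupported, the metric is defined to be 0.
    """
    if not efficiency:
        raise ValueError("the platform set H must not be empty")
    values = list(efficiency.values())
    if any(e is None or e <= 0.0 for e in values):
        return 0.0
    return len(values) / sum(1.0 / e for e in values)


# Hypothetical efficiencies, for illustration only.
portable = {"cpu_a": 0.80, "cpu_b": 0.50, "gpu_a": 0.40}
unported = {"cpu_a": 0.80, "cpu_b": 0.50, "gpu_a": None}

pp_portable = performance_portability(portable)
pp_unported = performance_portability(unported)
print(f"all platforms supported: {pp_portable:.3f}")
print(f"one platform missing:    {pp_unported:.3f}")
```

With these made-up inputs the first case evaluates to 3 / (1.25 + 2 + 2.5) = 3 / 5.75, roughly 0.52. The harmonic mean is pulled toward the weakest platform, and a single unsupported platform drives the score to zero.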
ClangJIT: Enhancing C++ with Just-in-Time Compilation
H. Finkel, David Poliakoff, D. Richards
The C++ programming language is not only a keystone of the high-performance-computing ecosystem but has proven to be a successful base for portable parallel-programming frameworks. As is well known, C++ programmers use templates to specialize algorithms, thus allowing the compiler to generate highly-efficient code for specific parameters, data structures, and so on. This capability has been limited to those specializations that can be identified when the application is compiled, and in many critical cases, compiling all potentially-relevant specializations is not practical. ClangJIT provides a well-integrated C++ language extension allowing template-based specialization to occur during program execution. This capability has been implemented for use in large-scale applications, and we demonstrate that just-in-time-compilation-based dynamic specialization can be integrated into applications, often requiring minimal changes (or no changes) to the applications themselves, providing significant performance improvements, programmer-productivity improvements, and decreased compilation time.
DOI: https://doi.org/10.1109/P3HPC49587.2019.00013
Published: 2019-04-18
Citations: 15
Organization
Simon Bell
The post-Cartesian 'material turn' in management and organization studies understands that bodies are far more than vehicles that enable work to be undertaken, but are agentive actors in the constitution of work and working selves. This leads to the need for more empirically-derived understanding of the agency of flesh in the performative corporealization of working, embodied selves. We met this challenge through adapting feminist, posthuman research methods for a study of the materialities and materialization of working bodies. The study takes forward Judith Butler's and Karen Barad's theories of performativity by reading them through each other, and introducing flesh as an agentive actor in each moment-to-moment move. In paying close attention to the speech of supposedly 'dumb flesh' we show how flesh resists its negation and itself imposes control on the worker. We coin the term 'body/flesh' and illuminate how bodies are active and agentive, constituting corporeal/izing working selves in somewhat unexpected ways.
DOI: https://doi.org/10.5040/9781350046436.ch-005
Published: 2006-01-01
Citations: 0