PGHPF from The Portland Group

V. Schuster
IEEE Parallel & Distributed Technology: Systems & Applications
DOI: 10.1109/M-PDT.1994.329807

Abstract

PGHPF, The Portland Group’s HPF compiler, is now available for general distribution. Its initial release fully supports the HPF subset as defined in version 1.0 of the HPF Language Specification. A March 1995 release will support the full HPF language. PGHPF is available in two forms. A highly tuned version is integrated with PGI’s PGF77 Fortran compiler and produces executable images for most i860 and Sparc multiprocessor platforms. In this form, PGHPF will be the standard HPF compiler provided on the Intel Paragon and Meiko CS-2 scalable parallel processing systems. It will also be optimized for other i860 and SuperSparc shared-memory multiprocessor systems. PGHPF is also available as a source-to-source translator that produces Fortran 77, incorporating calls to a portable communications library. This output, with linearized array references and de facto standard Cray pointer variable declarations, can then be used as input to standard node compilers. Both forms of the compiler use an internally defined transport-independent runtime library. This allows common source generation regardless of the target or the underlying communication mechanism (MPI, PVM, Parmacs, NX, or a target-custom communication protocol). The runtime library for a specified target can thus be optimized outside the context of the compiler. PGI is developing optimized versions of the runtime library for the Intel Paragon, Meiko CS-2, SGI MP Challenge, SuperSparc workstation clusters, and Solaris shared-memory systems. Interfaces to PGHPF, including the runtime interface, will be open and freely available. This will let system vendors and researchers custom-tune for a specific target, and will facilitate integration with existing parallel support tools. The success of HPF as a standard depends on whether programmers can use it to implement efficient, portable versions of appropriate data-parallel applications.
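To make the source-to-source form concrete, the following is a hypothetical sketch of the index arithmetic such a translator must emit for a one-dimensional BLOCK distribution; the function name and structure are illustrative assumptions, not taken from the PGHPF runtime interface.

```python
def block_owner(i, n, nprocs):
    """For an n-element array distributed BLOCK over nprocs processors,
    return (owning processor, local index) of global element i.
    HPF's BLOCK distribution gives each processor ceil(n/nprocs)
    consecutive elements; local indices like these are what linearized
    array references in the generated Fortran 77 ultimately address."""
    block = -(-n // nprocs)  # ceiling division: block size per processor
    return i // block, i % block

# For a 16-element array over 4 processors, element 7 lives on
# processor 1 at local index 3.
print(block_owner(7, 16, 4))
```

Mapping a global index to an (owner, local offset) pair like this is the basic step behind "owner computes" code generation, regardless of which transport the runtime library uses underneath.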
Based on that assumption, the highest priority for the initial release of PGHPF is completeness, correctness, and source portability. The initial release of PGHPF supports all of the HPF subset and will distribute and align data exactly as the programmer specifies, in as many dimensions as desired. Control parallelism will be exploited wherever possible as dictated by data distributions and language elements. PGI is spending significant effort to minimize the inefficiencies and overhead introduced to support the HPF paradigm. From a performance standpoint, minimization and efficiency of communication are most important. PGHPF incorporates optimizations that address both structured and unstructured communication. It can identify and exploit a program’s inherent structure through calls to structured asynchronous communication primitives. Examples of such primitives include collective shifts, the various forms of broadcast, and data reductions. Exploiting an application’s structure increases efficiency and performance portability. The asynchronous nature of the primitives allows overlap of communication with computation, and can reduce or eliminate the communication profile in some applications. In addition, this approach allows for many communication-specific optimizations, including common communication elimination, communication scheduling, communication vectorization, and reuse of scheduling information for unstructured communications. HPF programs that should perform well under PGHPF include those with explicit data distributions and alignments well suited for the target architecture, and those that liberally use FORALL, Fortran 90 array assignments, and the INDEPENDENT directive. Programs such as these let the compiler optimize based on the parallelism expressed by the programmer. As the state of the art in automatic data distribution and parallelization moves forward, careful coding will become less important.
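An illustrative HPF fragment (not taken from the article) shows the language elements named above working together; the array sizes and computation are assumptions chosen only to exercise DISTRIBUTE, ALIGN, FORALL, a Fortran 90 array assignment, and INDEPENDENT:

```fortran
! Illustrative HPF sketch: a BLOCK-distributed array with an aligned
! partner, updated via FORALL and an INDEPENDENT loop.
      REAL A(1000), B(1000)
!HPF$ DISTRIBUTE A(BLOCK)
!HPF$ ALIGN B(I) WITH A(I)

      A = 0.0                                   ! Fortran 90 array assignment
      FORALL (I = 2:999) A(I) = 0.5 * (B(I-1) + B(I+1))

!HPF$ INDEPENDENT
      DO I = 1, 1000
         B(I) = 2.0 * A(I)
      END DO
```

Because A and B are aligned, the FORALL's nearest-neighbor stencil needs only the structured shift communication described above, and the INDEPENDENT assertion lets the compiler parallelize the final loop without dependence analysis.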
PGHPF exhibits good speedups on native HPF versions of the Shallow Water benchmark (9.3x) and a 3D Poisson solver (8.3x), as measured in wall time on a 16-node Intel Paragon. Several complete applications have been run as well, including a 16,000-line fluid-flow application (6x) and a 3,000-line elastic-wave simulation application (7x). The performance of these tests on shared-memory SuperSparc systems shows similar scalability. PGI considers these efficiency numbers a good start, and is on a steep curve implementing target-independent optimizations that should further increase efficiency. HPF allows a concise and portable specification of an application’s inherent data parallelism. It is a valuable means by which a programmer can convey to a compiler how best to optimize in the presence of a memory hierarchy. PGI expects that programmers of parallel systems are primarily interested, at least in the near term, in accessing HPF compilers that use this information to maximum advantage on a given target system. While there are many applications similar to those outlined above that can be efficiently implemented in the current definition of HPF, extensions are needed to address irregular data distributions, parallel I/O, and explicit task parallelism.
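The collective shift named earlier among the structured communication primitives can be sketched in a few lines; this is a hypothetical model of why such a shift is cheap for BLOCK-distributed data, not PGHPF's actual runtime call:

```python
def cshift_blocks(blocks):
    """Circularly shift a block-distributed array left by one element.
    `blocks` is a list of per-processor blocks. Only each block's first
    element crosses a processor boundary -- the regular, structured
    communication pattern a collective-shift primitive exploits."""
    p = len(blocks)
    # Each processor receives the first element of its right neighbor.
    received = [blocks[(i + 1) % p][0] for i in range(p)]
    # Locally, everything else is just a memory move.
    return [blk[1:] + [recv] for blk, recv in zip(blocks, received)]

# Global array [1..6] split over 3 processors, shifted left by one.
print(cshift_blocks([[1, 2], [3, 4], [5, 6]]))
```

Because each processor exchanges a single edge element with a fixed neighbor, the communication is small, regular, and can be issued asynchronously while the local memory moves proceed, which is how overlap of communication with computation becomes possible.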