{"title":"PGHPF from The Portland Group","authors":"V. Schuster","doi":"10.1109/M-PDT.1994.329807","DOIUrl":null,"url":null,"abstract":"PGHPF, The Portland Group’s HPF compiler, is now available for general distribution. Its initial release fully supports the HPF subset as defined in version 1 .O of the H P F Language Specification. A March 1995 release will support the full HPF language. PGHPF is available in two forms. A highly tuned version is integrated with PGI’s PGF77 Fortran compiler and produces executable images for most 8 6 0 and Sparc multiprocessor platforms. In this form, PGHPF will be the standard HPF compiler provided on the Intel Paragon and Meiko CS-2 scalable parallel processing systems. It will also be optimized for other 8 6 0 and SuperSparc sharedmemory multiprocessor systems. PGHPF is also available as a source-to-source translator that produces Fortran 77, incorporating calls to a portable communications library. This output, with linearized array references and de facto standard Cray pointer variable declarations, can then be used as input to standard node compilers. Both forms of the compiler use an internally defined transport-independent runtime library. This allows common source generation regardless of the target or the underlying communication mechanism (MPI, PVM, Parmacs, NX, or a targetcustom communication protocol). The runtime library for a specified target can thus be optimized outside the context of the compiler. PGI is developing optimized versions of the runtime library for the Intel Paragon, Meiko CS-2, SGI MP Challenge, SuperSparc workstation clusters, and Solaris shared-memory systems. Interfaces to PGHPF, including the runtime interface, will be open and freely available. This will let system vendors and researchers custom-tune for a specific target, and will facilitate integration with existing parallel support tools. 
The success of HPF as a standard depends on whether programmers can use it to implement efficient, portable versions of appropriate data-parallel applications. On that assumption, the highest priorities for the initial release of PGHPF are completeness, correctness, and source portability. The initial release supports all of the HPF subset and will distribute and align data exactly as the programmer specifies, in as many dimensions as desired. Control parallelism will be exploited wherever possible, as dictated by data distributions and language elements. PGI is spending significant effort to minimize the inefficiencies and overhead introduced to support the HPF paradigm. From a performance standpoint, minimizing communication and making it efficient are most important. PGHPF incorporates optimizations that address both structured and unstructured communication. It can identify and exploit a program’s inherent structure through calls to structured asynchronous communication primitives; examples of such primitives include collective shifts, the various forms of broadcast, and data reductions. Exploiting an application’s structure increases efficiency and performance portability. The asynchronous nature of the primitives allows overlap of communication with computation, and can reduce or eliminate the communication profile in some applications. In addition, this approach allows for many communication-specific optimizations, including common communication elimination, communication scheduling, communication vectorization, and reuse of scheduling information for unstructured communications. HPF programs that should perform well under PGHPF include those with explicit data distributions and alignments well suited to the target architecture, and those that liberally use FORALL, Fortran 90 array assignments, and the INDEPENDENT directive. Programs such as these let the compiler optimize based on the parallelism expressed by the programmer.
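As a brief illustration of the constructs named above (a minimal sketch, not drawn from the article; the processor grid, array names, and sizes are hypothetical), a program might distribute an array across a processor mesh, align a second array with it, and express elementwise parallelism with a Fortran 90 array assignment, FORALL, and an INDEPENDENT loop:

```fortran
program hpf_sketch
  implicit none
  real :: a(512,512), b(512,512)
  integer :: i, j
!HPF$ PROCESSORS p(4,4)
!HPF$ DISTRIBUTE a(BLOCK,BLOCK) ONTO p
!HPF$ ALIGN b(:,:) WITH a(:,:)

  ! Fortran 90 array assignment: fully data-parallel, no communication
  ! needed because b is aligned element-for-element with a.
  b = 1.0

  ! FORALL expresses elementwise parallelism; the shifted references
  ! (i-1, i+1, ...) generate the structured "collective shift"
  ! communication the compiler can recognize and optimize.
  forall (i = 2:511, j = 2:511)
     a(i,j) = 0.25 * (b(i-1,j) + b(i+1,j) + b(i,j-1) + b(i,j+1))
  end forall

  ! INDEPENDENT asserts the iterations of the next loop do not
  ! interfere, so the compiler may run them in parallel.
!HPF$ INDEPENDENT
  do j = 1, 512
     a(1,j) = b(1,j)
  end do
end program hpf_sketch
```

The directives are comments to an ordinary Fortran compiler, which is what makes an HPF program source-portable: a node compiler simply ignores them.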
As the state of the art in automatic data distribution and parallelization moves forward, careful coding will become less important. PGHPF exhibits good speedups on native HPF versions of the Shallow Water benchmark (9.3x) and a 3D Poisson solver (8.3x), as measured in wall time on a 15-node Intel Paragon. Several complete applications have been run as well, including a 16,000-line fluid-flow application (6x) and a 3,000-line elastic-wave simulation application (7x). The performance of these tests on shared-memory SuperSparc systems shows similar scalability. PGI considers these efficiency numbers a good start, and is on a steep curve implementing target-independent optimizations that should further increase efficiency. HPF allows a concise and portable specification of an application’s inherent data parallelism. It is a valuable means by which a programmer can convey to a compiler how best to optimize in the presence of a memory hierarchy. PGI expects that programmers of parallel systems are primarily interested, at least in the near term, in accessing HPF compilers that use this information to maximum advantage on a given target system.
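As a rough sanity check on those figures (a sketch using only the speedups and node count reported above), parallel efficiency is speedup divided by node count:

```fortran
program par_eff
  implicit none
  real :: shallow_eff, poisson_eff
  ! Speedups reported in the text, measured on a 15-node Intel Paragon
  shallow_eff = 9.3 / 15.0   ! Shallow Water: ~0.62, i.e. ~62% efficiency
  poisson_eff = 8.3 / 15.0   ! 3D Poisson:    ~0.55, i.e. ~55% efficiency
  print *, 'Shallow Water efficiency: ', shallow_eff
  print *, '3D Poisson efficiency:    ', poisson_eff
end program par_eff
```

Efficiencies in the 55–62% range for compiler-generated communication on a 1994 distributed-memory machine are consistent with the article's characterization of these numbers as a good start rather than an end point.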
While there are many applications similar to those outlined above that can be efficiently implemented in the current definition of HPF, extensions are needed to address irregular data distributions, parallel I/O, and explicit task parallelism.
Citations: 2