Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665139
J. Bataller, J. Bernabéu-Aubán
Distributed shared memory (DSM) is a paradigm for programming distributed systems that provides an alternative to the message-passing model. DSM offers the agents of the system a shared address space through which they can communicate with each other. The main problem of a DSM implementation built on top of a message-passing system is performance, which is closely related to the consistency the DSM system offers: strong consistency (all agents agree on how memory events happen) is more expensive to implement than weak consistency (disagreements are allowed). Many DSM systems have been proposed, each supporting different consistency levels, and experience has shown that none is well suited to the whole range of problems: in some cases strongly consistent primitives are not needed, while in others the weak semantics provided are useless. The same holds for different implementations of the same memory model, since performance is also affected by the data access patterns of the applications. We introduce a novel DSM model called Mume. Mume is a low-level layer close to the message-passing interface, and its interface provides only the minimum requirements for the system to be considered shared memory. The interface includes three types of synchronization primitives: total ordering, causal ordering, and mutual exclusion. This allows efficient implementations of different memory access semantics, accommodating particular data access patterns.
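The abstract gives no code, but the flavor of a total-ordering primitive can be sketched. The sketch below is hypothetical (Mume's real interface is not shown in the abstract): it mimics total ordering with a single sequencer lock that assigns one global order to all writes, which every replica then applies identically, so all agents agree on how memory events happened.

```python
import threading

class TotalOrderSequencer:
    """Toy analog of a total-ordering primitive: every write is funneled
    through one sequencer, so all replicas apply writes in the same order.
    Invented for illustration; not Mume's actual interface."""
    def __init__(self, num_replicas):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.lock = threading.Lock()   # the sequencer's single point of order

    def write(self, key, value):
        # Acquiring the lock totally orders concurrent writes; every replica
        # applies the current write before the next one is admitted.
        with self.lock:
            for replica in self.replicas:
                replica[key] = value

    def read(self, replica_id, key):
        return self.replicas[replica_id].get(key)

seq = TotalOrderSequencer(num_replicas=3)
threads = [threading.Thread(target=seq.write, args=("x", v)) for v in (1, 2, 3)]
for t in threads: t.start()
for t in threads: t.join()
# Whichever write was ordered last, all replicas agree on the final value.
values = {seq.read(i, "x") for i in range(3)}
assert len(values) == 1
```

A causal-ordering primitive would relax this: only causally related writes need to be seen in the same order, which is why it can be implemented more cheaply.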
Title: Constructive and adaptable distributed shared memory
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665145
Christian Weiß, J. Knopp, H. Hellwagner
Distributed shared objects are a well-known approach to achieving independence of the memory model in parallel programming. The illusion of shared (global) objects is a convenient abstraction that eases programming on both kinds of parallel architecture: shared-memory and distributed-memory machines. We present several implementation variants for distributed shared objects on distributed platforms. We considered these variants while implementing a high-level parallel programming model known as coordinators (J. Knopp, 1996). These are global objects that coordinate accesses to the encapsulated data according to statically defined access patterns. Coordinators have been implemented on both shared-memory multiprocessors and networks of workstations (NOWs). We describe their implementation as distributed shared objects and give basic performance results on a NOW.
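The coordinator idea can be sketched as a global object that owns its data and serializes accesses according to a declared access pattern. The class and pattern name below are invented for illustration and are not the interface of Knopp's coordinators:

```python
import threading

class Coordinator:
    """Toy coordinator: a global object that encapsulates its data and
    coordinates accesses according to a statically declared access pattern
    ("accumulate" here). Illustrative only."""
    def __init__(self, initial, pattern="accumulate"):
        self._value = initial
        self._pattern = pattern
        self._lock = threading.Lock()

    def access(self, delta):
        if self._pattern != "accumulate":
            raise ValueError("only the accumulate pattern is sketched here")
        with self._lock:          # coordination point: one access at a time
            self._value += delta

    @property
    def value(self):
        return self._value

c = Coordinator(0)
workers = [threading.Thread(target=lambda: [c.access(1) for _ in range(1000)])
           for _ in range(4)]
for w in workers: w.start()
for w in workers: w.join()
assert c.value == 4000   # all 4000 increments coordinated without loss
```

Because the access pattern is known statically, a distributed implementation could replace the lock with a cheaper protocol tailored to that pattern (e.g., local accumulation followed by a reduction).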
Title: Implementing automatic coordination on networks of workstations
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665144
J. Lépine, S. Chaumette, F. Rubi
Industrial applications use specific, problem-oriented implementations of large sparse and irregular data structures. Hence there is a need for tools that make it possible for developers to visualize their applications in terms of these data, their structure, and the operations applied to them, whatever their actual implementation and distribution. We present both a framework that we have set up to support the development of such tools and prototypes that we have developed. The resulting environment is composed of two layers: the first is a model that we have defined and implemented as high-level libraries, which make it possible to abstract efficiently from the implementation; the second offers prototype tools built on top of these libraries. These tools are integrated within a graphical environment called Visit (T. Brandes et al., 1996), which is part of the HPFIT research effort (T. Brandes et al., 1996). HPFIT is a joint project involving three research laboratories: LIP in Lyon, France; LaBRI in Bordeaux, France; and GMD/SCAI in Bonn, Germany. Its aim is to provide an integrated HPF development environment that supports sparse and irregular data structures.
Title: A graph based framework for the definition of tools dealing with sparse and irregular distributed data structures
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665142
Bryan Carpenter, G. Fox, D. Leskiw, Xiaoming Li, Yuhong Wen, Guansong Zhang
The NPAC kernel runtime, developed in the PCRC (Parallel Compiler Runtime Consortium) project, is a runtime library with special support for the High Performance Fortran data model. It provides array descriptors for a generalized class of HPF-like distributed arrays, support for parallel access to their elements, and a rich library of collective communication and arithmetic operations for manipulating these arrays. The library has been used successfully as a component in experimental HPF translation systems. With the prospects for an early appearance of fully featured, efficient HPF compilers looking questionable, we discuss a class of more easily implementable data-parallel language extensions that preserve many of the attractive features of HPF while providing the programmer with direct access to runtime libraries such as the NPAC PCRC kernel.
Title: Language bindings for a data-parallel runtime
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665146
M. Guo, Yoshiyuki Yamashita, I. Nakata
Array redistribution is required very often in programs on distributed-memory parallel computers, so efficient redistribution algorithms are essential; otherwise the performance of the programs may degrade considerably. We focus on the automatic generation of communication routines for multi-dimensional redistribution. The principal advantage of this work is the ability to handle redistribution between arbitrary source and destination processor sets and between arbitrary source and destination distribution schemes. We have implemented these algorithms using the Parallelware communication library. Some optimization techniques for our algorithms are also proposed. Experimental results show the efficiency and flexibility of our techniques compared with other redistribution work.
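The core of any such generator is computing, for each element, its owner under the source and destination distributions, and grouping the results into per-processor-pair message lists. The sketch below does this for the simplest 1-D case, BLOCK to CYCLIC with possibly different processor counts; it is illustrative only, not the paper's generation algorithm:

```python
def block_to_cyclic_schedule(n, p_src, p_dst):
    """For a 1-D array of n elements, compute which source processor holds
    which global indices destined for which processor when redistributing
    from BLOCK over p_src processors to CYCLIC over p_dst processors.
    Entries with src == dst are local copies, not messages."""
    block = (n + p_src - 1) // p_src          # block size on the source side
    schedule = {}                             # (src, dst) -> global indices
    for i in range(n):
        src = i // block                      # BLOCK owner of element i
        dst = i % p_dst                       # CYCLIC owner of element i
        schedule.setdefault((src, dst), []).append(i)
    return schedule

sched = block_to_cyclic_schedule(n=8, p_src=2, p_dst=4)
# Source proc 0 holds indices 0..3; under CYCLIC over 4 procs,
# index 1 belongs to proc 1 and index 7 (held by proc 1) goes to proc 3.
assert sched[(0, 1)] == [1]
assert sched[(1, 3)] == [7]
```

Handling arbitrary BLOCK-CYCLIC(k) schemes and multi-dimensional arrays amounts to replacing the two owner formulas and iterating the same grouping per dimension, which is where closed-form index-set computations pay off over element-by-element enumeration.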
Title: Improving performance of multi-dimensional array redistribution on distributed memory machines
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665140
A. Colvin, T. Cormen
The paper describes the functionality of ViC*, a compiler for a variant of the data parallel language C* with support for out-of-core data. The compiler translates C* programs with shapes declared out of core, which describe parallel data stored on disk. The compiler output is a SPMD style program in standard C with I/O and library calls added to efficiently access out-of-core parallel data. The ViC* compiler also applies several program transformations to improve out-of-core data access.
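The essential idea behind an out-of-core shape is that the data lives on disk and the generated code streams fixed-size slabs through memory, with I/O calls inserted around each computation. A minimal sketch of that streaming pattern, assuming nothing about ViC*'s actual output, sums a disk-resident array one slab at a time:

```python
import array
import os
import tempfile

def write_out_of_core(path, values):
    # Simulate an out-of-core array: the data lives on disk, not in memory.
    with open(path, "wb") as f:
        array.array("d", values).tofile(f)

def slab_sum(path, slab_elems=4):
    """Stream the disk-resident array through memory one slab at a time,
    the way an out-of-core translation wraps I/O around each access.
    Illustrative only -- not ViC*'s actual transformation."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            slab = array.array("d")
            try:
                slab.fromfile(f, slab_elems)   # read at most one slab
            except EOFError:
                pass                           # partial final slab is kept
            if not slab:
                break
            total += sum(slab)                 # compute on the in-core slab
    return total

path = os.path.join(tempfile.mkdtemp(), "ooc.bin")
write_out_of_core(path, range(10))             # 0 + 1 + ... + 9 = 45
assert slab_sum(path) == 45.0
```

The program transformations mentioned in the abstract would correspond here to choosing slab sizes and fusing passes so each slab is read as few times as possible.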
Title: ViC*: a compiler for virtual-memory C*
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665147
Tsung-Chuan Huang, Cheng-Ming Yang
Loop interchange is a powerful restructuring technique for supporting vectorization and parallelization. We propose an improved technique for determining whether two non-adjacent loops can be interchanged. We also present a method for determining whether loop interchange can be applied directly to an imperfectly nested loop. Experimental results are presented to show the effectiveness of the method.
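For readers unfamiliar with the transformation itself: interchanging two loops swaps their nesting order, which is legal only when no data dependence is violated. A minimal runnable illustration (not the paper's technique, which handles the harder non-adjacent and imperfectly nested cases):

```python
def original(a, b):
    n, m = len(b), len(b[0])
    # i-outer, j-inner nest: a[i][j] depends only on b[i][j],
    # so there is no loop-carried dependence in either loop.
    for i in range(n):
        for j in range(m):
            a[i][j] = b[i][j] * 2
    return a

def interchanged(a, b):
    n, m = len(b), len(b[0])
    # Same statement with the loops interchanged (j now outer). Legal here
    # because no dependence direction forbids the swap; in general a
    # dependence test must confirm this before interchanging.
    for j in range(m):
        for i in range(n):
            a[i][j] = b[i][j] * 2
    return a

b = [[1, 2, 3], [4, 5, 6]]
zeros = lambda: [[0] * 3 for _ in range(2)]
assert original(zeros(), b) == interchanged(zeros(), b)
```

Interchange matters for parallelization because moving a dependence-free loop outward exposes coarse-grained parallel iterations, and moving it inward can enable vectorization.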
Title: Further results for improving loop interchange in non-adjacent and imperfectly nested loops
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665138
M. Swanson, L. Stoller, J. Carter
Recent research on distributed shared memory (DSM) has focused on improving performance by reducing the communication overhead of DSM. Added features include coherence protocols based on lazy release consistency and new interfaces that give programmers the ability to hand-tune communication. These features have increased DSM performance at the expense of requiring increasingly complex DSM systems or increasingly cumbersome programming. They have also increased the computation overhead of DSM, which has partially offset the communication-related performance gains. We chose to implement a simple DSM system, Quarks, with an eye towards hiding most computation overhead while using a very low latency transport layer to reduce the effect of communication overhead. The resulting performance is comparable to that of far more complex DSM systems, such as Treadmarks and Cashmere.
Title: Making distributed shared memory simple, yet efficient
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665141
Bernd Dreier, Markus Zahn, T. Ungerer
The paper describes Rthreads (Remote threads), a software distributed shared memory system that supports sharing of global variables on clusters of computers with physically distributed memory. Other DSM systems either use virtual memory to implement coherence on networks of workstations or require programmers to adopt a special programming model. Rthreads uses primitives to read and write remote data and to synchronize remote accesses, similar to the DSM systems based on special programming models. A unique aspect of Rthreads is that its primitives are syntactically and semantically closely related to the POSIX thread model (Pthreads). A precompiler automatically transforms Pthreads (source) programs into Rthreads (source) programs; after the transformation the programmer is still able to alter the Rthreads code to optimize run time. Moreover, Pthreads and Rthreads can be mixed within a single program. We support heterogeneous workstation clusters by implementing the Rthreads system on top of PVM, MPI and DCE. We demonstrate that programmer-based optimizations can yield a significant performance increase. Our performance results show that Rthreads introduces little overhead compared with equivalent programs in the baseline system, PVM, and performs better than the DSM systems Adsmith and CVM.
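The contrast with transparent Pthreads sharing is that every access to a shared global goes through an explicit primitive, which a runtime can forward over PVM, MPI or DCE. The sketch below invents its names and stands in for remote messaging with a local lock; it is not the Rthreads API:

```python
import threading

class RemoteVar:
    """Toy analog of an Rthreads-style shared global variable: agents access
    it only through explicit read/write primitives, which a real runtime
    could forward to the owning node over a message-passing layer.
    All names here are invented for illustration."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def rthread_read(self):
        with self._lock:            # stands in for a remote fetch
            return self._value

    def rthread_write(self, value):
        with self._lock:            # stands in for a remote update
            self._value = value

x = RemoteVar(0)
x.rthread_write(41)
assert x.rthread_read() == 41
# Note: a read followed by a write is not atomic; exactly as with Pthreads,
# a separate synchronization primitive must guard read-modify-write sequences.
```

Keeping the primitives close to Pthreads semantics is what lets a precompiler rewrite plain loads and stores of shared globals into these calls mechanically.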
Title: Parallel and distributed programming with Pthreads and Rthreads
Pub Date: 1998-03-30 | DOI: 10.1109/HIPS.1998.665143
B. Chamberlain, Sung-Eun Choi, E. Lewis, Calvin Lin, L. Snyder, Derrick Weathersby
ZPL is a parallel array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (the CTA) that accurately abstracts contemporary MIMD parallel computers. This makes it possible to correlate ZPL programs with machine behavior. As a result, programmers can reason about how code will perform on a typical parallel machine and thereby make informed decisions between alternative programming solutions. The paper describes ZPL's performance model and its syntactic cues for conveying operation cost. The what you see is what you get (WYSIWYG) nature of ZPL operations is demonstrated on the IBM SP-2, Intel Paragon, SGI Power Challenge, and Cray T3E. Additionally, the model is used to evaluate two algorithms for matrix multiplication. Experiments show that the performance model correctly predicts the faster solution on all four platforms for a range of problem sizes.
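The principle of a syntactic performance model can be illustrated with a toy cost model: each operation class visible in the source is assigned a cost, and two candidate algorithms are compared by summing costs. The classes, weights, and variants below are invented for illustration and are not ZPL's actual model or its matrix-multiplication algorithms:

```python
import math

def cost(ops, n, p):
    """Toy 'what you see is what you get' cost model: element-wise operations
    involve no communication, shift-style operations move O(n) data, and
    broadcast/reduction operations cost O(n log p). Weights are invented."""
    weights = {"elementwise": 0.0, "shift": n, "reduce": n * math.log2(p)}
    return sum(weights[op] for op in ops)

n, p = 1_000_000, 64
# Hypothetical variant A: matmul phrased with broadcast/reduction operations.
variant_a = cost(["reduce"] * 8 + ["elementwise"] * 8, n, p)
# Hypothetical variant B: matmul phrased with systolic shift operations only.
variant_b = cost(["shift"] * 8 + ["elementwise"] * 8, n, p)
assert variant_b < variant_a   # the model predicts the shift-based variant wins
```

The point of such a model is exactly what the abstract claims: because each operation's cost class is visible in the syntax, the programmer can rank the two variants before running either one.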
Title: ZPL's WYSIWYG performance model