
Latest Publications — Proceedings Third International Workshop on High-Level Parallel Programming Models and Supportive Environments

Constructive and adaptable distributed shared memory
J. Bataller, J. Bernabéu-Aubán
Distributed shared memory (DSM) is a paradigm for programming distributed systems which provides an alternative to the message passing model. DSM offers the agents of the system a shared address space through which they can communicate with each other. The main problem of a DSM implementation on top of a message passing system is performance. The performance of an implementation is closely related to the consistency the DSM system offers: strong consistency (all agents agree on how memory events happen) is more expensive to implement than weak consistency (disagreements are allowed). There have been many DSM system proposals, each supporting different consistency levels. Experience has shown that no single one is well suited to the whole range of problems. In some cases, strongly consistent primitives are not needed, while in other cases the weak semantics provided are useless. This is also true for different implementations of the same memory model, since performance is also affected by the data access patterns of the applications. We introduce a novel DSM model called Mume. Mume is a low-level layer close to the level of the message passing interface. The Mume interface provides only the minimum requirements to be considered a shared memory system. The interface includes three types of synchronization primitives, namely total ordering, causal ordering and mutual exclusion. This allows efficient implementations of different memory access semantics, accommodating particular data access patterns.
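The abstract names three synchronization primitive types (total ordering, causal ordering, mutual exclusion) but gives no concrete API. A toy, in-process sketch of two of them, total ordering via a central sequencer and mutual exclusion via a lock; all names (`Sequencer`, `total_write`) are hypothetical, not the paper's interface:

```python
import threading

class Sequencer:
    """Stamping every write with a global sequence number yields a total
    order: all agents replaying the log in stamp order see the same history."""
    def __init__(self):
        self._lock = threading.Lock()   # mutual-exclusion primitive
        self._seq = 0
        self.log = []                   # (seq, agent, var, value)

    def total_write(self, agent, var, value):
        with self._lock:                # one writer at a time
            self._seq += 1
            self.log.append((self._seq, agent, var, value))
            return self._seq

seq = Sequencer()
s1 = seq.total_write("A", "x", 1)
s2 = seq.total_write("B", "x", 2)
assert s2 == s1 + 1                     # later write gets a larger stamp
```

Causal ordering would sit between the two: it only constrains writes related by message causality (e.g. tracked with vector clocks), which is why it can be implemented more cheaply than a total order.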
DOI: 10.1109/HIPS.1998.665139
Cited by: 1
Implementing automatic coordination on networks of workstations
Christian Weiß, J. Knopp, H. Hellwagner
Distributed shared objects are a well-known approach to achieving independence of the memory model in parallel programming. The illusion of shared (global) objects is a convenient abstraction which eases programming on both kinds of parallel architecture, shared memory and distributed memory machines. We present several implementation variants for distributed shared objects on distributed platforms. We considered these variants while implementing a high-level parallel programming model known as coordinators (J. Knopp, 1996): global objects that coordinate accesses to the encapsulated data according to statically defined access patterns. Coordinators have been implemented on both shared memory multiprocessors and networks of workstations (NOWs). We describe their implementation as distributed shared objects and give basic performance results on a NOW.
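A coordinator, as described above, is a global object that serializes accesses to encapsulated data according to a declared access pattern. A minimal sketch under the simplest such pattern, exclusive access; the class and method names are illustrative, not the paper's API:

```python
import threading

class Coordinator:
    """Global object guarding encapsulated data; here the access pattern
    is simply 'exclusive': all reads and updates are serialized."""
    def __init__(self, data):
        self._data = data
        self._lock = threading.Lock()

    def modify(self, fn):
        with self._lock:            # updates are applied one at a time
            self._data = fn(self._data)

    def read(self):
        with self._lock:
            return self._data

c = Coordinator(0)
workers = [threading.Thread(target=lambda: [c.modify(lambda v: v + 1)
                                            for _ in range(1000)])
           for _ in range(4)]
for t in workers: t.start()
for t in workers: t.join()
assert c.read() == 4000             # no increments lost under contention
```

On a NOW, the lock and data would live on an owning node and the calls would become messages; the shared-memory illusion for the programmer is the same.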
DOI: 10.1109/HIPS.1998.665145
Cited by: 0
A graph based framework for the definition of tools dealing with sparse and irregular distributed data structures
J. Lépine, S. Chaumette, F. Rubi
Industrial applications use specific, problem-oriented implementations of large sparse and irregular data structures. Hence there is a need for tools that let developers visualize their applications in terms of these data, their structure, and the operations applied to them, regardless of their actual implementation and distribution. We present both a framework that we have set up to support the development of such tools and the prototypes we have developed. The resulting environment is composed of two layers: the first is a model that we have defined and implemented as high-level libraries which make it possible to abstract efficiently from the implementation; the second offers prototype tools built on top of these libraries. These tools are integrated within a graphical environment called Visit (T. Brandes et al., 1996), which is part of the HPFIT research effort (T. Brandes et al., 1996). HPFIT is a joint project involving three research laboratories: LIP in Lyon, France; LaBRI in Bordeaux, France; and GMD/SCAI in Bonn, Germany. Its aim is to provide an integrated HPF development environment that supports sparse and irregular data structures.
DOI: 10.1109/HIPS.1998.665144
Cited by: 5
Language bindings for a data-parallel runtime
Bryan Carpenter, G. Fox, D. Leskiw, Xiaoming Li, Yuhong Wen, Guansong Zhang
The NPAC kernel runtime, developed in the PCRC (Parallel Compiler Runtime Consortium) project, is a runtime library with special support for the High Performance Fortran data model. It provides array descriptors for a generalized class of HPF-like distributed arrays, support for parallel access to their elements, and a rich library of collective communication and arithmetic operations for manipulating these arrays. The library has been used successfully as a component in experimental HPF translation systems. With the prospects for an early appearance of fully featured, efficient HPF compilers looking questionable, we discuss a class of more easily implementable data-parallel language extensions that preserve many of the attractive features of HPF while providing the programmer with direct access to runtime libraries such as the NPAC PCRC kernel.
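The core of such a runtime is the array descriptor: given a global index, it answers "which process owns it, and where does it live locally?" A minimal sketch assuming a 1-D HPF `BLOCK` distribution; the real NPAC PCRC descriptors are far more general (multi-dimensional, cyclic, ghost regions), and the class name here is invented:

```python
class BlockArrayDescriptor:
    """Descriptor for a 1-D array of n elements block-distributed
    over nprocs processes (HPF BLOCK distribution)."""
    def __init__(self, n, nprocs):
        self.n, self.nprocs = n, nprocs
        self.block = -(-n // nprocs)        # ceil(n / nprocs) elems/process

    def owner(self, i):
        """Rank of the process owning global index i."""
        return i // self.block

    def local_index(self, i):
        """Offset of global element i within its owner's local segment."""
        return i % self.block

d = BlockArrayDescriptor(n=10, nprocs=4)    # blocks: [0-2][3-5][6-8][9]
assert d.owner(0) == 0 and d.owner(5) == 1 and d.owner(9) == 3
assert d.local_index(7) == 1
```

Collective operations in the library can then be written purely in terms of `owner`/`local_index`, independent of how the array was declared in the source language.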
DOI: 10.1109/HIPS.1998.665142
Cited by: 10
Improving performance of multi-dimensional array redistribution on distributed memory machines
M. Guo, Yoshiyuki Yamashita, I. Nakata
Array redistribution is required very often in programs on distributed memory parallel computers. It is essential to use efficient algorithms for redistribution; otherwise, the performance of the programs may degrade considerably. We focus on the automatic generation of communication routines for multi-dimensional redistribution. The principal advantage of this work is the ability to handle redistribution between arbitrary source and destination processor sets and between arbitrary source and destination distribution schemes. We have implemented these algorithms using the Parallelware communication library. Some optimization techniques for our algorithms are also proposed. Experimental results show the efficiency and flexibility of our techniques compared to other redistribution work.
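The generated communication routines boil down to index arithmetic: for each element, compute its owner under the source and destination distributions, and group elements into per-pair send sets. A 1-D `BLOCK` → `CYCLIC` sketch (the paper's algorithms handle arbitrary multi-dimensional cases and differing processor sets; `send_sets` is an illustrative name):

```python
def block_owner(i, n, p):
    return i // (-(-n // p))          # ceil-block distribution

def cyclic_owner(i, p):
    return i % p

def send_sets(n, p_src, p_dst):
    """sets[s][d] = global indices that source process s sends to
    destination process d when going from BLOCK(p_src) to CYCLIC(p_dst)."""
    sets = {}
    for i in range(n):
        s, d = block_owner(i, n, p_src), cyclic_owner(i, p_dst)
        sets.setdefault(s, {}).setdefault(d, []).append(i)
    return sets

sets = send_sets(n=8, p_src=2, p_dst=4)
# Source 0 holds the block 0..3; cyclically these scatter to ranks 0..3.
assert sets[0] == {0: [0], 1: [1], 2: [2], 3: [3]}
assert sets[1] == {0: [4], 1: [5], 2: [6], 3: [7]}
```

Real implementations avoid the O(n) per-element loop by deriving these sets in closed form, which is where most of the optimization effort in this line of work goes.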
DOI: 10.1109/HIPS.1998.665146
Cited by: 3
ViC*: a compiler for virtual-memory C*
A. Colvin, T. Cormen
The paper describes the functionality of ViC*, a compiler for a variant of the data-parallel language C* with support for out-of-core data. The compiler translates C* programs with shapes declared out-of-core, which describe parallel data stored on disk. The compiler output is an SPMD-style program in standard C with I/O and library calls added to access out-of-core parallel data efficiently. The ViC* compiler also applies several program transformations to improve out-of-core data access.
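The underlying idea, an array too large for memory lives on disk and is accessed one in-core block at a time via explicit seeks, can be sketched as below. This is only an illustration of the out-of-core access pattern; ViC*'s actual translation and I/O library are far more involved, and the class name is invented:

```python
import array, os, tempfile

class OutOfCoreArray:
    """A 1-D float64 array backed by a file, accessed in blocks."""
    ITEM = 8  # bytes per float64 element

    def __init__(self, path, n):
        self.path, self.n = path, n
        with open(path, "wb") as f:
            f.write(b"\0" * n * self.ITEM)   # zero-initialized backing file

    def write_block(self, start, values):
        with open(self.path, "r+b") as f:
            f.seek(start * self.ITEM)        # jump to the block on disk
            array.array("d", values).tofile(f)

    def read_block(self, start, count):
        with open(self.path, "rb") as f:
            f.seek(start * self.ITEM)
            a = array.array("d")
            a.fromfile(f, count)
            return list(a)

path = os.path.join(tempfile.mkdtemp(), "shape.dat")
ooc = OutOfCoreArray(path, n=1000)
ooc.write_block(500, [1.0, 2.0, 3.0])
assert ooc.read_block(499, 5) == [0.0, 1.0, 2.0, 3.0, 0.0]
```

The compiler's program transformations then aim to reorder computation so that each disk block is touched as few times as possible.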
DOI: 10.1109/HIPS.1998.665140
Cited by: 22
Further results for improving loop interchange in non-adjacent and imperfectly nested loops
Tsung-Chuan Huang, Cheng-Ming Yang
Loop interchange is a powerful restructuring technique for supporting vectorization and parallelization. We propose an improved technique for determining whether two non-adjacent loops can be interchanged. We also present a method for determining whether we can directly apply loop interchange to an imperfectly nested loop. Experimental results are presented to show the effectiveness of the method.
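For context, the classic legality test for interchanging two *adjacent* perfectly nested loops: the interchange is illegal iff some dependence has direction vector (<, >), since swapping the loops would turn it into (>, <), a dependence flowing backwards. The paper extends such tests to non-adjacent and imperfectly nested loops; the sketch below only illustrates the baseline rule:

```python
def interchange_legal(direction_vectors):
    """Adjacent-loop interchange is legal iff no dependence
    has direction vector ('<', '>')."""
    return not any(dv == ("<", ">") for dv in direction_vectors)

# a[i][j] = a[i-1][j+1] carries a dependence with directions (<, >):
assert not interchange_legal([("<", ">")])
# a[i][j] = a[i-1][j-1] has directions (<, <): interchange is legal.
assert interchange_legal([("<", "<")])

# For the legal case, both loop orders compute identical results:
def run(order, n=6):
    a = [[i * n + j for j in range(n)] for i in range(n)]
    pairs = [(i, j) for i in range(1, n) for j in range(1, n)]
    if order == "ji":
        pairs = [(i, j) for j in range(1, n) for i in range(1, n)]
    for i, j in pairs:
        a[i][j] = a[i - 1][j - 1] + 1   # (<, <) dependence
    return a

assert run("ij") == run("ji")
```

With the (<, <) dependence, element (i-1, j-1) is computed before (i, j) under either loop order, which is exactly why the interchange preserves semantics.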
DOI: 10.1109/HIPS.1998.665147
Cited by: 0
Making distributed shared memory simple, yet efficient
M. Swanson, L. Stoller, J. Carter
Recent research on distributed shared memory (DSM) has focused on improving performance by reducing the communication overhead of DSM. Features added include lazy-release-consistency-based coherence protocols and new interfaces that give programmers the ability to hand-tune communication. These features have increased DSM performance at the expense of requiring increasingly complex DSM systems or increasingly cumbersome programming. They have also increased the computation overhead of DSM, which has partially offset the communication-related performance gains. We chose to implement a simple DSM system, Quarks, with an eye toward hiding most computation overhead while using a very low latency transport layer to reduce the effect of communication overhead. The resulting performance is comparable to that of far more complex DSM systems, such as Treadmarks and Cashmere.
DOI: 10.1109/HIPS.1998.665138
Cited by: 53
Parallel and distributed programming with Pthreads and Rthreads
Bernd Dreier, Markus Zahn, T. Ungerer
The paper describes Rthreads (Remote threads), a software distributed shared memory system that supports the sharing of global variables on clusters of computers with physically distributed memory. Other DSM systems either use virtual memory to implement coherence on networks of workstations or require programmers to adopt a special programming model. Rthreads uses primitives to read and write remote data and to synchronize remote accesses, similar to DSM systems that are based on special programming models. A unique aspect of Rthreads is that the primitives are syntactically and semantically closely related to the POSIX thread model (Pthreads). A precompiler automatically transforms Pthreads (source) programs into Rthreads (source) programs. After the transformation, the programmer is still able to alter the Rthreads code to optimize run time. Moreover, Pthreads and Rthreads can be mixed within a single program. We support heterogeneous workstation clusters by implementing the Rthreads system on top of PVM, MPI and DCE. We demonstrate that programmer-based optimizations can yield a significant performance increase. Our performance results show that the Rthreads system introduces little overhead compared to equivalent programs in the baseline system PVM, and superior performance compared to the DSM systems Adsmith and CVM.
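The remote read/write primitives can be simulated in-process: globals live on one "node" (a server thread), and other nodes access them only through explicit request messages. The function names `rthread_read`/`rthread_write` mirror the idea described above but are hypothetical, not the library's actual API:

```python
import queue, threading

globals_store = {"counter": 0}   # globals live on the owning node
requests = queue.Queue()         # stands in for the message transport

def server():
    while True:
        op, name, value, reply = requests.get()
        if op == "stop":
            break
        if op == "write":
            globals_store[name] = value
        reply.put(globals_store[name])        # ack carries current value

def rthread_write(name, value):
    reply = queue.Queue()
    requests.put(("write", name, value, reply))
    return reply.get()

def rthread_read(name):
    reply = queue.Queue()
    requests.put(("read", name, None, reply))
    return reply.get()

t = threading.Thread(target=server)
t.start()
rthread_write("counter", 41)
assert rthread_read("counter") == 41
assert rthread_write("counter", rthread_read("counter") + 1) == 42
requests.put(("stop", None, None, None))
t.join()
```

Because every access is an explicit call rather than a transparent page fault, a precompiler can insert these calls mechanically into Pthreads code, and the programmer can still batch or eliminate them by hand afterwards.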
DOI: 10.1109/HIPS.1998.665141
Cited by: 11
ZPL's WYSIWYG performance model
B. Chamberlain, Sung-Eun Choi, E. Lewis, Calvin Lin, L. Snyder, Derrick Weathersby
ZPL is a parallel array language designed for high performance scientific and engineering computations. Unlike other parallel languages, ZPL is founded on a machine model (the CTA) that accurately abstracts contemporary MIMD parallel computers. This makes it possible to correlate ZPL programs with machine behavior. As a result, programmers can reason about how code will perform on a typical parallel machine and thereby make informed decisions between alternative programming solutions. The paper describes ZPL's performance model and its syntactic cues for conveying operation cost. The what you see is what you get (WYSIWYG) nature of ZPL operations is demonstrated on the IBM SP-2, Intel Paragon, SGI Power Challenge, and Cray T3E. Additionally, the model is used to evaluate two algorithms for matrix multiplication. Experiments show that the performance model correctly predicts the faster solution on all four platforms for a range of problem sizes.
DOI: 10.1109/HIPS.1998.665143
Cited by: 42