
[1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing: latest articles

A high-level, object-oriented approach to divide-and-conquer
A. Piper, R. Prager
An object-oriented framework for the divide-and-conquer (D&C) paradigm is presented. The framework enables a D&C representation of a problem to be built up for subsequent evaluation. This evaluation can be delayed until the maximum amount of computation that can be performed in one D&C pass has been integrated into the representation. The framework does not require a parallelizing compiler and therefore provides an environment that is flexible and easily extensible. D&C thus provides a structure suitable for parallel implementation, while object-oriented programming techniques provide a means to encapsulate the D&C semantics and present a uniform interface to the end user. Results are presented for an implementation of the back-propagation algorithm.
DOI: https://doi.org/10.1109/SPDP.1992.242730 | Published: 1992-12-01
Citations: 3
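The delayed-evaluation idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' framework: the class `DeferredDC` and its methods `map` and `evaluate` are hypothetical names chosen for illustration. Operations are only recorded when the representation is built, and all of them are applied in a single recursive divide-and-conquer pass.

```python
from typing import Callable, List, Sequence


class DeferredDC:
    """Hypothetical lazy divide-and-conquer description (illustration only)."""

    def __init__(self, combine: Callable[[float, float], float]):
        self.combine = combine  # how two sub-results are merged
        self.leaf_ops: List[Callable[[float], float]] = []  # queued per-element work

    def map(self, op: Callable[[float], float]) -> "DeferredDC":
        # Record the operation instead of running it; nothing is computed yet.
        self.leaf_ops.append(op)
        return self

    def evaluate(self, data: Sequence[float]) -> float:
        # One recursive pass applies every queued operation at the leaves.
        if len(data) == 0:
            raise ValueError("data must not be empty")
        if len(data) == 1:
            value = data[0]
            for op in self.leaf_ops:
                value = op(value)
            return value
        mid = len(data) // 2
        # The two halves are independent, which is what makes D&C easy to parallelize.
        return self.combine(self.evaluate(data[:mid]), self.evaluate(data[mid:]))


# Two operations are queued, then evaluated together in one pass.
pipeline = DeferredDC(combine=lambda a, b: a + b).map(lambda x: x * 2).map(lambda x: x + 1)
result = pipeline.evaluate([1, 2, 3, 4])
```

Here the sketch runs sequentially; a parallel version would hand the two recursive calls to separate workers.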
Hierarchical interconnection networks: routing performance in the presence of faults
Beomsu Kim, H. Youn, K. Kavi
The authors develop a Markov model which can effectively estimate the successful routing probability and internode distance of hierarchical interconnection networks in the presence of faults. A BH/BH (binary hypercube/binary hypercube) network is tested using the model and verified by computer simulations. Comparisons with computer simulation reveal that the proposed model is very accurate. The model is also expected to evaluate network performance effectively when all nodes generate messages.
DOI: https://doi.org/10.1109/SPDP.1992.242749 | Published: 1992-12-01
Citations: 1
Programming environment for phase-reconfigurable parallel programming on SuperNode
J. Adamo, L. Trejo
The authors present a programming environment called C-NET developed for the reconfigurable SuperNode multiprocessor. It allows the implementation of variable-topology programs that are referred to as phase-reconfigurable programs. The design decisions concerning dynamic-reconfiguration handling are discussed with regard to the architectural constraints of the machine. It provides three specialized languages: PPL (phase programming language), for the development of phase-reconfigurable programs; GCL (graph-construction language), for the construction of graphs on which the phases are to be executed; and CPL (components programming language), for coding the software components. The first example on which the programming environment was tested was the conjugate-gradient (CG) algorithm. The results are encouraging. A phase-reconfigurable implementation of CG was developed and compared with a fixed-topology implementation (8×4 torus).
DOI: https://doi.org/10.1109/SPDP.1992.242710 | Published: 1992-12-01
Citations: 9
Distributed termination detection of loosely synchronized computations
Chengzhong Xu, F. Lau
An efficient algorithm for termination detection of loosely synchronized computations is proposed. The proposed algorithm is fully symmetric in that all processes are syntactically identical and can detect global termination simultaneously. It has a shorter termination-detection delay than other related algorithms, and is optimal in a number of regular structures. For the hypercube structure of any dimension, the proposed algorithm takes two iteration steps to detect termination after global termination has occurred. In the chain, ring, mesh and torus structures, the improvement is about 50% over its principal competitor. The proposed algorithm requires that the graph be edge-colored and that the color-diameter be known to the processes in advance.
DOI: https://doi.org/10.1109/SPDP.1992.242744 | Published: 1992-12-01
Citations: 6
On embedding ternary trees into Boolean hypercubes
Ajay K. Gupta, Hong Wang
It is pointed out that the problem of efficiently embedding a k-ary tree into a hypercube for k ≥ 3 has largely remained unsolved, even though optimal embeddings (i.e. embeddings achieving minimum delta, lambda, and in) of complete and incomplete binary trees into hypercubes have been known for some time. Thus, in their quest to design efficient embeddings of k-ary trees into hypercubes for arbitrary k, the authors present some preliminary results that give efficient embeddings for k = 3, 2^p, 3^p, and 2^p·3^q, where p, q > 0. The embedding of complete ternary trees and the embedding of complete k-ary trees are considered.
DOI: https://doi.org/10.1109/SPDP.1992.242739 | Published: 1992-12-01
Citations: 3
An exact hardware implementation of the Boltzmann machine
Marcin Skubiszewski
The author presents a faithful hardware implementation of the Boltzmann machine, built on top of DECPeRLe-1, a reconfigurable coprocessor closely coupled with its host machine, a DECstation 500. The prototype performs 505 megasynapses (millions of additions and multiplications) per second, using 16-bit fixed-point weights. It can emulate fully connected instances of the Boltzmann machine containing up to 1438 variables. This specialized hardware executes only the simplest part of the Boltzmann machine algorithm, namely, multiplying matrices of numbers by vectors of bits. The other operations (which are complicated, but require only a modest amount of computation) are performed by the host processor. It is noted that the key point of this work resides in establishing the right design choices. Among these, the most important ones are the rejection of 'neural parallelism', which makes the implementation exact, and the algorithm used to generate random numbers in software, which allows the hardware to be simple. The fact that DECPeRLe-1 makes hardware development cheap and fast was essential in this work.
DOI: https://doi.org/10.1109/SPDP.1992.242756 | Published: 1992-12-01
Citations: 19
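The operation the abstract above assigns to the hardware, multiplying a matrix of numbers by a vector of bits, can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the DECPeRLe-1 design: the function name `matvec_bits` is hypothetical, and plain Python integers stand in for the 16-bit fixed-point weights.

```python
from typing import List


def matvec_bits(weights: List[List[int]], bits: List[int]) -> List[int]:
    """Multiply an integer weight matrix by a 0/1 vector using additions only."""
    if any(len(row) != len(bits) for row in weights):
        raise ValueError("each row must have one weight per bit")
    # Every vector entry is 0 or 1, so each product is either 0 or the weight itself;
    # the row result is just the sum of the weights whose bit is set.
    return [sum(w for w, b in zip(row, bits) if b) for row in weights]


weights = [[3, -2, 5], [-1, 4, 0]]
bits = [1, 0, 1]
sums = matvec_bits(weights, bits)
```

Because the multiplier is a single bit, each multiply reduces to a conditional add, which is one reason this step is simple to implement.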
Debugging dynamic distributed programs using global predicates
Yoshifumi Manabe, S. Aoyagi
The authors describe a debugger for distributed programs based on a replay technique. Distributed programs may dynamically fork child processes and open and close communication channels between processes. The debugger features breakpoint setting and selective trace commands with global predicate conditions, called conjunctive predicates and disjunctive predicates, which relate to multiple processes. It can halt or test the processes at the first global state that satisfies a given conjunctive predicate breakpoint condition. The authors have developed a prototype distributed debugger, ddbx-p, on UNIX 4.2 BSD.
DOI: https://doi.org/10.1109/SPDP.1992.242718 | Published: 1992-12-01
Citations: 1
An evaluation of planar-adaptive routing (PAR)
Jae H. Kim, A. Chien
Network performance can be improved by allowing adaptive routing, but doing so introduces new possibilities of deadlock which can overwhelm the flexibility advantages. Planar-adaptive routing resolves this tension by limiting adaptive routing to a series of two-dimensional planes, reducing hardware requirements for deadlock prevention. The authors explore the performance of planar-adaptive routers for two-, three-, and four-dimensional networks. Under nonuniform traffic loads, the planar-adaptive router significantly outperforms the dimension-order router, while giving comparable performance under uniform loads. With equal resources, the planar-adaptive router provides performance superior to fully adaptive routers because it requires fewer resources for deadlock prevention, freeing resources to increase the number of virtual lanes.
DOI: https://doi.org/10.1109/SPDP.1992.242708 | Published: 1992-12-01
Citations: 34
Shared memory vs. message passing in shared-memory multiprocessors
T. LeBlanc, E. Markatos
It is argued that the choice between the shared-memory and message-passing models depends on two factors: the relative cost of communication and computation as implemented by the hardware, and the degree of load imbalance inherent in the application. Two representative applications are used to illustrate the performance advantages of each programming model on several different shared-memory machines, including the BBN Butterfly, Sequent Symmetry, Encore Multimax and Silicon Graphics Iris multiprocessors. It is shown that applications implemented in the shared-memory model perform better on the previous generation of multiprocessors, while applications implemented in the message-passing model perform better on modern multiprocessors. It is argued that both models have performance advantages, and that the factors that influence the choice of model may not be known at compile time. As a compromise solution, the authors propose an alternative programming model, which has the load-balancing properties of the shared-memory model and the locality properties of the message-passing model, and show that this new model performs better than the other two alternatives.
DOI: https://doi.org/10.1109/SPDP.1992.242736 | Published: 1992-12-01
Citations: 43
Page replacement in distributed virtual memory systems
M. Malkawi, D. Knox, M. Abaza
The authors introduce three page-replacement policies, along with page-out policies, for distributed virtual memory systems. Two of the replacement policies, least recently brought and global recently used or brought, are adapted versions of the least recently used policy, which is well known in conventional virtual memory systems. Trace-driven simulation was used to evaluate the performance of the replacement policies and of the RR (round robin), LAN (least active neighbor), and LLN (least loaded neighbor) page-out policies. The results suggest that when the cost of internode faults is considerably higher than that of local memory access, the global and remote policies are superior to the local one. When the cost of bringing a page from the immediate neighbor is low relative to the cost of accessing the local memory, the local policy performs as well as the global and the remote. Among the page-out policies, round robin is the least efficient. LLN generates lower cost than LAN when the size of the local memory is relatively large. Under high memory contention, LAN shows better performance.
DOI: https://doi.org/10.1109/SPDP.1992.242719 | Published: 1992-12-01
Citations: 4
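For readers unfamiliar with the baseline named in the abstract above, the following minimal Python sketch shows a trace-driven simulation of the conventional least recently used (LRU) policy on a single node. It does not model the paper's distributed policies or its page-out schemes; the function name `count_lru_faults` and the sample trace are assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Iterable


def count_lru_faults(trace: Iterable[int], frames: int) -> int:
    """Replay a page-reference trace and count faults under single-node LRU."""
    if frames <= 0:
        raise ValueError("frames must be positive")
    resident: "OrderedDict[int, bool]" = OrderedDict()  # oldest use first
    faults = 0
    for page in trace:
        if page in resident:
            resident.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the least recently used page
            resident[page] = True
    return faults


faults = count_lru_faults([1, 2, 3, 1, 4, 2], frames=3)
```

A distributed variant would also have to decide where an evicted page goes, which is the role the abstract gives to the page-out policies.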