Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633345
R. Albrizio, G. Aloisio, A. Mazzone, P. Messina, N. Veneziani
{"title":"Performance Of Mulitprocessor Structures For Fast Digital Sar Processing","authors":"R. Albrizio, G. Aloisio, A. Mazzone, P. Messina, N. Veneziani","doi":"10.1109/DMCC.1991.633345","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633345","url":null,"abstract":"","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132592026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633096
Peter Christy
The design and implementation of commercial, massively parallel computers are at an interesting level of maturity where many basic design decisions are still in flux. This paper considers alternative means of programming machine-size independent computations on a parallel array computer. Two specific mechanisms are contrasted:
{"title":"Virtual Processors Considered Harmful","authors":"Peter Christy","doi":"10.1109/DMCC.1991.633096","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633096","url":null,"abstract":"The design and implementation of commercial, massively parallel computers are at an interesting level of maturity where many basic design decisions are still in flux. This paper considers alternative means of programming machine-size independent computations on a parallel array computer. Two specific mechanisms are contrasted:","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132767470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633124
N. Mansour, Geoffrey Fox
We present a new approach to balancing the work load in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem-specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures.
{"title":"An Evolutionary Approach to Load Balancing Parallel Computations","authors":"N. Mansour, Geoffrey Fox","doi":"10.1109/DMCC.1991.633124","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633124","url":null,"abstract":"We present a new approach to balancing the work load in a multicomputer when the problem is decomposed into subproblems mapped to the processors. It is based on a hybrid genetic algorithm. A number of design choices for genetic algorithms are combined in order to ameliorate the problem of premature convergence that is often encountered in the implementation of classical genetic algorithms. The algorithm is hybridized by including a hill climbing procedure which significantly improves the efficiency of the evolution. Moreover, it makes use of problem-specific information to evade some computational costs and to reinforce favorable aspects of the genetic search at some appropriate points. The experimental results show that the hybrid genetic algorithm can find solutions within 3% of the optimum in a reasonable time. They also suggest that this approach is not biased towards particular problem structures.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130107901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
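The hybrid scheme summarized in the abstract above (a genetic search combined with a hill-climbing step) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the cost function (maximum processor load, ignoring communication), the tournament selection, the one-point crossover, and every name and parameter value (`hybrid_ga`, `hill_climb`, `pop_size`, `generations`) are assumptions chosen for brevity.

```python
import random

def max_load(assign, weights, nprocs):
    """Cost of a mapping: load of the busiest processor (assumed objective)."""
    loads = [0.0] * nprocs
    for task, proc in enumerate(assign):
        loads[proc] += weights[task]
    return max(loads)

def hill_climb(assign, weights, nprocs):
    """Local search: move one task at a time from the busiest to the idlest
    processor for as long as such a move lowers the busiest load."""
    assign = list(assign)
    improved = True
    while improved:
        improved = False
        loads = [0.0] * nprocs
        for t, p in enumerate(assign):
            loads[p] += weights[t]
        busiest = max(range(nprocs), key=loads.__getitem__)
        idlest = min(range(nprocs), key=loads.__getitem__)
        for t, p in enumerate(assign):
            if p == busiest and loads[idlest] + weights[t] < loads[busiest]:
                assign[t] = idlest
                improved = True
                break
    return assign

def hybrid_ga(weights, nprocs, pop_size=30, generations=60, seed=0):
    """Genetic search over task-to-processor mappings; every child is
    refined by hill_climb before it joins the next generation."""
    rng = random.Random(seed)
    n = len(weights)  # assumes at least two tasks

    def cost(a):
        return max_load(a, weights, nprocs)

    pop = [[rng.randrange(nprocs) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=cost)]                  # keep the best mapping
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=cost)   # tournament selection
            b = min(rng.sample(pop, 3), key=cost)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                  # occasional mutation
                child[rng.randrange(n)] = rng.randrange(nprocs)
            nxt.append(hill_climb(child, weights, nprocs))
        pop = nxt
    best = min(pop, key=cost)
    return best, cost(best)
```

Calling `hybrid_ga([5, 3, 8, 2, 7, 4, 6, 1], 3)` returns a mapping of the eight tasks onto three processors together with its maximum load, which cannot fall below the average load of 12.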
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633303
D.R. Mallampati, P. Mutalik, R. L. Wainwright
This paper describes and analyzes a new parallel algorithm using simulated annealing for finding a good solution to the Traveling Salesman Problem. This algorithm combines the strong points of three recent implementations [1, 2, 5] with some new features. An initial tour is generated and partitioned among a ring of processors. Each processor receives two disconnected parts (tiers) of the tour. The algorithm is subdivided into three phases. In phase one, 2-opting is performed separately within each of the two tiers of the tour. During the second phase, remote swapping is performed between cities from the two different tiers of the tour. During phase three, synchronization of the cities is accomplished by each processor shifting a quarter of its cities in a clockwise direction to its neighboring node. This is called a quarter-spin. Results show this algorithm is superior to recent implementations. For the datasets tested, this algorithm yielded improvements ranging from 32% to 56% compared to three recent implementations. The significance of this algorithm is the manner in which cities from different parts of the tour are combined to form new tours. The multiple phases within the algorithm allow for a better mixture of cities compared to previous algorithms.
{"title":"A Parallel Multi-Phase Implementation of Simulated Annealing for the Traveling Salesman Problem","authors":"D.R. Mallampati, P. Mutalik, R. L. Wainwright","doi":"10.1109/DMCC.1991.633303","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633303","url":null,"abstract":"This paper describes and analyzes a new parallel algorithm using simulated annealing for finding a good solution to the Traveling Salesman Problem. This algorithm combines the strong points of three recent implementations [1, 2, 5] with some new features. An initial tour is generated and partitioned among a ring of processors. Each processor receives two disconnected parts (tiers) of the tour. The algorithm is subdivided into three phases. In phase one, 2-opting is performed separately within each of the two tiers of the tour. During the second phase, remote swapping is performed between cities from the two different tiers of the tour. During phase three, synchronization of the cities is accomplished by each processor shifting a quarter of its cities in a clockwise direction to its neighboring node. This is called a quarter-spin. Results show this algorithm is superior to recent implementations. For the datasets tested, this algorithm yielded improvements ranging from 32% to 56% compared to three recent implementations. The significance of this algorithm is the manner in which cities from different parts of the tour are combined to form new tours. The multiple phases within the algorithm allow for a better mixture of cities compared to previous algorithms.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"55 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131965914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
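Two of the ingredients named in the abstract above, 2-opting under simulated annealing and the quarter-spin, can be sketched sequentially in Python. This is a simplified illustration and not the authors' parallel implementation: it anneals a whole tour on one processor, gives each ring node a single segment instead of two tiers, and the cooling schedule, parameter values and function names (`anneal_two_opt`, `quarter_spin`) are assumptions.

```python
import math
import random

def tour_length(tour, pts):
    """Length of the closed tour that visits pts in the given order."""
    return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def anneal_two_opt(pts, temp=1.0, cooling=0.995, steps=20000, seed=0):
    """Simulated annealing whose only move is a 2-opt segment reversal."""
    rng = random.Random(seed)
    tour = list(range(len(pts)))
    cur = tour_length(tour, pts)
    best, best_len = tour[:], cur
    for _ in range(steps):
        i, j = sorted(rng.sample(range(len(tour)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]  # reverse i..j
        delta = tour_length(cand, pts) - cur
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            tour, cur = cand, cur + delta
            if cur < best_len:
                best, best_len = tour[:], tour_length(tour, pts)
        temp = max(temp * cooling, 1e-12)  # geometric cooling (assumed)
    return best, best_len

def quarter_spin(segments):
    """Each ring node hands the last quarter of its cities to the next node.
    Concatenating the segments still yields the same cyclic tour."""
    moved = [seg[len(seg) - len(seg) // 4:] for seg in segments]
    kept = [seg[:len(seg) - len(seg) // 4] for seg in segments]
    return [moved[k - 1] + kept[k] for k in range(len(segments))]
```

For example, `quarter_spin([[0, 1, 2, 3], [4, 5, 6, 7]])` gives `[[7, 0, 1, 2], [3, 4, 5, 6]]`: city 3 moves to the second node and city 7 wraps around to the first, so each node sees a new mixture of cities in the next round.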
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633176
Y. Tamir, Yoshio Turner
A message transport mechanism which provides high-bandwidth, low-latency interprocessor communication is the key to the ability of multicomputers to achieve high performance. The system should adapt to changing conditions by routing packets around congested areas and failed links or nodes. We introduce a new message transport mechanism, called Dynamic Virtual Circuits, that combines the best features of circuit switching, packet switching, and static virtual circuits. Routing through intermediate nodes usually requires only a single lookup in a small table, and packets include minimal control information and are delivered in FIFO order. Nodes in the middle of a Dynamic Virtual Circuit can break it and later reestablish it through a different physical path, thus supporting adaptive routing while maintaining the semantics of virtual circuits. We present the basic algorithms for Dynamic Virtual Circuits and the required hardware support in the context of a VLSI communication coprocessor for multicomputers.
{"title":"High-Performance Adaptive Routing in Multicomputers Using Dynamic Virtual Circuits","authors":"Y. Tamir, Yoshio Turner","doi":"10.1109/DMCC.1991.633176","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633176","url":null,"abstract":"A message transport mechanism which provides high-bandwidth, low-latency interprocessor communication is the key to the ability of multicomputers to achieve high performance. The system should adapt to changing conditions by routing packets around congested areas and failed links or nodes. We introduce a new message transport mechanism, called Dynamic Virtual Circuits, that combines the best features of circuit switching, packet switching, and static virtual circuits. Routing through intermediate nodes usually requires only a single lookup in a small table, and packets include minimal control information and are delivered in FIFO order. Nodes in the middle of a Dynamic Virtual Circuit can break it and later reestablish it through a different physical path, thus supporting adaptive routing while maintaining the semantics of virtual circuits. We present the basic algorithms for Dynamic Virtual Circuits and the required hardware support in the context of a VLSI communication coprocessor for multicomputers.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126627288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
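The single-table-lookup forwarding mentioned in the abstract above can be pictured with a generic virtual-circuit sketch in Python. It is only an illustration of the general idea and does not reproduce the Dynamic Virtual Circuits protocol: the table layout, the port names, the circuit numbers and the `trace` helper are all hypothetical, and the rerouting shown here simply rewrites table entries rather than following the paper's teardown and reestablishment algorithms.

```python
LOCAL = "local"  # hypothetical port name meaning "this node's own processor"

def trace(tables, links, node, port, vc):
    """Follow one packet hop by hop and return the nodes it visits.
    Each hop performs a single lookup keyed by (input port, circuit id)."""
    path = [node]
    while True:
        out_port, out_vc = tables[node][(port, vc)]
        if out_port == LOCAL:          # packet has reached its destination
            return path
        node, port = links[(node, out_port)]
        vc = out_vc                    # circuit id is rewritten per link
        path.append(node)

# Assumed topology: A - B - D, plus a detour B - C - D.
links = {
    ("A", "e"): ("B", "w"),
    ("B", "e"): ("D", "w"),
    ("B", "n"): ("C", "s"),
    ("C", "e"): ("D", "n"),
}

# Per-node tables for one circuit from A to D through B.
tables = {
    "A": {(LOCAL, 0): ("e", 5)},
    "B": {("w", 5): ("e", 2)},
    "C": {},
    "D": {("w", 2): (LOCAL, 0)},
}
direct = trace(tables, links, "A", LOCAL, 0)

# An intermediate node (B) switches the circuit to a different physical path.
# The source still injects packets on the same circuit id.
tables["B"][("w", 5)] = ("n", 7)
tables["C"][("s", 7)] = ("e", 1)
tables["D"][("n", 1)] = (LOCAL, 0)
detour = trace(tables, links, "A", LOCAL, 0)
```

Here `direct` is `["A", "B", "D"]` and `detour` is `["A", "B", "C", "D"]`. The point of the sketch is that the packet carries only a small circuit identifier and that the source's view of the circuit is unchanged when an intermediate node reroutes it.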
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633115
T. Stricker
Wormhole message routing is supported by the communication hardware of several distributed memory machines. This particular method of message routing has numerous advantages but creates the problem of a routing deadlock. When long messages compete for the same channels in the network, some messages will be blocked until the first message is fully consumed by the processor at the destination of the message. A deadlock occurs if a set of messages mutually blocks, and no message can progress towards its destination. Most deadlock-free routing schemes previously known are designed to work on regular binary hypercubes. Regular hypercubes and meshes are just a special case of networks. However, these routing schemes do not provide enough flexibility to deal with irregular 2-D tori and with attached auxiliary cells, which can be found on many newer parallel systems. To handle irregular topologies elegantly, a simple proof is necessary to verify the router code. The new proof given in this report is carried out directly on the network graph. It is constructive in the sense that it reveals the design options to deal with irregularities and shows how additional flexibility can be used to achieve better load balancing. Based on the modified routing model, a set of deadlock-free router functions relevant to the iWarp system configurations are described and proven to be correct.
{"title":"Message Routing On Irregular 2d-meshes And Tori","authors":"T. Stricker","doi":"10.1109/DMCC.1991.633115","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633115","url":null,"abstract":"Wormhole message routing is supported by the communication hardware of several distributed memory machines. This particular method of message routing has numerous advantages but creates the problem of a routing deadlock. When long messages compete for the same channels in the network, some messages will be blocked until the first message is fully consumed by the processor at the destination of the message. A deadlock occurs if a set of messages mutually blocks, and no message can progress towards its destination. Most deadlock-free routing schemes previously known are designed to work on regular binary hypercubes. Regular hypercubes and meshes are just a special case of networks. However, these routing schemes do not provide enough flexibility to deal with irregular 2-D tori and with attached auxiliary cells, which can be found on many newer parallel systems. To handle irregular topologies elegantly, a simple proof is necessary to verify the router code. The new proof given in this report is carried out directly on the network graph. It is constructive in the sense that it reveals the design options to deal with irregularities and shows how additional flexibility can be used to achieve better load balancing. Based on the modified routing model, a set of deadlock-free router functions relevant to the iWarp system configurations are described and proven to be correct.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125921645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
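The deadlock condition described in the abstract above (messages that mutually block while holding channels) is commonly checked with a channel dependency graph: a deterministic routing function cannot deadlock if that graph has no cycle. The Python sketch below applies this standard criterion, which is swapped in here for illustration and is not the proof given in the paper. It covers only a regular mesh and a unidirectional ring; the function names are assumptions.

```python
from itertools import product

def xy_route(src, dst):
    """Dimension-order path on a 2-D mesh: correct x first, then y."""
    (x, y), path = src, [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def ring_route(n):
    """Routing function for a unidirectional ring of n nodes."""
    def route(src, dst):
        path = [src]
        while path[-1] != dst:
            path.append((path[-1] + 1) % n)
        return path
    return route

def dependency_graph(nodes, route):
    """Channel c1 depends on c2 if some route uses c2 directly after c1."""
    deps = {}
    for s, d in product(nodes, repeat=2):
        if s == d:
            continue
        p = route(s, d)
        chans = list(zip(p, p[1:]))      # a channel is a directed link
        for c in chans:
            deps.setdefault(c, set())
        for a, b in zip(chans, chans[1:]):
            deps[a].add(b)
    return deps

def is_acyclic(deps):
    """Kahn's algorithm: the graph is acyclic iff every channel can be
    removed in topological order."""
    indeg = {c: 0 for c in deps}
    for outs in deps.values():
        for b in outs:
            indeg[b] += 1
    ready = [c for c, k in indeg.items() if k == 0]
    seen = 0
    while ready:
        c = ready.pop()
        seen += 1
        for b in deps[c]:
            indeg[b] -= 1
            if indeg[b] == 0:
                ready.append(b)
    return seen == len(deps)

mesh = list(product(range(3), range(3)))
mesh_ok = is_acyclic(dependency_graph(mesh, xy_route))
ring_ok = is_acyclic(dependency_graph(range(4), ring_route(4)))
```

With these definitions `mesh_ok` is `True` and `ring_ok` is `False`: dimension-order routing on the mesh has no cyclic dependency, while the wraparound link of the ring closes a cycle. Wraparound links of this kind are one reason tori need more care than meshes.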
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633318
G. DeTitta, H. Hauptman, R. Miller, M. Pagels, T. Sabin, P. Thuman, C. Weeks
{"title":"Parallel Solutions to the Phase Problem in X-Ray Crystallography: An Update","authors":"G. DeTitta, H. Hauptman, R. Miller, M. Pagels, T. Sabin, P. Thuman, C. Weeks","doi":"10.1109/DMCC.1991.633318","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633318","url":null,"abstract":"","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114249873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633353
S. Lillevik
In September 1990, the Intel Corporation demonstrated the third of four major Touchstone Program prototype systems. Denoted DELTA, the prototype scales to over 500 nodes, provides aggregate peak performance in excess of 30 GFLOPS, and contains a new interconnect network based on a Caltech-designed router device. DELTA contains four heterogeneous node types for numeric, service, input/output, and network functions. The operating system supports message-passing paradigms and interfaces with a Concurrent File System. Users access DELTA across a local area network and may select either the C or FORTRAN programming languages. An interactive parallel debugger assists in application development and performance tuning.
{"title":"The Touchstone 30 Gigaflop DELTA Prototype","authors":"S. Lillevik","doi":"10.1109/DMCC.1991.633353","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633353","url":null,"abstract":"In September 1990, the Intel Corporation demonstrated the third of four major Touchstone Program prototype systems. Denoted DELTA, the prototype scales to over 500 nodes, provides aggregate peak performance in excess of 30 GFLOPS, and contains a new interconnect network based on a Caltech-designed router device. DELTA contains four heterogeneous node types for numeric, service, input/output, and network functions. The operating system supports message-passing paradigms and interfaces with a Concurrent File System. Users access DELTA across a local area network and may select either the C or FORTRAN programming languages. An interactive parallel debugger assists in application development and performance tuning.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131384607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633072
Halsur V. Sreekantaswamy, N. Goldstein, A. Wagner, S. Chanson
This paper describes two of the major components of TIPS, a Transputer-based Interactive Parallelizing System, under development at UBC. The system runs on a 74-node transputer system interconnected by crossbar switches with multiple links to the host, a SUN-4. It uses Trollius with the Logical Systems C compiler. The first component described is TMAP, a topology-independent mapping facility. TMAP’s objective is to automate the mapping process and make it independent from changes in the underlying architecture. It integrates two large pieces of software, Trollius and Prep-P. We describe its design and discuss specific problems in trying to achieve a machine-independent environment. The second component described is TRES, a higher-level resource management facility. TRES is based on parameterized models of computation which are used to predict performance and optimize the use of machine resources. The user need only specify the model (i.e., programming paradigm) and the computational task to be performed. TRES determines the optimal topology and number of processors to use. This information is used by the TMAP system.
{"title":"Resource Management in a Large Reconfigurable Transputer-based System","authors":"Halsur V. Sreekantaswamy, N. Goldstein, A. Wagner, S. Chanson","doi":"10.1109/DMCC.1991.633072","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633072","url":null,"abstract":"This paper describes two of the major components of TIPS, a Transputer-based Interactive Parallelizing System, under development at UBC. The system runs on a 74-node transputer system interconnected by crossbar switches with multiple links to the host, a SUN-4. It uses Trollius with the Logical Systems C compiler. The first component described is TMAP, a topology-independent mapping facility. TMAP’s objective is to automate the mapping process and make it independent from changes in the underlying architecture. It integrates two large pieces of software, Trollius and Prep-P. We describe its design and discuss specific problems in trying to achieve a machine-independent environment. The second component described is TRES, a higher-level resource management facility. TRES is based on parameterized models of computation which are used to predict performance and optimize the use of machine resources. The user need only specify the model (i.e., programming paradigm) and the computational task to be performed. TRES determines the optimal topology and number of processors to use. This information is used by the TMAP system.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127931834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 1991-04-28 DOI: 10.1109/DMCC.1991.633347
Arun Kumar Somani, Sangbang Choi
The interconnection network of a multiprocessor system should be able to embed an arbitrary permutation of nodes to map an arbitrary structure of a program graph and realize required communication paths. We show that distributed routing algorithms have high blocking probability to route permutations in binary cube-based systems. We further show that there exists no recursive algorithm to embed a permutation in a binary n-cube for n ≥ 5. We then develop rearrangeable hypercube architectures and routing algorithms to realize arbitrary permutations in circuit switching. We show that if each connection between two neighboring nodes consists of 2 pairs of links, i.e., 2 full-duplex communication lines, the hypercube can embed 2 arbitrary permutations of nodes simultaneously. We also prove that a hypercube is rearrangeable if one additional pair of links is provided in any one dimension of connections.
{"title":"On Embedding Permutations in Hypercubes","authors":"Arun Kumar Somani, Sangbang Choi","doi":"10.1109/DMCC.1991.633347","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633347","url":null,"abstract":"The interconnection network of a multiprocessor system should be able to embed an arbitrary permutation of nodes to map an arbitrary structure of a program graph and realize required communication paths. We show that distributed routing algorithms have high blocking probability to route permutations in binary cube-based systems. We further show that there exists no recursive algorithm to embed a permutation in a binary n-cube for n ≥ 5. We then develop rearrangeable hypercube architectures and routing algorithms to realize arbitrary permutations in circuit switching. We show that if each connection between two neighboring nodes consists of 2 pairs of links, i.e., 2 full-duplex communication lines, the hypercube can embed 2 arbitrary permutations of nodes simultaneously. We also prove that a hypercube is rearrangeable if one additional pair of links is provided in any one dimension of connections.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133756448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
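The blocking behavior mentioned in the abstract above can be made concrete with a small Python sketch. It assumes e-cube routing (differing address bits corrected from bit 0 upward) as one example of a distributed routing algorithm; the abstract does not say which algorithms the authors analyze. It also assumes one directed link per dimension between neighbors, so that in circuit switching two paths needing the same directed link block each other. The function names are hypothetical.

```python
from collections import Counter

def ecube_path(src, dst, n):
    """Directed links used when differing bits are fixed from bit 0 upward."""
    links, cur = [], src
    for bit in range(n):
        if (cur ^ dst) >> bit & 1:
            nxt = cur ^ (1 << bit)
            links.append((cur, nxt))
            cur = nxt
    return links

def contested_links(perm, n):
    """Links claimed by more than one source-destination pair of perm,
    where node s sends to node perm[s] in an n-cube."""
    use = Counter(link
                  for s, d in enumerate(perm)
                  for link in ecube_path(s, d, n))
    return {link for link, count in use.items() if count > 1}

# Sending every node to its bitwise complement causes no conflict ...
free = contested_links([i ^ 7 for i in range(8)], 3)

# ... but a permutation with 0 -> 2 and 1 -> 6 needs link 0 -> 2 twice.
blocked = contested_links([2, 6, 0, 1, 3, 4, 5, 7], 3)
```

In the 3-cube, `free` is the empty set, while `blocked` contains the link `(0, 2)`: the path 1 → 0 → 2 → 6 and the path 0 → 2 both need it, so one of the two circuits cannot be set up under this routing rule.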