Using an Optical Bus in a Distributed Memory Multicomputer
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633311
M. H. Davis, U. Ramachandran
This research examines the use of an optical bus in a distributed memory multicomputer. We formulate a simple multicomputer model in order to concentrate on the performance of the optical bus from a Computer Architecture viewpoint. In this report we consider the optical bus to be a Small Local Area Network (S-LAN) and apply two classical LAN medium access protocols, Time Division Multiple Access (TDMA) and Carrier Sense Multiple Access/Collision Detection (CSMA/CD). From our standard discrete-event simulation experiments we principally conclude that CSMA/CD does not outperform TDMA as much as might be expected from classical analyses. We also conclude that TDMA allows a large number of nodes to be connected to the optical bus for our model.
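The abstract does not spell out the simulation model, but the way TDMA access delay grows with node count is easy to see with a toy calculation. The sketch below is an illustrative assumption, not the authors' simulator: it estimates the mean time a ready message waits for its owner's slot on a slotted bus, with all parameter names and values hypothetical.

```python
import random

def tdma_mean_wait(num_nodes, slot_time=1.0, trials=100_000, seed=0):
    """Estimate the mean time a node waits for its next TDMA slot.

    Each node owns one slot per frame of length num_nodes * slot_time.
    A message that becomes ready at a uniformly random point in the
    frame waits until its owner's slot next comes around.
    """
    rng = random.Random(seed)
    frame = num_nodes * slot_time
    total = 0.0
    for _ in range(trials):
        node = rng.randrange(num_nodes)      # which slot the sender owns
        ready = rng.uniform(0.0, frame)      # when the message becomes ready
        slot_start = node * slot_time        # start of that node's slot in the frame
        total += (slot_start - ready) % frame  # time until the slot next begins
    return total / trials

if __name__ == "__main__":
    for n in (8, 32, 128):
        print(n, round(tdma_mean_wait(n), 2))  # grows roughly as n/2 slot times
```

Under this toy model the mean wait is about half a frame, so it grows linearly with the number of nodes; this is the classical argument in favor of contention protocols such as CSMA/CD, and the paper's finding is that for their multicomputer model the actual advantage of CSMA/CD is smaller than that argument suggests.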
{"title":"Using an Optical Bus in a Distributed Memory Multicomputer","authors":"M. H. Davis, U. Ramachandran","doi":"10.1109/DMCC.1991.633311","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633311","url":null,"abstract":"This research examines the use of an optical bus in a distributed memory multicomputer. We formulate a simple multicomputer model in order to concentrate on the performance of the optical bus from a Computer Architecture viewpoint. In this report we consider the optical bus to be a Small Local Area Network (S-LAN) and apply two classical LAN medium access protocols, Time Division Multiple Access (TDMA) and Carrier Sense Multiple Access/Collision Detection (CSMA/CD). From our standard discrete-event simulation experiments we principally conclude that CSMA/CD does not outperform TDMA as much as might be expected from classical analyses. We also conclude that TDMA allows a large number of nodes to be connected to the optical bus for our model.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123659628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DAWRS: A Differential-Algebraic System Solver by the Waveform Relaxation Method
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633306
A. Secchi, M. Morari, E. Biscaia
We investigate the concurrent solution of low-index differential-algebraic equations (DAEs) by the waveform relaxation (WR) method, an iterative method for system integration. We present our new simulation code, DAWRS (Differential-Algebraic Waveform Relaxation Solver), for solving DAEs on parallel machines using WR methods, and describe new techniques to improve the convergence of such methods. Experimental results demonstrate the achievable concurrent performance in solving DAEs for a class of applications in chemical engineering.
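DAWRS itself targets DAEs and includes convergence-acceleration techniques not shown here. As a minimal sketch of the underlying waveform relaxation idea, the toy code below applies Jacobi WR to a 2x2 linear ODE: each sweep integrates every component over the whole time window using the previous sweep's waveform for the coupling terms, so the per-component integrations are independent and could run on separate processors. All names and parameters are hypothetical.

```python
import numpy as np

def jacobi_waveform_relaxation(a11, a12, a21, a22, x0, T=1.0, n_steps=200, sweeps=10):
    """Toy Jacobi waveform relaxation for x' = A x with a 2x2 matrix A.

    Each sweep integrates every component over the whole window [0, T]
    with forward Euler, using the previous sweep's waveform for the
    coupling term -- the essence of WR, where the per-component
    integrations could run on different processors.
    """
    dt = T / n_steps
    # initial guess: hold the initial condition constant over the window
    x1 = np.full(n_steps + 1, x0[0])
    x2 = np.full(n_steps + 1, x0[1])
    for _ in range(sweeps):
        new1, new2 = x1.copy(), x2.copy()
        for k in range(n_steps):
            # component 1 uses the old waveform of component 2, and vice versa
            new1[k + 1] = new1[k] + dt * (a11 * new1[k] + a12 * x2[k])
            new2[k + 1] = new2[k] + dt * (a21 * x1[k] + a22 * new2[k])
        x1, x2 = new1, new2
    return x1, x2

if __name__ == "__main__":
    x1, x2 = jacobi_waveform_relaxation(-2.0, 1.0, 1.0, -2.0, x0=(1.0, 0.0))
    print(x1[-1], x2[-1])   # should approach the coupled forward-Euler solution
```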
{"title":"DAWRS: A Differential - Algebraic System Solver by the Waveform Relaxation Method","authors":"A. Secchi, M. Morari, E. Biscaia","doi":"10.1109/DMCC.1991.633306","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633306","url":null,"abstract":"We investigate the concurrent solution of low-index \u0000differential-algebraic equations (DAE’s) by the waveform \u0000relaxation (WR) method, an iterative method for system \u0000integration. We present our new simulation code, DAWRS \u0000(Differential - Algebraic - Waveform Relaxation Solver), to \u0000solve DAE’s on parallel machines using the WR methods, and \u0000describe new techniques to improve the convergence of \u0000such methods. As experimental results, we demonstrate the achievable concurrent performance to solve DAE’s for \u0000a class of applications in chemical engineering.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125546257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward an Efficient Parallel Implementation of the Bisection Method for Computing Eigenvalues
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633210
S. Crivelli, E. Jessup
In this paper, we compare the costs of computing a single eigenvalue of a symmetric tridiagonal matrix by serial bisection and by parallel multisection on a hypercube multiprocessor. We show how the optimal method for computing one eigenvalue depends on such variables as the matrix order and parameters of the hypercube used. Our analysis is supported by experiments on an Intel iPSC/2 hypercube multiprocessor.
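For a symmetric tridiagonal matrix, bisection for a single eigenvalue rests on the Sturm-sequence property: the number of negative pivots in the LDL^T factorization of T - xI equals the number of eigenvalues below x. The sketch below is a minimal serial version of that idea, not the authors' hypercube code; multisection would evaluate the count at several points per step instead of one.

```python
def count_eigs_below(alpha, beta, x, tiny=1e-30):
    """Number of eigenvalues of the symmetric tridiagonal matrix with
    diagonal alpha and off-diagonal beta that lie below x, from the
    signs of the LDL^T pivots (Sturm sequence)."""
    count, d = 0, 1.0
    for i in range(len(alpha)):
        b2 = beta[i - 1] ** 2 if i > 0 else 0.0
        d = (alpha[i] - x) - b2 / d
        if d == 0.0:
            d = -tiny          # avoid division by zero, keep a definite sign
        if d < 0.0:
            count += 1
    return count

def kth_eigenvalue(alpha, beta, k, tol=1e-12):
    """Find the k-th smallest eigenvalue (k = 1..n) by bisection.
    alpha: diagonal entries (len n), beta: off-diagonal entries (len n-1)."""
    n = len(alpha)
    # Gershgorin bounds enclose the whole spectrum
    radii = [abs(beta[i - 1]) if i > 0 else 0.0 for i in range(n)]
    radii = [radii[i] + (abs(beta[i]) if i < n - 1 else 0.0) for i in range(n)]
    lo = min(alpha[i] - radii[i] for i in range(n))
    hi = max(alpha[i] + radii[i] for i in range(n))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_eigs_below(alpha, beta, mid) >= k:
            hi = mid           # at least k eigenvalues below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    import math
    # 1D Laplacian tridiag(-1, 2, -1): eigenvalues are 2 - 2*cos(j*pi/(n+1))
    n, k = 50, 3
    alpha, beta = [2.0] * n, [-1.0] * (n - 1)
    print(kth_eigenvalue(alpha, beta, k), 2 - 2 * math.cos(k * math.pi / (n + 1)))
```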
{"title":"Toward an Efficient Pamallel Implementation of the Bisection Method for Computing Eigenvalues","authors":"S. Crivelli, E. Jessup","doi":"10.1109/DMCC.1991.633210","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633210","url":null,"abstract":"In this paper, we compare the costs of computing a single eigenvalue of a. symmetric tridiagonal matrix b y serial bisection and b y parallel multisection on a hypercube multiprocessor. We show how the optimal method for computing one eigenvalue depends on such variables as the matrix order and parameters of the hypercube used. Our analysis is supported b y experiments on an Intel iPSC-2 hypercube multiprocessor.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131841790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Probabilistic Analysis of the Optimal Efficiency of the Multi-Level Dynamic Load Balancing Scheme
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633108
Kouichi Kimura, Nobuyuki Ichiyoshi
This paper investigates the optimal efficiency of the multi-level dynamic load balancing scheme for OR-parallel programs, using probability theory. In the single-level dynamic load balancing scheme, one processor divides a given task into a number of subtasks, which are distributed to other processors on demand and then executed independently. We introduce a formal model of the execution as a queuing system with several servers, and we investigate the optimal granularity of the subtasks for attaining maximal efficiency, taking account of dividing costs and load imbalance between the processors. This yields estimates of the maximal efficiency. We then apply these results to an analysis of the efficiency of the multi-level dynamic load balancing scheme, which is the iterated application of the single-level scheme in a hierarchical manner, and we show how the scalability is thereby improved over the single-level scheme.
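A minimal way to see the granularity trade-off the paper analyzes: too few subtasks leaves the processors imbalanced, while too many makes the dividing cost dominate. The simulation below is only an illustration with made-up parameters and an exponential task-time assumption, not the paper's probabilistic model.

```python
import heapq
import random

def parallel_time(num_procs, num_subtasks, total_work=1.0, split_cost=0.001, seed=0):
    """Greedy on-demand scheduling of num_subtasks pieces on num_procs workers.

    Each piece has a random duration (mean total_work / num_subtasks), and every
    piece also costs split_cost to create and dispatch, charged up front.
    Returns the makespan: dividing time plus the time the last worker finishes.
    """
    rng = random.Random(seed)
    mean = total_work / num_subtasks
    durations = [rng.expovariate(1.0 / mean) for _ in range(num_subtasks)]
    finish = [0.0] * num_procs            # next-free time of each worker
    heapq.heapify(finish)
    for d in durations:                   # workers grab the next piece on demand
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + d)
    return num_subtasks * split_cost + max(finish)

if __name__ == "__main__":
    p, work = 16, 1.0
    for m in (16, 64, 256, 1024, 4096):
        t = parallel_time(p, m, total_work=work)
        print(f"subtasks={m:5d}  efficiency={work / (p * t):.2f}")
```

In this toy setting efficiency peaks at an intermediate granularity, which is the qualitative behavior the paper quantifies with its queuing model.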
{"title":"Probabilistic Analysis of the Optimal Efficiency of the Multi-Level Dynamic Load Balancing Scheme","authors":"Kouichi Kimura, Nobuyuki Ichiyoshi","doi":"10.1109/DMCC.1991.633108","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633108","url":null,"abstract":"This paper investigates the optimal efficiency of the multi-level dynamic load balancing scheme for ORparallel programs, using probability theory. In the single-level dynamic load balancing scheme, one processor divides a given task into a number of subtasks, which are distributed to other processors on demand and then executed independently. We introduce a formal model of the execution as a queuing system with several servers. And we investigate the optimal granularity of the subtasks to attain the maximal efficiency, taking account of dividing costs and load imbalance between the processors. Thus we obtain estimates of the maximal efficiency. We then apply these results to analysis of the efficiency of the multi-level dynamic load balancing scheme, which is the iterated application of the singlelevel scheme in a hierarchical manner. And we show how the scalability is thereby improved over the singlelevel scheme.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116620894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zipcode and the Reactive Kernel for the Caltech Intel Delta Prototype and nCUBE/2
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633073
A. Skjellum, C. Still
{"title":"Zipcocle and the Reactive Kernel for the Caltech Intel Delta Prototype and nCUBE/2*","authors":"A. Skjellum, C. Still","doi":"10.1109/DMCC.1991.633073","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633073","url":null,"abstract":"","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115073976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Reactive Kernel on a Shared-Memory Computer
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633141
L. Hamren, S. Mattisson
This paper describes an efficient implementation of the Caltech Cosmic Environment/Reactive Kernel multicomputer communication primitives on a Sequent Symmetry, a shared-memory multiprocessor. With this implementation, the Reactive Kernel primitives exist on distributed-memory as well as shared-memory computers, and a program can be ported between machines like the Symult S2010 multicomputer and the Sequent Symmetry by just recompiling the code. The message startup time on the Sequent is comparable to that of the Symult.
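The Cosmic Environment/Reactive Kernel primitives themselves are not reproduced here; the sketch below only illustrates the general idea of layering blocking send/receive on shared memory, with one queue per node so that a send is just an enqueue into the receiver's queue. Class and function names are hypothetical.

```python
import threading
import queue

class SharedMemoryMessageLayer:
    """Toy message-passing layer on a shared-memory machine: one FIFO queue
    per node, so 'sending' is an enqueue into the receiver's queue and
    'receiving' blocks until a message arrives. Illustrative only; the
    actual Reactive Kernel primitives and buffer management are more involved."""

    def __init__(self, num_nodes):
        self.queues = [queue.Queue() for _ in range(num_nodes)]

    def send(self, dest, msg):
        # on shared memory, passing a message need only copy a reference
        self.queues[dest].put(msg)

    def recv(self, node):
        return self.queues[node].get()   # blocks until something is available

if __name__ == "__main__":
    layer = SharedMemoryMessageLayer(num_nodes=2)

    def worker(me, other):
        layer.send(other, f"hello from node {me}")
        print(me, "got:", layer.recv(me))

    threads = [threading.Thread(target=worker, args=(i, 1 - i)) for i in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
```

Because "sending" is just an enqueue of a reference, the per-message startup cost on a shared-memory host can be small, which is consistent with the paper's observation that startup time on the Sequent is comparable to the Symult's.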
{"title":"The Reactive Kernel on a Shared-Memory Computer","authors":"L. Hamren, S. Mattisson","doi":"10.1109/DMCC.1991.633141","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633141","url":null,"abstract":"This paper describes an efficient implemientation of the Caltech Cosmic Environment/Reactive Kernel multicomputer communication primitives on a Sequent Symmetry, a shared-memory multiprocessor. With this implementation, the Reactive Kernel primitives exist on distributed-memory as well as sharedmemory computers, and a program can be ported between machines like the Symult s2010 multicomputer and the Sequent Symmetry by just recompiling the code. The message startup time on the Sequent is comparable to that of the Symult.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133870161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Basic Linear Algebra Communication Subprograms
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633146
E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, R. van de Geijn
{"title":"Basic Linear Algebra Comrnunication Subprograms","authors":"E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, R. van de Geijn","doi":"10.1109/DMCC.1991.633146","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633146","url":null,"abstract":"","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132974541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Structured Parallel Programming on Multicomputers
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633130
Zhiwei Xu
Currently, parallel programs for distributed memory multicomputers are difficult to write, understand, test, and reason about. It is observed that these difficulties can be attributed to the lack of a structured style in current parallel programming practice. In this paper, we present a structured methodology to facilitate parallel program development on distributed memory multicomputers. The methodology aims to develop parallel programs that are determinate (the same input always produces the same output; in other words, the result is repeatable), terminating (the program is free of deadlock and other infinite-waiting anomalies), and easy to understand and test. It also enables us to take advantage of the conventional, well-established techniques of software engineering by carrying structured programming over to parallel program development; however, some new ideas are added to handle parallelism. The methodology contains five basic principles: (1) use structured constructs; (2) develop determinate and terminating programs; (3) follow a two-phase design; (4) use a mathematical model to define the semantics of parallel programs; and (5) employ computer-aided techniques for analyzing and checking programs. Our basic approach is to combine these principles to cope with the complexity of parallel programming. As shown in Fig. 1, while the total space of all parallel programs is very large, applying the first three principles drastically reduces the space to a subspace (Class IV). Since this subspace is much smaller, the programming task becomes simpler.
{"title":"Structured Parallel Programming on Multicomputers","authors":"Zhiwei Xu","doi":"10.1109/DMCC.1991.633130","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633130","url":null,"abstract":"Currently, parallel programs for distributed memory multicomputers are difficult to write, understand, test, and reason about. It is observed that these difficulties can be attributed to the lack of a structured style in current parallel programming practice. In this paper, we present a structured methodology to facilitate parallel program development on distributed memory multicomputers. The methodology aims to developing parallel programs that are determinate (the same input always produces the same output, in other words, the result is repeatable), terminating (the program is free of deadlock and other infinite waiting anomalies), and easy to understand and test. It also enables us to take advantage of the conventional, well established techniques of sofhvare engineering. ming to parallel program development. However, some new ideas are added to handle parallelism. The methodology contains three basic principles: (1) Use structured constructs; (2) develop determinate and terminating programs; (3) follow a two-phase design; (4) use a mathematical model to define semantics of parallel programs; and (5) employ computer aided techniques for analyzing and checking programs. Our basic approach is to combine these principles to cope with the complexity of parallel programming. As shown in Fig.1, while the total space of all parallel programs is very large, applying the first three principles drastically reduces the space to a subspace (Class IV). Since this subspace is much smaller, the programming task becomes simpler.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"26 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114126831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Scalable VLSI MIMD Routing Cell
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633351
H. Corporaal, J. Olk
It is a well-known fact that full-custom designed computer architectures can achieve much higher performance for specific applications than general purpose computers. This performance has to be paid for: a long design trajectory results in a high cost-performance ratio. Current VLSI design and compilation tools, however, make semi-custom designs feasible with greatly reduced costs and time to market. This paper presents a scalable and flexible communication processor for message passing MIMD systems. This communication processor is implemented as a parametrized VLSI routing cell in a VLSI compilation system. This cell fits into the SCARCE RISC processor framework [1], which is an architectural framework for automatic generation of application specific processors. By use of application analysis, the cell is tuned to the specific requirements during silicon compilation. This approach is new in that it avoids the general performance penalty paid for the required flexibility.
{"title":"A Scalable VLSI MIMD Routing Cell","authors":"H. Corporaal, J. Olk","doi":"10.1109/DMCC.1991.633351","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633351","url":null,"abstract":"It is a well known fact that full custom designed computer architectures can achieve much higher performance for specific applications than general purpose computers. Thisperjormance has to be paidfor: a long design trajectory results in a high cost-performance ratio. Current VLSI design and compilation tools however, make semi-custom designs feasible with greatly reduced costs and time to market. This paper presents a scalable andflexible communication processor for message passing MIMD systems. This communication processor is implemented as a parametrisized VLSI routing cell in a VLSI compilation system. This cell fits into the SCARCE RISC processor framework [ I ] , which is an architectural framework for automatic generation of application specific processors. By use of application analysis, the cell is tuned to the specijic requirements during silicon compilation time. This approach is new, in that it avoids the general performance penalty paid for requiredflexibility.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134419702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing Some Approaches to Programming Distributed Memory Machines
Pub Date: 1991-04-28 | DOI: 10.1109/DMCC.1991.633131
M. Haveraaen
We show that programs written for the SIMD machine model are equivalent to a special form of barrier MIMD programs. This form is called CPP. The CPP form is also produced when compiling functional languages like Crystal and Sapphire. CPP programs may be executed on MIMD computers without any need for global synchronization and with little or no communication overhead, probably with a gain in execution speed as a result. This raises the challenge of constructing MIMD computers with many processors and low-cost communication in order to fully utilize this potential.
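As a small illustration of the SIMD-to-barrier-MIMD correspondence (only the first half of the paper's argument; it says nothing about when the barriers can later be elided), the toy below runs one SIMD-style update phase as MIMD workers that apply the same operation to their own partitions and then meet at a barrier. The names and the partitioning scheme are hypothetical.

```python
import threading

def simd_as_barrier_mimd(data, num_workers=4):
    """Run one 'SIMD-style' update phase as barrier MIMD: every worker applies
    the same operation to its own partition, then all workers meet at a barrier
    before anyone reads another worker's result."""
    barrier = threading.Barrier(num_workers)
    chunks = [data[i::num_workers] for i in range(num_workers)]
    results = [None] * num_workers

    def worker(i):
        results[i] = [x * x for x in chunks[i]]   # the common per-element operation
        barrier.wait()                            # global synchronization point
        # after the barrier it is safe to read the other workers' results

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return results

if __name__ == "__main__":
    print(simd_as_barrier_mimd(list(range(10))))
```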
{"title":"Comparing Some Approaches to Programming Distributed Memory Machines","authors":"M. Haveraaen","doi":"10.1109/DMCC.1991.633131","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633131","url":null,"abstract":"We show that programs written for the SIMD machine model are equivalent to a special form of barrier MIMD programs. This form is called CPP. The CPP form is also produced when compiling functional languages like Crystal and Sapphire. CPP programs may be executed on MIMD computers without any need for global synchronization and little or no communication overhead, probably with a gain in execution speed as a result. This raises a challenge to construct MIMD computers with many processors and Eow-cost communication in order to ful ly utilize this potential.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133154836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}