On the power of segmenting and fusing buses
R. K. Thiruchelvan, J. Trahan, R. Vaidyanathan
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262859
The authors investigate communication among synchronous parallel processors through a new model of parallel computation called the reconfigurable multiple bus machine (RMBM). Four versions of the RMBM are introduced. In these models processors communicate over buses and possess varying abilities to segment and/or fuse buses. A hierarchy of the versions of the RMBM and the PRAM based on their relative 'powers' is established, indicating the relative contribution of segmenting and fusing buses.

A performance comparison of several superscalar processor models with a VLIW processor
John Lenell, N. Bagherzadeh
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262853
This paper quantitatively compares various superscalar processor architectures with a very long instruction word architecture developed at the University of California, Irvine. The motivation for this comparison is to study the capability of a dynamically scheduled processor to obtain the same performance achieved by a statically scheduled processor, and to examine the hardware resources required by each.

Parallel A* algorithms and their performance on hypercube multiprocessors
S. Dutt, N. Mahapatra
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262779
The authors develop parallel A* algorithms suitable for distributed-memory machines. In parallel A* algorithms, inefficiencies grow with the number of processors P used, causing performance to drop significantly at lower and intermediate work densities (the ratio of the problem size to P). To alleviate this effect, they propose a novel parallel startup phase and efficient dynamic work distribution strategies, and thus improve the scalability of parallel A* search. They also tackle the problem of duplicate searching by different processors, by using work transfer as a means to partial duplicate pruning. The parallel startup scheme proposed requires only Theta(log P) time compared to Theta(P) time for sequential startup methods used in the past. Using the traveling salesman problem (TSP) as the test case, the work distribution strategies yield speedup improvements of more than 30% and 15% at lower and intermediate work densities, respectively, while requiring 20% to 45% less memory, compared to previous approaches. Moreover, the simple duplicate pruning scheme provides an average reduction of 20% in execution time for up to 64 processors, compared to previous approaches that do not prune any duplicates.

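The Theta(log P) startup claim can be pictured as recursive doubling: every processor that already holds search nodes seeds one idle processor per round, so the number of busy processors doubles each round. The sketch below is a hypothetical illustration of that idea (the function name and pairing rule are assumptions, not the authors' scheme):

```python
# Hypothetical sketch of a logarithmic parallel startup phase (not the
# authors' code): each busy processor hands part of its open list to one
# idle processor per round, so the busy set doubles every round and all P
# processors are seeded in ceil(log2(P)) rounds instead of P-1 handoffs.
import math

def startup_rounds(P):
    """Return, per round, the (sender, receiver) work-transfer pairs."""
    busy = [0]                 # processor 0 expands the initial node first
    rounds = []
    while len(busy) < P:
        idle = [p for p in range(P) if p not in busy]
        pairs = list(zip(busy, idle))   # each busy proc seeds one idle proc
        rounds.append(pairs)
        busy += [r for _, r in pairs]
    return rounds

# With P = 8, three rounds suffice: log2(8) = 3.
assert len(startup_rounds(8)) == math.ceil(math.log2(8))
```

A one-at-a-time sequential startup would instead need P-1 handoff steps, which is the Theta(P) cost the paper's scheme avoids.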
Gossiping on interval graphs (computer networks)
Suresh Singh, M. Sridhar
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262902
The authors present an algorithm that produces a schedule of transmissions to achieve gossiping in a system where nodes are equipped with radio transceivers; different nodes have transceivers with different ranges. They assume that all the nodes are placed on a line and that all communication occurs in one dimension. Finally, unlike traditional system models for gossiping, they assume that all nodes within range can hear a transmission and that simultaneous transmissions may cause information loss via collisions. The gossiping algorithm developed decomposes a system configuration into a spine tree and uses this structure to recursively produce a transmission schedule.

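The collision assumption above can be made concrete with a toy receive rule: a node hears a round's traffic only if exactly one active transmitter is within range of it. This is an illustrative model of the stated assumptions, not code from the paper:

```python
# Toy model of the radio-collision rule described in the abstract:
# nodes sit at positions on a line, each transmitter has its own range,
# and a listener receives successfully only when exactly one in-range
# transmitter is active in the round (two or more collide; zero = silence).
def received(node_pos, transmitters):
    """transmitters: list of (position, range) pairs active this round."""
    in_range = [t for t in transmitters if abs(t[0] - node_pos) <= t[1]]
    return in_range[0] if len(in_range) == 1 else None

# A node at position 5 hears a lone transmitter at 3 with range 4 ...
assert received(5, [(3, 4)]) == (3, 4)
# ... but a simultaneous in-range transmitter at 6 causes a collision.
assert received(5, [(3, 4), (6, 2)]) is None
```

A valid gossip schedule under this model must therefore stagger transmissions so that every intended listener has exactly one in-range sender per round, which is what the paper's spine-tree decomposition organizes.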
New degree four networks: properties and performance
Gebre A. Gessesse, S. Chalasani
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262875
Two-dimensional tori and their variants, such as the midimew networks, are the most popular degree-four interconnection networks. However, the number of nodes interconnected by two-dimensional tori or the midimew networks grows only as the square of their diameter. The authors discuss two different types of degree-four interconnection networks, the starcake networks and the k-ary 2-cliques. These graphs are regular, vertex-symmetric, maximally fault-tolerant, and have a better diameter than the popular degree-four networks. They discuss the construction and routing of these networks and compare them with other interconnection networks. A preliminary performance comparison indicates that the proposed networks offer better throughput-delay characteristics than tori and midimew networks.

Parallel execution of real-time rule-based systems
A. Cheng
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262782
When rule-based expert systems are used to monitor and control real-time systems, the ability of these expert systems to meet stringent response-time constraints is as important as their ability to produce correct results in response to input. This paper explores parallel execution as an approach to achieve higher execution speed in rule-based systems in domains requiring high performance and real-time response. In particular, it shows how rule-firing parallelism can be automatically extracted from a real-time rule-based system via static analysis of the system source code. To demonstrate the practicality of this approach, the proposed technique is applied to reduce the execution time of two NASA expert systems.

Managing the bottlenecks of a parallel Gauss-Seidel algorithm for power flow analysis
Garng M. Huang, W. Ongsakul
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262781
The parallelization and implementation of Gauss-Seidel (G-S) algorithms for power flow analysis have been investigated on a Sequent Balance shared-memory (SM) machine. In this paper, the authors generalize the idea to more general computer architectures and demonstrate how to effectively increase the speedup upper bounds of G-S algorithms by properly managing the bottlenecks.

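The parallelization difficulty the paper addresses comes from Gauss-Seidel's in-sweep dependency: each updated unknown is consumed immediately by later rows of the same sweep. A minimal sketch of a plain (sequential, linear-system) G-S iteration makes that dependency visible; this is illustrative only, not the authors' power-flow formulation:

```python
# Illustrative Gauss-Seidel sweep for a linear system Ax = b (not the
# authors' power-flow code). Note that x[i] is updated in place, so rows
# later in the sweep immediately use the fresh values of x[0..i-1] -- the
# serial chain that creates the bottlenecks the paper manages.
def gauss_seidel(A, b, iters=50):
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]   # consumed by rows i+1..n-1 now
    return x

# Diagonally dominant system, so the sweeps converge.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [9.0, 8.0]
x = gauss_seidel(A, b)   # converges to x = 19/11, y = 23/11
```

Jacobi iteration breaks this chain (all updates read the previous sweep's values) and parallelizes trivially, but typically converges more slowly; exploiting G-S convergence while limiting the serial chain is the trade-off behind the speedup bounds discussed.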
Least common ancestor networks
I. Scherson, Chi-Kai Chien
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262825
Least common ancestor networks (LCANs) are introduced and shown to be a class of networks that includes fat-trees, baseline networks, SW-banyans, and the router networks of the TRAC 1.1 and 2.0 and the CM-5. Some LCAN properties are stated and the permutation routing capabilities of an important subclass are analyzed. Simulation results for three permutation classes verify the accuracy of an iterative solution for a randomized routing strategy.

Class and user based parallelism in Raven
D. Acton, G. Neufeld
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262791
This paper presents the concurrency features found in Raven, an object-oriented parallel programming system. Raven supports coarse-grained parallelism via class based and user based parallelism. Class based parallelism is provided by the implementor of the class, while user based parallelism is provided by the user, or client, of objects. Raven also supports object properties which are determined at object creation time, thereby eliminating the need for separate class hierarchies that support concurrency. Raven is operational on a variety of machine architectures, including a shared-memory multiprocessor. Initial experience indicates that sequential code can easily be transformed into parallel code and that a substantial speedup is possible.

Hierarchical interconnection cache networks
Sizheng Wei, E. Schenfeld
[1993] Proceedings Seventh International Parallel Processing Symposium
Pub Date: 1993-04-13  DOI: 10.1109/IPPS.1993.262870
The hierarchical interconnection cache network (HICN) is a novel network architecture for massively parallel processing systems. The HICN's topology is a hierarchy of multiple three-stage interconnection cache networks. The first and third stages of each network use small, fast crossbar switches. Large, slow-switching (reconfigurable) crossbars are used in the middle stages. The HICN exploits a special kind of communication locality, called switching locality, offering greater flexibility and lower latency compared with classical hierarchical networks. The HICN uses small switches for communication routing and large switches for setting up the network (reconfiguration) to match the expected communication pattern as closely as possible. The trade-off between routing speed and switch size is a major factor in achieving high-speed communication in massively parallel interconnection networks. The authors present efficient embeddings of several classical network topologies, such as hypercubes, complete binary trees, and grids, into HICNs. They also show that HICNs are flexibly partitionable.