The impact of task-length parameters on the performance of the random load-balancing algorithm
Y. Ben-Asher, Aviad Cohen, A. Schuster, J. F. Sibeyn
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223067
Considers the problem of dynamic load balancing in an n-processor parallel system. The authors focus on the algorithm that randomly assigns newly generated tasks to processors for execution. This process is modeled by randomly throwing weighted balls into n holes. For a given program A, the ball weights (task lengths) are chosen according to an unknown probability distribution D(A) with expectation μ, maximum M, and minimum m. For any A, D(A), and constant 0 < ε ≤ 0.5, they derive an upper bound on the number of processes that A needs to generate in order for the algorithm to achieve optimal load balancing with very high probability, so that the run time is optimal up to a factor of (1 + ε)². Using the derived relation, the programmer may control the load balancing of a program by modifying the global parameters of the generated processes.
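The ball-throwing model is easy to simulate. The sketch below is illustrative, not the authors' analysis: the task-length distribution, seed, and sizes are arbitrary choices. It throws weighted tasks at n processors uniformly at random and compares the resulting makespan with the perfectly balanced run time (total work divided by n).

```python
import random

def random_balance(num_tasks, n, weight_fn, seed=0):
    """Assign num_tasks weighted tasks to n processors uniformly at
    random; return (makespan, ideal), where ideal = total work / n."""
    rng = random.Random(seed)
    loads = [0.0] * n
    total = 0.0
    for _ in range(num_tasks):
        w = weight_fn(rng)
        loads[rng.randrange(n)] += w
        total += w
    return max(loads), total / n

# Task lengths drawn uniformly from [m, M] = [1, 10], an illustrative D(A).
weight = lambda rng: rng.uniform(1.0, 10.0)

few_span, few_ideal = random_balance(2 * 64, 64, weight)      # few tasks
many_span, many_ideal = random_balance(500 * 64, 64, weight)  # many tasks

# With many more tasks than processors the makespan approaches the ideal,
# mirroring the paper's point that generating enough processes makes
# random assignment near-optimal.
print(few_span / few_ideal, many_span / many_ideal)
```

The second ratio is much closer to 1 than the first, which is the qualitative effect the paper's bound quantifies.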
Routing BPC permutations in VLSI
H. Alnuweiri
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223061
A large number of the permutations realized by interconnection networks in parallel processing systems and digital arithmetic circuits fall into the class of bit-permute-complement (BPC) permutations. The paper presents a methodology for routing this class of permutations in VLSI under various I/O, area, and time trade-offs. The resulting VLSI designs can route a BPC permutation of size N using a chip with N/Q I/O pins, O(N²/Q²) area, and O(wQ) time, where w is the word length of the permuted elements and 1 ≤ Q ≤ √(N/w).
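A BPC permutation moves each element to the address obtained by applying a fixed permutation to the bits of its source address and complementing a fixed subset of them. A minimal sketch of that definition (the function name and the bit-reversal example are illustrative, not from the paper):

```python
def bpc_permute(data, perm, comp):
    """Apply a bit-permute-complement permutation to a list whose length
    is a power of two.  Destination address bit j is source address bit
    perm[j], XORed with bit j of the complement mask comp."""
    n = len(data)
    k = n.bit_length() - 1
    assert 1 << k == n, "length must be a power of two"
    out = [None] * n
    for src in range(n):
        dst = 0
        for j in range(k):
            bit = (src >> perm[j]) & 1
            dst |= (bit ^ ((comp >> j) & 1)) << j
        out[dst] = data[src]
    return out

# Bit reversal on 8 elements: perm reverses the bit order, no complements.
print(bpc_permute(list(range(8)), perm=[2, 1, 0], comp=0))
# Full complement with identity bit permutation: index i goes to i XOR 7.
print(bpc_permute(list(range(8)), perm=[0, 1, 2], comp=0b111))
```

Bit reversal, matrix transpose, and hypercube dimension exchanges are all instances of this scheme, which is why the class covers so many practically used permutations.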
VLSI implementation of a 256×256 crossbar interconnection network
Kyusun Choi, W. Adams
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223031
Although a crossbar interconnection network is desirable in parallel processing systems because of its configuration flexibility and simplicity of control, most crossbars developed to date are small. The paper analyzes the VLSI layout size and signal delay of previous crossbar circuits and presents, for comparison, a circuit with better layout size and signal delay. Based on the new circuit, the feasibility of implementing a 256×256 crossbar on a 1 cm² CMOS VLSI chip is shown.
Distributed algorithms for shortest-path, deadlock-free routing and broadcasting in a class of interconnection topologies
Jenshiuh Liu, W. Hsu
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.222963
A class of novel interconnection topologies called the generalized Fibonacci cubes is presented. The generalized Fibonacci cubes include the hypercubes, the recently proposed Fibonacci cubes (W.-J. Hsu, Proc. Int. Conf. on Parallel Processing, p. 1722-3 (1991)), and other asymmetric interconnection topologies bridging the two. The generalized Fibonacci cubes can serve as a framework for studying hypercubes degraded by faulty nodes or links. Previously known algorithms for hypercubes do not generalize to this class of topologies. The authors present distributed routing and broadcasting algorithms that apply to all members of the class. Their distributed routing algorithm is shown to always find a shortest, deadlock-free path. The broadcasting algorithms are designed and evaluated under both the all-port and the 1-port communication models. The all-port broadcasting algorithm is provably optimal in the number of routing steps. An upper bound for the 1-port broadcasting algorithm is determined and shown to be optimal in certain cases.
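For concreteness, the basic Fibonacci cube (one member of the generalized family) can be built directly from its standard definition: nodes are the n-bit strings with no two consecutive 1 bits, and edges join nodes at Hamming distance one. A sketch under that assumption, using breadth-first search to confirm the network is connected:

```python
from collections import deque

def fibonacci_cube(n):
    """Nodes: n-bit integers with no two adjacent 1 bits (Fibonacci codes).
    Edges: pairs of nodes differing in exactly one bit."""
    nodes = [b for b in range(1 << n) if b & (b << 1) == 0]
    adj = {u: [v for v in nodes if bin(u ^ v).count('1') == 1] for u in nodes}
    return nodes, adj

def bfs_distances(adj, src):
    """Hop distance from src to every reachable node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

nodes, adj = fibonacci_cube(5)
dist = bfs_distances(adj, 0)
# Node counts follow the Fibonacci numbers (13 nodes for n = 5), and every
# node is reachable from node 0 by clearing/setting one bit at a time.
print(len(nodes), len(dist), max(dist.values()))
```

Because every subset of the 1 bits of a valid string is itself valid, routing toward node 0 never leaves the node set, which is the intuition behind shortest-path routing in these cubes.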
Analyzing performance of sequencing mechanisms for simple layered task systems
A. Tayyab, J. G. Kuhl
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223051
Considers the problem of sequencing a set of parallel activities in the presence of nonzero overheads. Mechanisms for sequence control range from explicit inter-task synchronization to more restrictive mechanisms such as blocking barriers. It is highly desirable to base the choice of a sequence-control mechanism for a parallel algorithm on specific characteristics of the algorithm's structure and the underlying architecture. The paper presents approximate models for simple layered task systems that predict overall performance and provide a useful understanding of key performance parameters and trade-offs. The analytic results are compared with simulation to demonstrate their accuracy. Simple applications of the model demonstrate non-intuitive behavior of layered graphs under inter-task versus barrier-based sequencing.
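The gap between barrier and point-to-point sequencing can be illustrated with a toy makespan computation (a simplification for intuition, not the paper's approximate model): a blocking barrier after every layer charges the sum of per-layer maxima, while independent per-processor chains charge only the maximum per-processor sum.

```python
import random

def layered_times(layers, procs, seed=1):
    """Random task durations: one task per processor per layer."""
    rng = random.Random(seed)
    return [[rng.uniform(0.5, 1.5) for _ in range(procs)] for _ in range(layers)]

def barrier_makespan(times):
    # Blocking barrier: no task of layer k+1 starts until the slowest
    # task of layer k finishes.
    return sum(max(layer) for layer in times)

def chain_makespan(times):
    # Point-to-point sequencing where task i of layer k+1 waits only for
    # task i of layer k (independent per-processor chains).
    procs = len(times[0])
    return max(sum(layer[i] for layer in times) for i in range(procs))

t = layered_times(layers=50, procs=16)
print(barrier_makespan(t), chain_makespan(t))
```

Since the maximum of sums never exceeds the sum of maxima, chain sequencing is never slower in this idealized zero-overhead setting; the paper's point is that once synchronization overheads are nonzero, the comparison becomes non-intuitive and depends on the graph and architecture parameters.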
Efficient process migration in the EMPS multiprocessor system
G. Dijk, M. V. Gils
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223071
The process migration facility of the Eindhoven multiprocessor system (EMPS) is presented. In the EMPS system, mailboxes are used for interprocess communication, providing location transparency for communicating processes. The major advantages of mailbox communication in the EMPS system are: (1) interprocess communication proceeds without losing messages; and (2) communication paths are updated very efficiently when a process moves to another processor. By redirecting the mailbox connection of the migrating process, the communication paths of all processes connected to the same mailbox are updated.
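The redirection idea can be sketched in a few lines (a toy model, not the EMPS implementation): because senders address a mailbox rather than a processor, migrating the receiver means re-binding a single mailbox entry, and every sender's path is updated at once.

```python
# Location-transparent mailboxes, sketched: delivery consults the
# mailbox binding at send time, so senders never hold a stale
# processor address and no per-sender update is needed on migration.

class Mailbox:
    def __init__(self, host):
        self.host = host      # processor currently hosting the receiver
        self.messages = []

def send(mailbox, msg):
    # Record where the message was delivered, for illustration.
    mailbox.messages.append((mailbox.host, msg))

mb = Mailbox(host="P0")
send(mb, "before")
mb.host = "P3"            # migrate the receiving process: one re-binding
send(mb, "after")
print(mb.messages)        # both messages delivered, none lost
```

The single point of indirection is what makes the path update O(1) per mailbox instead of O(number of senders).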
Space-optimal linear processor allocation for systolic arrays synthesis
Y. Wong, J. Delosme
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223033
Mapping a systolic algorithm onto a regularly connected array architecture can be treated as a linear transformation problem. Deriving the 'optimal' transformation is difficult, however, because the necessary optimizations involve discrete decision variables and the cost functions usually lack closed-form expressions. The paper considers the derivation of a space-optimal (minimum processor count) mapping with a given time performance. Using recent results from the geometry of numbers, it is shown that the solution space of this discrete optimization problem can be tightly bounded, so the optimal solution can be determined efficiently by enumeration in practical cases. Examples demonstrate the effectiveness of the approach.
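The flavor of the optimization can be shown with a toy enumeration (hypothetical 8×8 iteration domain; this ignores the scheduling and conflict-freedom constraints the paper's method enforces): a linear allocation vector a maps iteration point (i, j) to processor a1·i + a2·j, and the processor count is the number of distinct values taken over the domain.

```python
from itertools import product

# Hypothetical rectangular iteration domain for illustration.
domain = [(i, j) for i in range(8) for j in range(8)]

def processor_count(a):
    """Distinct processors used by the linear allocation p = a1*i + a2*j."""
    return len({a[0] * i + a[1] * j for (i, j) in domain})

# Brute-force enumeration over a small bounded set of integer vectors,
# standing in for the geometry-of-numbers bound on the search space.
candidates = [a for a in product(range(-3, 4), repeat=2) if a != (0, 0)]
best = min(candidates, key=processor_count)
print(best, processor_count(best))   # 8 processors for this domain
```

For this square domain the minimum is 8 (project along a row or column); the paper's contribution is showing that, in general, the candidate set can be bounded tightly enough for such enumeration to be practical while also respecting the timing function.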
Hash table in massively parallel systems
I. Yen, F. Bastani
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.222988
The authors examine the performance of, and new collision-resolution strategies for, hash tables in massively parallel systems. The results show that a hash table with linear probing yields O(log N) time for handling M accesses by N processors when the load factor of the table is 50%, where N is the size of the hash table. This is better than the performance obtained with sorted arrays. Two-phase hashing gives an average time complexity of O(log N) for M simultaneous accesses to a hash table of size N even when the table is 100% loaded. Simulation results also show that hypercube hashing significantly outperforms linear probing and double hashing.
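The benign behavior of linear probing at 50% load is easy to reproduce sequentially (a toy sketch, not the paper's parallel algorithm; table size and seed are arbitrary): the average number of probes per insertion stays small, which is the property the parallel O(log N) bound builds on.

```python
import random

def linear_probe_insert(table, slot):
    """Occupy the first free slot at or after `slot` (wrapping around);
    return the number of probes used."""
    n = len(table)
    probes = 1
    while table[slot] is not None:
        slot = (slot + 1) % n
        probes += 1
    table[slot] = True
    return probes

rng = random.Random(0)
N = 1 << 14                      # table size
table = [None] * N
# Fill to a 50% load factor with uniformly random home slots.
probes = [linear_probe_insert(table, rng.randrange(N)) for _ in range(N // 2)]
print(sum(probes) / len(probes))  # small constant, roughly 1.5
```

At 100% load, plain linear probing degrades badly, which is why the abstract singles out two-phase hashing for the fully loaded case.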
The 'Mobius cubes': improved cubelike networks for parallel computation
P. Cull, S. Larson
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.222997
The Möbius cubes are created by systematically rearranging some of the edges of a hypercube. This rearrangement yields smaller distances between processors, where distance is the number of communication links that must be traversed. The authors show that the n-dimensional Möbius cubes have a diameter of about n/2 and an expected distance of about n/3, a considerable savings over the diameter n and expected distance n/2 of the n-dimensional hypercube. The routing algorithm for the Möbius cubes is slightly more complicated than that for the hypercube. While the asymmetry of the Möbius cubes may give rise to communication bottlenecks, preliminary experiments indicate that the bottlenecks are not significant. The authors compare the Möbius cubes to other hypercube variants and indicate some advantages of the Möbius cubes.
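The hypercube baseline figures (diameter n, expected distance n/2) can be checked by breadth-first search. The sketch below measures the 10-cube from node 0, which suffices because the hypercube is vertex-transitive; substituting a Möbius-cube neighbor rule for `hypercube_neighbors` would reproduce the paper's comparison.

```python
from collections import deque

def distance_stats(n_bits, neighbors):
    """Diameter and mean distance from node 0 of a 2^n-node cubelike
    network given by a neighbor function."""
    dist = {0: 0}
    q = deque([0])
    while q:
        u = q.popleft()
        for v in neighbors(u, n_bits):
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    d = list(dist.values())
    return max(d), sum(d) / len(d)

def hypercube_neighbors(u, n):
    # Flip each of the n address bits in turn.
    return [u ^ (1 << i) for i in range(n)]

diam, mean = distance_stats(10, hypercube_neighbors)
print(diam, mean)   # 10 and 5.0 for the 10-cube
```

Here distance from node 0 is just the Hamming weight of the address, which is why the mean comes out to exactly n/2.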
Using a functional language and graph reduction to program multiprocessor machines or functional control of imperative programs
Lal George, G. Lindstrom
Pub Date: 1992-03-01 · DOI: 10.1109/IPPS.1992.223017
Describes an effective means of programming shared-memory multiprocessors whereby a set of sequential activities is linked together for parallel execution. The glue for this linkage is a functional language implemented via graph reduction and demand evaluation. The full power of functional programming is used to obtain succinct, high-level specifications of parallel computations. The imperative procedures that constitute the sequential activities make efficient use of individual processing elements, while the mechanisms inherent in graph reduction synchronize and schedule these activities.
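The scheduling role of demand evaluation can be sketched with explicit thunks (a toy single-threaded model, not the authors' system): a graph node computes only when its value is demanded and is then overwritten with the result, so shared subgraphs are evaluated exactly once and data dependences order the work.

```python
class Thunk:
    """A graph-reduction node: an unevaluated application that updates
    itself with its value on first demand."""
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
        self.done = False

    def force(self):
        if not self.done:
            # Demand propagates to argument nodes first; then this node
            # is reduced and overwritten with its result.
            vals = [a.force() if isinstance(a, Thunk) else a for a in self.args]
            self.value = self.fn(*vals)
            self.done = True
        return self.value

calls = []
def traced_add(x, y):
    # Stands in for a sequential imperative activity; the trace shows
    # that graph sharing makes each activity run exactly once.
    calls.append((x, y))
    return x + y

shared = Thunk(traced_add, 2, 3)
top = Thunk(traced_add, shared, shared)   # both arguments share one node
print(top.force(), calls)                 # 10 [(2, 3), (5, 5)]
```

In the parallel setting the same update-in-place discipline doubles as synchronization: a processor demanding an unfinished node waits for the processor reducing it.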