Comparisons and analysis of massively parallel SIMD architectures for parallel logic simulation
Eunmi Choi, M. Chung, Yunmo Chung
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222986
This paper compares and analyzes massively parallel SIMD architectures as processing environments for parallel logic simulation. The CM-2 and the MP-1 are considered as target machines for the comparison. Detailed contrasts between the two parallel schemes are made based on actual simulation results and system performance. Distributed event-driven simulation protocols are used to obtain experimental results on the two massively parallel SIMD machines. According to the results, the MP-1 is 2 to 2.5 times faster than the CM-2 for benchmark circuits of up to 16K gates, while the CM-2 can accommodate circuits with a larger number of gates. The presented comparisons and analysis of the two machines can be used to choose a SIMD machine for efficient parallel logic simulation.
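The distributed protocols the paper benchmarks are machine-specific, but the event-driven simulation style they build on can be sketched sequentially. The following is a minimal illustrative sketch of event-driven gate-level simulation, not the paper's SIMD implementation; the gate encoding and function names are assumptions:

```python
import heapq

def simulate(gates, events, t_end):
    """Tiny event-driven gate-level simulator.

    gates:  name -> (fn, input_wires, delay, output_wire)
    events: list of (time, wire, value) stimuli
    Only gates whose inputs change are re-evaluated, which is the
    essence of event-driven (as opposed to oblivious) simulation.
    """
    wires = {}
    queue = list(events)
    heapq.heapify(queue)
    while queue:
        t, wire, val = heapq.heappop(queue)
        if t > t_end or wires.get(wire) == val:
            continue                      # suppress no-change events
        wires[wire] = val
        for fn, ins, delay, out in gates.values():
            if wire in ins:               # re-evaluate affected gates only
                new = fn(*(wires.get(w, 0) for w in ins))
                heapq.heappush(queue, (t + delay, out, new))
    return wires
```

A two-input AND gate driven high on both inputs settles to 1 after its unit delay.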
Efficient parallel algorithms for selection and searching on sorted matrices
R. Sarnath, Xin He
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223063
Parallel algorithms for more general versions of the well-known selection and searching problems are formulated. The authors look at these problems when the set of elements can be represented as an n*n matrix with sorted rows and columns. The selection algorithm takes O(log n log log n log* n) time with O(n/(log n log* n)) processors on an EREW PRAM. The searching algorithm takes O(log log n) time with O(n/log log n) processors on a CREW PRAM, which is optimal. The authors also show that no algorithm using at most n log^c n processors, c >= 1, can solve the matrix search problem in time faster than Omega(log log n).
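The parallel O(log log n) search is the paper's contribution; the structure it exploits is easiest to see in the classic sequential "staircase" search on such a matrix, sketched below (an illustrative O(n) baseline, not the authors' algorithm):

```python
def search_sorted_matrix(mat, target):
    """Search a matrix with sorted rows and sorted columns.

    Start at the top-right corner; each comparison discards either a
    full row or a full column, so at most 2n - 1 steps are needed.
    Returns a (row, col) position of target, or None if absent.
    """
    if not mat or not mat[0]:
        return None
    i, j = 0, len(mat[0]) - 1
    while i < len(mat) and j >= 0:
        if mat[i][j] == target:
            return (i, j)
        if mat[i][j] > target:
            j -= 1   # everything below in this column is even larger
        else:
            i += 1   # everything to the left in this row is even smaller
    return None
```

The parallel algorithm in the paper splits this staircase among processors to reach doubly-logarithmic time.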
Asymmetrical multiconnection three-stage Clos networks
A. Varma, S. Chalasani
Pub Date: 1992-03-01 | DOI: 10.1002/net.3230230423
The authors study routing problems in a general class of asymmetrical three-stage Clos networks. This class covers many asymmetrical three-stage networks considered by earlier researchers. They derive necessary and sufficient conditions under which this class of networks is rearrangeable with respect to a set of multiconnections, that is, connections where the paired entities are not limited to single terminals but can be arbitrary subsets of the terminals. They model the routing problem in these networks as a network-flow problem. If the number of switching elements in the first and last stages of the network is O(f) and the number of switching elements in the middle stage is m, then the network-flow model yields a routing algorithm with running time O(mf^3).
A scheme for state change in a distributed environment using weighted throw counting
K. Rokusawa, N. Ichiyoshi
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222992
This paper proposes a scheme for changing the execution state of a pool of processes in a distributed environment where there may be processes in transit. The scheme uses weighted throw counting to detect the completion of a state change, and can detect termination as well. It works whether the communication channels are synchronous or asynchronous, FIFO or non-FIFO. The message complexity of the scheme is typically O(number of processing elements).
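The core idea of weighted counting is that a controller hands out a total weight, every outstanding or in-transit process carries a share of it, and completion is detected exactly when all weight has been returned, independent of message ordering. A minimal sketch of that invariant (a hypothetical API, not the paper's implementation):

```python
class WeightedThrowCounter:
    """Weighted-counting detection of a distributed completion condition.

    Integer weights avoid rounding; throwing a process splits the
    sender's weight, and a finished process returns its weight to the
    controller. All weight back == no process running or in transit.
    """
    TOTAL = 1 << 30

    def __init__(self):
        self.returned = 0

    def spawn_root(self):
        return self.TOTAL            # the root process carries all weight

    def split(self, weight):
        half = weight // 2           # give part of the weight to the
        return half, weight - half   # thrown (migrating) process

    def finish(self, weight):
        self.returned += weight      # process ended: weight comes home

    def terminated(self):
        return self.returned == self.TOTAL
```

Because weights sum to the invariant total, the controller never declares completion while any weight is still out, even on non-FIFO channels.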
A conceptual framework for implementing neural networks on massively parallel machines
Magali E. Azema-Barac
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222973
This paper describes a framework for implementing neural networks on massively parallel machines. The framework is generic and applies to a range of neural networks (Multi Layer Perceptron, Competitive Learning, Self-Organising Map, etc.) as well as a range of massively parallel machines (Connection Machine, Distributed Array Processor, MasPar). It consists of two phases: an abstract decomposition of neural networks and a machine-specific decomposition. The abstract decomposition identifies the parallelism implemented by neural networks, and provides alternative distribution schemes according to the required exploitation of parallelism. The machine-specific decomposition considers the relevant machine criteria, and integrates these with the result of the abstract decomposition to form a 'decision' system. This system formalises the relative gain of each distribution scheme according to neural network and machine criteria. It then identifies their possible optimisations. Finally, it computes and ranks the absolute speed-up of each distribution scheme.
A functional execution model for a non-dataflow tagged token architecture
G. Jennings
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222978
The author proposes a new execution model for a non-dataflow tagged-token architecture which is not Petri-net based but rather more closely related to the lambda calculus. The model exploits a functional programming style having applicative-order evaluation. The computation's execution graph is dynamically generated according to easily understood dynamic tagging rules which have been demonstrated to be implementable. The model permits conceptually unbounded parallelism for an interesting class of list-oriented computations. The author explains the model with the help of a simple dot-product computation as an example. He highlights some of the major differences between the dataflow paradigm and his own. Architectural issues toward implementation are briefly discussed.
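The abstract's dot-product example turns on tag matching: operand tokens carry index tags and a node fires whenever two tokens with the same tag have both arrived, in any order. The sketch below illustrates generic tagged-token matching on a dot product; it is not Jennings' specific model, and the token encoding is an assumption:

```python
import random

def tagged_dot(xs, ys):
    """Dot product via tagged-token matching.

    Each element becomes a token carrying its index as a tag. The
    multiply 'node' fires when both operands for a tag are present;
    arrival order is irrelevant, so all products are conceptually
    available in parallel before the final sum reduction.
    """
    tokens = ([('x', i, v) for i, v in enumerate(xs)]
              + [('y', i, v) for i, v in enumerate(ys)])
    random.shuffle(tokens)           # simulate arbitrary arrival order
    waiting, products = {}, {}
    for _port, tag, val in tokens:
        if tag in waiting:           # partner already arrived: fire
            products[tag] = waiting.pop(tag) * val
        else:                        # first operand: wait for its match
            waiting[tag] = val
    return sum(products.values())
```

The result is deterministic despite the shuffled arrival order, which is the point of tagging.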
Optimal allocation of shared data over distributed memory hierarchies
E. Haddad
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222974
Nonreplicated shared data of distributed applications is optimally allocated to pre-specified multilevel memory partitions at the sites of a heterogeneous multicomputer network to minimize a weighted combination of systemwide mean time delay performance and mean communication cost per access request. Greedy and fast optimization algorithms are presented for nonqueueing lightly-loaded as well as heavily-loaded multiqueue system models with channel, I/O, and memory hierarchy queues. Extensions to data exhibiting nonuniform access demand rates and distinct query and update statistics are presented.
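In the nonqueueing, lightly-loaded model, placement decisions for nonreplicated items decouple, so a greedy per-item scan minimizes the weighted objective for that simplified setting. The sketch below is illustrative only; the weighting form and data layout are assumptions, not the paper's formulation:

```python
def allocate(demands, delay, cost, alpha=0.5):
    """Greedy placement of nonreplicated data items.

    demands: item -> access rate
    delay:   site -> mean access time delay at that site
    cost:    item -> {site -> mean communication cost per access}
    alpha:   weight trading delay against communication cost

    With no queueing, each item's contribution to the objective is
    independent, so picking the per-item minimum is globally optimal
    in this simplified model.
    """
    return {
        item: min(delay, key=lambda s: rate * (alpha * delay[s]
                                               + (1 - alpha) * cost[item][s]))
        for item, rate in demands.items()
    }
```

The heavily-loaded multiqueue models in the paper couple the decisions through queueing delays, which is where the more elaborate optimization algorithms come in.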
The bus-usage method for the analysis of reconfiguring networks algorithms
Y. Ben-Asher, A. Schuster
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223056
Reconfigurable networks have recently attracted increased attention as an extremely strong parallel model that is realizable in hardware. The authors consider the basic problem of gathering information that is dispersed among the nodes of the network. They analyze the complexity of the problem on reconfigurable linear arrays. The analysis introduces a novel criterion for the efficiency of reconfigurable network algorithms, namely bus-usage, which measures the utilization of the network sub-buses by the algorithm. It is shown how this yields bounds on the algorithm's run-time, by deriving a run-time to bus-usage trade-off.
A hierarchical directory scheme for large-scale cache-coherent multiprocessors
Y. Maa, D. Pradhan, D. Thiébaut
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223074
The cache coherence problem is a major design issue for shared-memory multiprocessors. As the system size scales, traditional bus-based snoopy cache coherence schemes are no longer adequate. Instead, the directory-based scheme is a promising approach to large-scale cache coherence. However, the storage overhead of directory schemes often becomes prohibitive as the system size increases. The paper proposes the hierarchical full-map directory to reduce the storage requirement while still achieving satisfactory performance. The key point is to exploit the inherent geographical interprocessor locality among shared data in parallel programs. Trace-driven evaluations show that the performance of the proposed scheme is competitive with the full-map directory scheme, while reducing the storage overhead by over 90%. The proposed hierarchical full-map directory scheme seems to be a promising hardware approach for handling cache coherence in the design of future large-scale multiprocessor memory systems.
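The storage saving comes from keeping full presence information only where sharers actually exist. A toy two-level illustration of that idea (not the paper's exact directory organization; the class and its fields are assumptions): the top level holds one bit per cluster, and a per-cluster bit vector is allocated only for clusters that share the block, so a block shared within one cluster costs far fewer bits than a flat full map.

```python
class HierarchicalDirectory:
    """Two-level presence-bit directory for one memory block.

    top   : one bit per cluster (is any processor in it a sharer?)
    maps  : per-cluster presence bit vectors, allocated on demand,
            exploiting geographical locality of sharing.
    """
    def __init__(self, clusters, per_cluster):
        self.clusters = clusters        # capacity of the top-level vector
        self.per_cluster = per_cluster  # processors per cluster
        self.top = 0
        self.maps = {}

    def add_sharer(self, pid):
        c, i = divmod(pid, self.per_cluster)
        self.top |= 1 << c                              # mark the cluster
        self.maps[c] = self.maps.get(c, 0) | (1 << i)   # mark the processor

    def sharers(self):
        return [c * self.per_cluster + i
                for c, bits in self.maps.items()
                for i in range(self.per_cluster) if bits >> i & 1]
```

When sharing stays within one cluster, only one second-level vector exists, which is the locality effect behind the reported storage reduction.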
Serial and parallel algorithms for the medial axis transform
J. Jenq, S. Sahni
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223025
The authors develop an O(n^2) time serial algorithm to obtain the medial axis transform (MAT) of an n*n image. An O(log n) time CREW PRAM algorithm and an O(log^2 n) time SIMD hypercube parallel algorithm for the MAT are also developed. Both of these use O(n^2) processors. Two problems associated with the MAT are also studied. These are the area and perimeter reporting problems. The authors develop an O(log n) time hypercube algorithm for both of these problems. Here n is the number of squares in the MAT and the algorithms use O(n^2) processors.
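The abstract does not spell out the serial algorithm; one standard way to compute in O(n^2) time the square sizes that underlie a square MAT is a single dynamic-programming sweep, sketched below (an illustrative sketch, not necessarily the authors' algorithm):

```python
def mat_squares(img):
    """For each pixel, side length of the largest all-ones square
    whose top-left corner is that pixel.

    side[i][j] = 1 + min of the three neighboring subproblems to the
    right, below, and diagonally below-right; a padded border of zeros
    handles the image boundary. One pass over the image: O(n^2).
    The MAT itself consists of the maximal such squares.
    """
    n, m = len(img), len(img[0])
    side = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if img[i][j]:
                side[i][j] = 1 + min(side[i + 1][j],
                                     side[i][j + 1],
                                     side[i + 1][j + 1])
    return side
```

On a 2*2 block of ones the top-left pixel gets side 2 while the other block pixels get side 1, so only the side-2 square survives as a maximal (MAT) square.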