Title: Comparisons and analysis of massively parallel SIMD architectures for parallel logic simulation
Authors: Eunmi Choi, M. Chung, Yunmo Chung
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222986
Abstract: This paper compares and analyzes massively parallel SIMD architectures as processing environments for parallel logic simulation. The CM-2 and the MP-1 are considered as target machines for the comparison. Detailed contrasts between the two parallel schemes are made based on actual simulation results and system performance. Distributed event-driven simulation protocols are used to obtain experimental results on the two massively parallel SIMD machines. According to the results, the MP-1 is 2 to 2.5 times faster than the CM-2 for benchmark circuits of up to 16K gates, while the CM-2, with its larger number of processors, can accommodate circuits with more gates. The presented comparisons and analysis of the two machines can be used to choose a SIMD machine for efficient parallel logic simulation.
Title: Efficient parallel algorithms for selection and searching on sorted matrices
Authors: R. Sarnath, Xin He
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223063
Abstract: Parallel algorithms for more general versions of the well-known selection and searching problems are formulated. The authors consider these problems when the set of elements can be represented as an n*n matrix with sorted rows and columns. The selection algorithm takes O(log n log log n log* n) time with O(n/(log n log* n)) processors on an EREW PRAM. The searching algorithm takes O(log log n) time with O(n/log log n) processors on a CREW PRAM, which is optimal. The authors also show that no algorithm using at most n log^c n processors, c >= 1, can solve the matrix search problem in time faster than Omega(log log n).
Title: A functional execution model for a non-dataflow tagged token architecture
Authors: G. Jennings
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222978
Abstract: The author proposes a new execution model for a non-dataflow tagged-token architecture which is not Petri-net based but rather more closely related to the lambda calculus. The model exploits a functional programming style with applicative-order evaluation. The computation's execution graph is dynamically generated according to easily understood dynamic tagging rules which have been demonstrated to be implementable. The model permits conceptually unbounded parallelism for an interesting class of list-oriented computations. The author explains the model with the help of a simple dot-product computation as an example, highlights some of the major differences between the dataflow paradigm and his own, and briefly discusses architectural issues toward implementation.
Title: An optimal parallel algorithm for arithmetic expression parsing
Authors: W. Deng, S. Iyengar
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223044
Abstract: The paper presents an optimal parallel algorithm for generating the tree form of arithmetic expressions on an SIMD-SM EREW model. The main idea is to avoid the read conflicts posed by Bar-On and Vishkin's algorithm (1985) by modifying their parenthesis-pairing algorithm.
Title: The bus-usage method for the analysis of reconfiguring networks algorithms
Authors: Y. Ben-Asher, A. Schuster
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223056
Abstract: Reconfigurable networks have recently attracted increased attention as an extremely strong parallel model that is realizable in hardware. The authors consider the basic problem of gathering information dispersed among the nodes of the network, and analyze the complexity of the problem on reconfigurable linear arrays. The analysis introduces a novel criterion for the efficiency of reconfigurable network algorithms, namely the bus-usage, which measures the utilization of the network sub-buses by the algorithm. It is shown how this yields bounds on the algorithm's run-time, by deriving a run-time to bus-usage trade-off.
Title: Supporting matrix operations in vector architectures
Authors: H. Bi, W. Giloi
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223043
Abstract: Many elementary numerical algorithms involve not only vector operations but also matrix operations. Today's vector processors support only vector operations and execute matrix operations in terms of vector operations, because they cannot access matrix operands in one instruction. This leads to poor sustained performance on vector machines. The paper discusses how to support both vector and matrix operations in vector architectures. First, subarray patterns for vector and matrix operations are introduced. Then a set of accessing modes is presented that enables vector architectures to access both vector and matrix operands. Finally, the performance improvement for matrix multiplication and the FFT is demonstrated.
Title: Asymmetrical multiconnection three-stage Clos networks
Authors: A. Varma, S. Chalasani
Pub Date: 1992-03-01 | DOI: 10.1002/net.3230230423
Abstract: The authors study routing problems in a general class of asymmetrical three-stage Clos networks. This class covers many asymmetrical three-stage networks considered by earlier researchers. They derive necessary and sufficient conditions under which this class of networks is rearrangeable with respect to a set of multiconnections, that is, connections where the paired entities are not limited to single terminals but can be arbitrary subsets of the terminals. They model the routing problem in these networks as a network-flow problem. If the number of switching elements in the first and last stages of the network is O(f) and the number of switching elements in the middle stage is m, then the network-flow model yields a routing algorithm with running time O(mf^3).
Title: Serial and parallel algorithms for the medial axis transform
Authors: J. Jenq, S. Sahni
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223025
Abstract: The authors develop an O(n^2) time serial algorithm to obtain the medial axis transform (MAT) of an n*n image. An O(log n) time CREW PRAM algorithm and an O(log^2 n) time SIMD hypercube parallel algorithm for the MAT are also developed; both use O(n^2) processors. Two problems associated with the MAT are also studied: the area and perimeter reporting problems. The authors develop an O(log n) time hypercube algorithm for both, where n is the number of squares in the MAT and the algorithms use O(n^2) processors.
Title: A hierarchical directory scheme for large-scale cache-coherent multiprocessors
Authors: Y. Maa, D. Pradhan, D. Thiébaut
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223074
Abstract: The cache coherence problem is a major design issue for shared-memory multiprocessors. As the system size scales, traditional bus-based snoopy cache coherence schemes are no longer adequate; the directory-based scheme is a promising approach to large-scale cache coherence. However, the storage overhead of directory schemes often becomes prohibitive as the system size increases. The paper proposes the hierarchical full-map directory to reduce the storage requirement while still achieving satisfactory performance. The key idea is to exploit the inherent geographical interprocessor locality among shared data in parallel programs. Trace-driven evaluations show that the performance of the proposed scheme is competitive with the full-map directory scheme, while reducing the storage overhead by over 90%. The proposed hierarchical full-map directory scheme appears to be a promising hardware approach for handling cache coherence in future large-scale multiprocessor memory systems.
Title: Determining maximum k-width-connectivity on meshes
Authors: Susanne E. Hambrusch, F. Dehne
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223040
Abstract: Let I be an n*n binary image stored in an n*n mesh of processors with one pixel per processor. Image I is k-width-connected if, informally, between any pair of pixels of value '1' there exists a path of width k (composed of 1-pixels only). The authors consider the problem of determining the largest integer k such that I is k-width-connected, and present an optimal O(n) time algorithm for the mesh architecture.