An optimal parallel algorithm for finding the smallest enclosing rectangle on a mesh-connected computer (for rectangle read triangle)
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223058 | Proceedings Sixth International Parallel Processing Symposium
C. Jeong, Jung-Ju Choi
The authors consider the problem of finding the smallest triangle circumscribing a convex polygon with n edges. They show that this can be done in $O(\sqrt{n})$ time on a mesh-connected computer (MCC) by efficient data partition schemes and proper set mapping and comparison operations, using a so-called $\sqrt{n}$-decomposition technique. Since any nontrivial operation on an MCC requires $\Omega(\sqrt{n})$ time, the time complexity is optimal to within a constant factor.
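The lower bound quoted in this abstract can be motivated by a distance argument. The following is only a brief sketch, assuming the MCC is a $\sqrt{n} \times \sqrt{n}$ grid of processors with links between grid neighbors only (the abstract does not state the layout):

```latex
\[
d_{\max} = 2\left(\sqrt{n} - 1\right)
\quad\Longrightarrow\quad
T(n) = \Omega(\sqrt{n})
\]
```

Here $d_{\max}$ is the number of links on a shortest path between opposite corners of the grid. Any computation whose result depends on data held at both corners needs at least that many communication steps.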
Optimal aspect ratio and number of separable row/column buses for mesh-connected parallel computers
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223023 | Proceedings Sixth International Parallel Processing Symposium
M. Serrano, B. Parhami
A two-dimensional mesh of PEs with separable row and column buses has been shown to be quite effective for semigroup, prefix, and a wide class of other parallel computations. The authors show how semigroup and prefix computations can be performed with the same asymptotic time complexity on meshes having separable buses for a subset of rows and columns. They find that with this basic arrangement, square grids are not optimal but that a hierarchical method of synthesizing large meshes builds optimal square meshes from rectangular submeshes. The time-complexity results are shown to correspond to those previously published when certain parameters of the design are fixed at special values.
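For readers unfamiliar with prefix computation on a mesh, the following is a minimal sequential sketch of the usual three-phase scheme on a grid stored in row-major order. It is not the authors' algorithm: it does not model the buses or any timing, and the function name `mesh_prefix` is a hypothetical choice made for this illustration.

```python
from itertools import accumulate
import operator


def mesh_prefix(grid, op=operator.add):
    """Prefix computation over a 2-D grid read in row-major order.

    Hypothetical helper: simulates the phases sequentially and ignores
    the communication cost that a real mesh (with or without buses) has.
    """
    # Phase 1: every row computes its own local prefix.
    rows = [list(accumulate(row, op)) for row in grid]
    # Phase 2: prefix over the row totals (the last element of each row).
    totals = list(accumulate((r[-1] for r in rows), op))
    # Phase 3: each row after the first combines the total of all
    # preceding rows with its local prefix values.
    out = [rows[0]]
    for i in range(1, len(rows)):
        out.append([op(totals[i - 1], v) for v in rows[i]])
    return out


if __name__ == "__main__":
    print(mesh_prefix([[1, 2], [3, 4]]))
```

The same function performs a semigroup computation if only the last element of the result is read, since that element combines every input.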
CCHIME: a cache coherent hybrid interconnected memory extension
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222965 | Proceedings Sixth International Parallel Processing Symposium
M. Farrens, A. Park, A. Woodruff
This paper presents a hybrid shared memory architecture which combines the scalability of a multistage interconnection network with the contention reduction benefits of coherent caches. The authors achieve this by replacing the memory modules and final stages of a multistage interconnection network with clusters of coherent caches. The performance of the Cache Coherent Hybrid Interconnected Memory Extension (CCHIME) is evaluated by analyzing the results of extensive simulations of the network and coherent cache clusters. These results indicate that the CCHIME architecture can achieve lower memory access latencies and higher throughputs than typical multistage interconnection networks.
Properties and performance of the hierarchical hypercube
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223073 | Proceedings Sixth International Parallel Processing Symposium
Q. Malluhi, M. Bayoumi
Interconnection networks play a crucial role in the performance of parallel systems. The paper introduces the hierarchical hypercube (HHC) interconnection topology, which is suitable for parallel systems with thousands of processors. An appealing property of this network is the low number of connections per processor, which eases the VLSI design and fabrication of the system. Other attractive features include symmetry and logarithmic diameter, which imply easy and fast algorithms for communication. A wide class of problems, the Divide & Conquer class (D&Q), is easily and efficiently solvable on the HHC topology. Solving a D&Q problem instance with up to k inputs takes $O(\log_2 k)$ time.
Parallel coprocessor for Kohonen's self-organizing neural network
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222971 | Proceedings Sixth International Parallel Processing Symposium
J. Saarinen, Martti Lindroos, J. Tomberg, K. Kaski
A new efficient integrated circuit implementation of the Self-Organising Feature Map algorithm is described. The fully digital hardware is designed for high speed parallel processing and modular expandability. The hardware implementation acts as a neural coprocessor which uses synchronous, bit-serial arithmetic. It includes functional units which perform the Euclidean distance computation, the minimum distance search, the memory controlling, and the updating function. The on-chip learning facilitates fully autonomous operation.
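The functional units named in this abstract (distance computation, minimum distance search, update) correspond to the steps of one Self-Organising Feature Map training iteration. The following is a minimal software sketch of such a step on a one-dimensional map. It says nothing about the chip's bit-serial design, and the function name, the learning rate `lr`, and the rectangular neighbourhood `radius` are assumptions made for illustration.

```python
def som_step(weights, x, lr=0.5, radius=1):
    """One training step of a 1-D self-organising map (illustrative only).

    weights: list of weight vectors, one per map unit.
    x:       input vector of the same dimension.
    Returns the index of the winning unit and the updated weights.
    """
    # Squared Euclidean distance from the input to every unit.
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    # Minimum distance search: the best-matching unit.
    winner = min(range(len(weights)), key=dists.__getitem__)
    # Update: move the winner and its neighbours toward the input.
    updated = [list(w) for w in weights]
    lo = max(0, winner - radius)
    hi = min(len(weights), winner + radius + 1)
    for j in range(lo, hi):
        updated[j] = [w_i + lr * (x_i - w_i) for w_i, x_i in zip(weights[j], x)]
    return winner, updated


if __name__ == "__main__":
    units = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
    print(som_step(units, [1.0, 2.0], lr=0.5, radius=0))
```

Squared distances are used because the square root does not change which unit is closest.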
Embedding and reconfiguration of binary trees in faulty hypercubes
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223080 | Proceedings Sixth International Parallel Processing Symposium
P. Yang, C. Raghavendra
The paper considers the problem of embedding and reconfiguring binary tree structures in faulty hypercubes. The authors assume that the number of faulty nodes is about n, where n is the number of dimensions of the hypercube; they further assume that the locations of the faulty nodes are known. The embedding techniques are based on a key concept called free dimension, which can be used to partition a cube into subcubes such that each subcube contains at most one faulty node. Using this approach, two distributed schemes are provided for embedding and reconfiguration of binary trees in faulty hypercubes.
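The abstract does not define free dimension precisely, so the sketch below only illustrates the stated partitioning goal: choosing a set of dimensions to split along so that every resulting subcube holds at most one faulty node. It is a brute-force search, not the authors' distributed scheme, and the function name `separating_dimensions` is hypothetical.

```python
from itertools import combinations


def separating_dimensions(n, faults):
    """Find a smallest set of dimensions that separates all faulty nodes.

    n:      number of hypercube dimensions (nodes are integers 0 .. 2**n - 1).
    faults: addresses of the faulty nodes.
    Splitting the cube along the returned dimensions yields subcubes that
    each contain at most one fault. Exhaustive search, for illustration only.
    """
    faults = set(faults)
    for size in range(n + 1):
        for dims in combinations(range(n), size):
            # Two faults share a subcube exactly when they agree on all
            # chosen dimensions, so the projected bit patterns must differ.
            keys = {tuple((f >> d) & 1 for d in dims) for f in faults}
            if len(keys) == len(faults):
                return dims
    return None


if __name__ == "__main__":
    print(separating_dimensions(3, [0b000, 0b011, 0b101]))
```

With k chosen dimensions the cube splits into 2^k subcubes of dimension n - k, so a smaller separating set leaves larger fault-sparse subcubes to work with.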
Performance analysis of two address space allocation schemes for an optically interconnected distributed shared memory system
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222967 | Proceedings Sixth International Parallel Processing Symposium
K. Bogineni, P. Dowd
An Optically Interconnected Distributed Shared Memory (OIDSM) system is introduced and analyzed. Distributed shared memory systems place a heavy traffic requirement on the interconnection network. Complex memory allocation schemes have been introduced to reduce the network load. The photonic network of the system introduced in this paper alleviates the traffic load concern, and enables the development of a fixed memory allocation scheme with a significant reduction in complexity. The photonic network employs wavelength division multiple access (WDMA), creating multiple channels on a single optical fiber. This paper analyzes the performance of two memory allocation schemes through mean value analysis of a closed queueing network. The performance model is validated through simulation.
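Mean value analysis, the technique named in this abstract, can be sketched for the simplest case: a single-class closed network of queueing stations with an optional think time. This is the textbook exact MVA recursion, not the authors' OIDSM model; the function name `mva` and the example service demands are assumptions made for illustration.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact mean value analysis for a single-class closed queueing network.

    demands:     service demand at each queueing station (time per visit
                 multiplied by visits per cycle).
    n_customers: number of circulating customers (at least 1).
    think_time:  delay spent outside the stations per cycle.
    Returns the system throughput and the mean queue length per station.
    """
    queue = [0.0] * len(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue of a network with n - 1 customers.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole cycle gives the throughput.
        throughput = n / (think_time + sum(residence))
        # Little's law applied per station gives the new queue lengths.
        queue = [throughput * r for r in residence]
    return throughput, queue


if __name__ == "__main__":
    print(mva([1.0, 1.0], 2))
```

The recursion builds the solution for n customers from the solution for n - 1, so its cost grows linearly in both the population and the number of stations.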
Scalable reader-writer locks for parallel systems
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222989 | Proceedings Sixth International Parallel Processing Symposium
Wilson C. Hsieh, W. Weihl
Current algorithms for reader-writer synchronization do not scale for readers: readers cannot acquire locks in parallel. The authors describe two new algorithms that allow parallelism among readers during lock acquisition; this is achieved by distributing the lock state among different processors, and by trading reader throughput for writer throughput. Their experiments show that when reads are a large percentage of lock requests, the throughput of each of their algorithms scales significantly better than current algorithms.
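The idea of distributing lock state can be illustrated with a simple design in which each processor owns one mutex: a reader takes only its own mutex, while a writer takes all of them. This is a sketch of the general idea and should not be read as either of the authors' two algorithms. The class name `DistributedRWLock` and the `slot` parameter (standing in for a processor id) are assumptions.

```python
import threading
from contextlib import contextmanager


class DistributedRWLock:
    """Reader-writer lock with one mutex per slot (illustrative only).

    Assumes at most one reader thread uses a given slot at a time, as if
    each slot belonged to one processor.
    """

    def __init__(self, slots):
        self._locks = [threading.Lock() for _ in range(slots)]

    @contextmanager
    def read(self, slot):
        # A reader touches only its own slot, so readers on different
        # slots never contend with each other.
        with self._locks[slot % len(self._locks)]:
            yield

    @contextmanager
    def write(self):
        # A writer must exclude every reader, so it takes all slots.
        # Acquiring in a fixed order avoids deadlock between writers.
        for lock in self._locks:
            lock.acquire()
        try:
            yield
        finally:
            for lock in reversed(self._locks):
                lock.release()


if __name__ == "__main__":
    rw = DistributedRWLock(4)
    with rw.read(1):
        print("reading on slot 1")
    with rw.write():
        print("writing with all slots held")
```

In this design a read costs one lock acquisition regardless of machine size, while a write costs one acquisition per slot, which is the kind of trade between reader and writer cost that the abstract mentions.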
A library environment for distributed memory multiprocessors
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.222980 | Proceedings Sixth International Parallel Processing Symposium
M. K. Kumar, P. S. Kumar, A. Basu
The authors propose the design of a library environment, called PARUL (PARallel User Library), for distributed memory multiprocessor systems. An important feature of the environment is that it allows the data distributed for use of a library function as well as the results generated by the function to be retained in the network of processors to be used by subsequent library functions. The user of the library is given full control over the set of variables that are retained in the network. The authors describe the implementation details of PARUL on a multi-transputer system (PARAM) and discuss its performance.
Banyan heap machine
Pub Date: 1992-03-01 | DOI: 10.1109/IPPS.1992.223041 | Proceedings Sixth International Parallel Processing Symposium
M. Meybodi
New designs for performing a group of priority queue operations on a set of elements are presented. Processors in this design, called the banyan heap machine, are connected together to form a linear chain. The algorithms for the banyan heap machine are a generalization of binary heap algorithms to a more general acyclic graph called a banyan. This design, unlike existing designs, requires fewer processors to meet the same capacity requirement, and its processors do not have geometrically varying memory sizes. This results in a completely homogeneous system. The key advantage of the banyan heap machine is its ability to retrieve elements at different percentile levels.