Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223007
Jianping Zhu
Discusses a Householder factorization algorithm for a special type of matrix arising from the application of the Tikhonov regularization method to an ill-conditioned least squares problem. The matrix involved is half dense and half sparse. The algorithm has been implemented on iPSC/860 hypercubes. By overlapping communications with computations, the code has been optimized to take advantage of the special structure of the matrix and minimize inter-node communications. Super-linear speed-up was observed in numerical experiments for large problems. The algorithm has been used as a core routine in a program that solves parameter identification problems in reservoir simulations.
{"title":"Householder transformation for the regularized least square problem on iPSC/860","authors":"Jianping Zhu","doi":"10.1109/IPPS.1992.223007","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223007","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"25 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"128298703"}
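The regularized least squares problem in the abstract above can be illustrated with a small serial sketch. It assumes the regularization operator is a scaled identity, so the stacked matrix has a dense top half `A` and a sparse (diagonal) bottom half `lam * I`; the abstract does not state the actual operator. The function name `tikhonov_qr` and the parameter `lam` are hypothetical, and the sketch shows neither the paper's structure-exploiting factorization nor its iPSC/860 parallelization. NumPy's `np.linalg.qr` relies on LAPACK's Householder-based QR.

```python
import numpy as np


def tikhonov_qr(A, b, lam):
    """Solve min ||A x - b||^2 + lam^2 ||x||^2 via QR of the stacked matrix.

    Assumption: the regularization operator is lam * I (not stated in the source).
    """
    m, n = A.shape
    # Dense block A on top, sparse diagonal block lam * I below.
    K = np.vstack([A, lam * np.eye(n)])
    rhs = np.concatenate([b, np.zeros(n)])
    # Reduced QR factorization (computed with Householder reflections in LAPACK).
    Q, R = np.linalg.qr(K)
    # Solve the triangular system R x = Q^T rhs.
    return np.linalg.solve(R, Q.T @ rhs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 3))
    b = rng.standard_normal(8)
    print(tikhonov_qr(A, b, lam=0.5))
```

Working on the stacked matrix avoids forming the normal equations, which would square the condition number of an already ill-conditioned problem.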
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222965
M. Farrens, A. Park, A. Woodruff
This paper presents a hybrid shared memory architecture that combines the scalability of a multistage interconnection network with the contention reduction benefits of coherent caches. The authors achieve this by replacing the memory modules and final stages of a multistage interconnection network with clusters of coherent caches. The performance of the Cache Coherent Hybrid Interconnected Memory Extension (CCHIME) is evaluated by analyzing the results of extensive simulations of the network and coherent cache clusters. These results indicate that the CCHIME architecture can achieve lower memory access latencies and higher throughputs than typical multistage interconnection networks.
{"title":"CCHIME: a cache coherent hybrid interconnected memory extension","authors":"M. Farrens, A. Park, A. Woodruff","doi":"10.1109/IPPS.1992.222965","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222965","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"151 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"125203155"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223080
P. Yang, C. Raghavendra
Considers the problem of embedding and reconfiguring binary tree structures in faulty hypercubes. The authors assume that the number of faulty nodes is about n, where n is the number of dimensions of the hypercube; they further assume that the locations of the faulty nodes are known. The embedding techniques are based on a key concept called free dimension, which can be used to partition a cube into subcubes such that each subcube contains at most one faulty node. Using this approach, two distributed schemes are provided for embedding and reconfiguration of binary trees in faulty hypercubes.
{"title":"Embedding and reconfiguration of binary trees in faulty hypercubes","authors":"P. Yang, C. Raghavendra","doi":"10.1109/IPPS.1992.223080","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223080","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"4 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127292321"}
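The partitioning goal stated in the abstract above (subcubes with at most one faulty node each) can be illustrated with a simple greedy sketch. The abstract does not give the authors' definition of a free dimension or their selection procedure, so the function `separating_dimensions` below is a hypothetical stand-in: it picks a set of dimensions on which the known faulty addresses all differ, so that fixing the bits on those dimensions yields subcubes holding at most one fault.

```python
def separating_dimensions(faults, n):
    """Greedily choose dimensions that separate all faulty nodes.

    faults: distinct node addresses (ints) in an n-dimensional hypercube.
    Returns a list of dimensions; fixing the bits on these dimensions
    partitions the cube into subcubes with at most one faulty node each.
    Hypothetical illustration, not the paper's algorithm.
    """
    if len(set(faults)) != len(faults):
        raise ValueError("faulty addresses must be distinct")
    if any(f < 0 or f >= (1 << n) for f in faults):
        raise ValueError("address outside the hypercube")

    dims = []

    def subcube_id(node):
        # Bits of the address on the dimensions chosen so far.
        return tuple((node >> d) & 1 for d in dims)

    while True:
        groups = {}
        for f in faults:
            groups.setdefault(subcube_id(f), []).append(f)
        clash = next((g for g in groups.values() if len(g) > 1), None)
        if clash is None:
            return dims
        # Two faults share a subcube: split on their lowest differing bit.
        diff = clash[0] ^ clash[1]
        dims.append((diff & -diff).bit_length() - 1)


if __name__ == "__main__":
    print(separating_dimensions([0b0000, 0b0001, 0b0110], n=4))
```

The loop terminates because two distinct addresses that agree on every chosen dimension must differ on some dimension not yet chosen.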
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222967
K. Bogineni, P. Dowd
An Optically Interconnected Distributed Shared Memory (OIDSM) system is introduced and analyzed. Distributed shared memory systems place a heavy traffic requirement on the interconnection network. Complex memory allocation schemes have been introduced to reduce the network load. The photonic network of the system introduced in this paper alleviates the traffic load concern and enables the development of a fixed memory allocation scheme with a significant reduction in complexity. The photonic network employs wavelength division multiple access (WDMA), creating multiple channels on a single optical fiber. This paper analyzes the performance of two memory allocation schemes through mean value analysis of a closed queueing network. The performance model is validated through simulation.
{"title":"Performance analysis of two address space allocation schemes for an optically interconnected distributed shared memory system","authors":"K. Bogineni, P. Dowd","doi":"10.1109/IPPS.1992.222967","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222967","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"7 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"125436889"}
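Mean value analysis, the technique named in the abstract above, can be sketched for the simplest case: a single-class closed network of queueing stations. The paper's actual network model (its stations, customer classes, and service demands) is not given in the abstract, so the function `mva` and the example demands are hypothetical and only illustrate the standard exact MVA recursion.

```python
def mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for a closed network of queueing stations.

    demands: service demand at each station (visit ratio x service time).
    Returns (throughput, residence times, mean queue lengths) at population
    n_customers. Illustrative only; not the paper's model.
    """
    queue = [0.0] * len(demands)
    resid = list(demands)
    throughput = 0.0
    for n in range(1, n_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        resid = [d * (1.0 + q) for d, q in zip(demands, queue)]
        # Little's law applied to the whole network.
        throughput = n / (think_time + sum(resid))
        # Little's law applied to each station.
        queue = [throughput * r for r in resid]
    return throughput, resid, queue


if __name__ == "__main__":
    # Hypothetical demands, e.g. processor, network channel, memory module.
    x, r, q = mva([0.2, 0.5, 0.3], n_customers=8)
    print(x, r, q)
```

The station with the largest demand bounds the throughput at `1 / max(demands)` as the population grows, which is the kind of saturation effect such a model is typically used to expose.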
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222971
J. Saarinen, Martti Lindroos, J. Tomberg, K. Kaski
A new efficient integrated circuit implementation of the Self-Organising Feature Map algorithm is described. The fully digital hardware is designed for high-speed parallel processing and modular expandability. The hardware implementation acts as a neural coprocessor which uses synchronous, bit-serial arithmetic. It includes functional units which perform the Euclidean distance computation, the minimum distance search, memory control, and the updating function. The on-chip learning facilitates fully autonomous operation.
{"title":"Parallel coprocessor for Kohonen's self-organizing neural network","authors":"J. Saarinen, Martti Lindroos, J. Tomberg, K. Kaski","doi":"10.1109/IPPS.1992.222971","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222971","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"61 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"125797608"}
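The functional units listed in the abstract above (distance computation, minimum distance search, update) correspond to the three stages of one Self-Organising Feature Map training step. The software sketch below illustrates those stages under stated assumptions: units arranged on a 1-D chain, a rectangular neighbourhood, and floating-point arithmetic rather than the chip's bit-serial arithmetic. The function `som_step` and its parameters `lr` and `radius` are hypothetical and do not describe the hardware design.

```python
import numpy as np


def som_step(weights, x, lr=0.1, radius=1):
    """One SOM training step on a 1-D chain of units (illustrative).

    weights: float array of shape (units, dim); x: input vector of shape (dim,).
    Returns (index of the winning unit, updated copy of the weights).
    """
    # Stage 1: squared Euclidean distance from the input to every unit.
    dists = np.sum((weights - x) ** 2, axis=1)
    # Stage 2: minimum distance search selects the winning unit.
    winner = int(np.argmin(dists))
    # Stage 3: move the winner and its chain neighbours towards the input.
    lo = max(0, winner - radius)
    hi = min(len(weights), winner + radius + 1)
    updated = weights.copy()
    updated[lo:hi] += lr * (x - updated[lo:hi])
    return winner, updated


if __name__ == "__main__":
    w = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
    print(som_step(w, np.array([1.0, 2.0]), lr=0.5, radius=0))
```

Comparing squared distances gives the same winner as comparing true Euclidean distances, so the square root can be skipped.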
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223058
C. Jeong, Jung-Ju Choi
The authors consider the problem of finding the smallest triangle circumscribing a convex polygon with n edges. They show that this can be done in $O(\sqrt{n})$ time by efficient data partition schemes and proper set mapping and comparison operations using a so-called $\sqrt{n}$-decomposition technique. Since any nontrivial operation on a mesh-connected computer (MCC) requires $\Omega(\sqrt{n})$ time, the time complexity is optimal to within a constant factor.
{"title":"An optimal parallel algorithm for finding the smallest enclosing rectangle on a mesh-connected computer (for rectangle read triangle)","authors":"C. Jeong, Jung-Ju Choi","doi":"10.1109/IPPS.1992.223058","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223058","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"56 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"114580258"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223021
M. Kilian
For serial computation, the object-oriented methodology (O-O) has been shown to aid program modeling, increase reusability, and result in more robust programs. Because of its object-centricity, O-O seems well suited to data parallel models of massively parallel programming. Many of the benefits of O-O stem from the arbitrary combining of objects, and the resulting arbitrary message passing patterns. Unfortunately, when working with tens of thousands of processors in parallel, this arbitrariness can result in communication conflicts. The paper proposes a model of objects and communication that resolves this problem.
{"title":"A model of objects and communication for massively parallel programming","authors":"M. Kilian","doi":"10.1109/IPPS.1992.223021","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223021","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"13 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"124497191"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223073
Q. Malluhi, M. Bayoumi
Interconnection networks play a crucial role in the performance of parallel systems. The paper introduces the hierarchical hypercube (HHC) interconnection topology, which is suitable for parallel systems with thousands of processors. An appealing property of this network is the low number of connections per processor, which enhances the VLSI design and fabrication of the system. Other alluring features include symmetry and logarithmic diameter, which imply easy and fast algorithms for communication. A wide class of problems, the Divide & Conquer class (D&Q), is easily and efficiently solvable on the HHC topology. The solution of a D&Q problem instance having up to k inputs requires a time complexity of $O(\log_2 k)$.
{"title":"Properties and performance of the hierarchical hypercube","authors":"Q. Malluhi, M. Bayoumi","doi":"10.1109/IPPS.1992.223073","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223073","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"1 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"127933059"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.222980
M. K. Kumar, P. S. Kumar, A. Basu
The authors propose the design of a library environment, called PARUL (PARallel User Library), for distributed memory multiprocessor systems. An important feature of the environment is that the data distributed for use by a library function, as well as the results generated by the function, can be retained in the network of processors for use by subsequent library functions. The user of the library is given full control over the set of variables that are retained in the network. The authors describe the implementation details of PARUL on a multi-transputer system (PARAM) and discuss its performance.
{"title":"A library environment for distributed memory multiprocessors","authors":"M. K. Kumar, P. S. Kumar, A. Basu","doi":"10.1109/IPPS.1992.222980","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222980","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"35 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"132297661"}
Pub Date: 1992-03-01 DOI: 10.1109/IPPS.1992.223041
M. Meybodi
New designs for performing a group of priority queue operations on a set of elements are presented. Processors in this design, called the banyan heap machine, are connected together to form a linear chain. The algorithms for the banyan heap machine are a generalization of binary heap algorithms to a more general acyclic graph called a banyan. This design, unlike existing designs, requires fewer processors to meet the same capacity requirement, and its processors do not have geometrically varying memory sizes. This results in a completely homogeneous system. The key advantage of the banyan heap machine is its ability to retrieve elements at different percentile levels.
{"title":"Banyan heap machine","authors":"M. Meybodi","doi":"10.1109/IPPS.1992.223041","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223041","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"23 1","pages":"0"},"publicationDate":"1992-03-01","publicationTypes":"Journal Article","platform":"Semanticscholar","paperid":"130180275"}