"A parallel implementation of the symmetric tridiagonal QR algorithm"
P. Arbenz, K. Gates, C. Sprenger
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234936 · In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation

The authors propose a novel and simple way to parallelize the QR algorithm for computing eigenvalues and eigenvectors of real symmetric tridiagonal matrices. The approach is suitable for all parallel computers, ranging from multiprocessor supercomputers with shared memory to massively parallel computers with local memory. The authors report on numerical experiments completed on a Cray Y-MP, an Alliant FX-80, a Sequent Symmetry S81b, an nCUBE 2, a Thinking Machines CM-200, and a cluster of Sun SPARCstations. The numerical results indicate that the proposed algorithm is suitable for parallel execution on the whole range of parallel computers. While the computers with vector facilities did not achieve very high efficiencies, the multiprocessor computers with scalar CPUs showed very good speedups.
"A graph-based subcube allocation and task migration in hypercube systems"
O. Kang, B.M. Kim, H. Yoon, S. Maeng, J. Cho
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234931 · In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation

The authors propose a task migration scheme based on the HSA (heuristic subcube allocation) strategy to solve the fragmentation problem in a hypercube. This scheme, called CSC (complementary subcube coalescence), uses a heuristic and an undirected graph, called the SC (subcube) graph. If an incoming request cannot be satisfied due to system fragmentation, the task migration scheme is performed to generate higher-dimensional subcubes. Simulation results show that the HSA strategy gives better efficiency than the Buddy and GC strategies in the adaptive mode. Moreover, the HSA strategy has a significantly lower migration cost than the Buddy and GC strategies.
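The core notion behind complementary subcube coalescence can be illustrated with a small sketch. This is my own illustration of buddy-pair merging (subcubes written as ternary addresses over {'0', '1', '*'}, with '*' marking a free dimension), not the authors' SC-graph algorithm:

```python
def can_coalesce(a: str, b: str) -> bool:
    """Two subcubes form a complementary (buddy) pair, mergeable into one
    higher-dimensional subcube, iff their addresses differ in exactly one
    position and hold complementary fixed bits there."""
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and diffs[0] in (('0', '1'), ('1', '0'))

def coalesce(a: str, b: str) -> str:
    """Merge a buddy pair by freeing the single differing dimension."""
    assert can_coalesce(a, b)
    return ''.join('*' if x != y else x for x, y in zip(a, b))

# Two 1-D subcubes of a 3-cube merging into a 2-D subcube:
# coalesce('0*1', '1*1') -> '**1'
```

When fragmentation leaves only non-complementary free subcubes, no such merge exists, which is exactly the situation the paper's task migration is meant to repair.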
"Embedding the hypercube into the 3-dimension mesh"
S. L. Scott, J. Baker
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234916 · In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation

A constant-time, constant-space algorithm for embedding the hypercube architecture into the three-dimensional mesh (3D mesh) is presented. This enables the cube_i operation to be performed on the embedded hypercube machine, where the interprocessor communication function cube_i is defined on the embedded hypercube's PEs as cube_i(b_{n-1} ... b_i ... b_0) = b_{n-1} ... b̄_i ... b_0, with b̄_i the binary complement of b_i.
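The cube_i communication function defined above is simply "complement bit i of the PE index", i.e. an XOR with a one-bit mask. A minimal illustrative reimplementation (not the authors' code):

```python
def cube_i(pe_index: int, i: int) -> int:
    """Index of the neighbor of PE `pe_index` along hypercube dimension i,
    i.e. the PE index with bit i complemented."""
    return pe_index ^ (1 << i)

# In a 3-cube (n = 3), PE 0b010 has neighbors:
#   cube_0 -> 0b011,  cube_1 -> 0b000,  cube_2 -> 0b110
```

Note that cube_i is an involution: applying it twice returns the original PE index, as expected for a bidirectional hypercube link.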
"On the physical design of butterfly networks for PRAMs"
R. Drefenstedt, D. Schmidt
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234958 · In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation

The design of networks for massively parallel computers is strongly influenced by available technology. The network latency, critical for many applications, is significantly increased by packaging constraints, i.e., the many connections between switches that involve pad drivers or even line drivers. The authors concentrate on reducing those influences for a butterfly network related to Ranade's routing algorithm. Such a network is being implemented for a parallel RAM (PRAM) with 128 physical processors and 128K logical processors. The required throughput makes it critical to use shared buses, which ease the problem of space. While delays caused by switches can only be hidden by mapping many virtual processors onto fewer physical ones, connection latency may be reduced by additional registers (shorter clock cycle time) and suitable mapping schemes (less space). Suitable clustering of processor modules and network parts may additionally decrease the network delay.
"The virtual-time data-parallel machine"
S. Shen, L. Kleinrock
Pub Date: 1992-10-19 · DOI: 10.1109/FMPC.1992.234906 · In: [Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation

The authors propose the virtual-time data-parallel machine to execute SIMD (single instruction, multiple data) programs asynchronously. They first illustrate how asynchronous execution is more efficient than synchronous execution. For a simple model, they show that asynchronous execution outperforms synchronous execution roughly by a factor of ln N, where N is the number of processors in the system. They then explore how to execute SIMD programs asynchronously without violating the SIMD semantics. They design a first-in, first-out (FIFO) priority cache, one for each processing element, to record the recent history of all variables. The cache, which sits between the processor and the memory, supports asynchronous execution efficiently in hardware and preserves the SIMD semantics of the software transparently. Analysis and simulation results indicate that the virtual-time data-parallel machine can achieve linear speedup for computation-intensive data-parallel programs when the number of processors is large.
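The ln N factor is the familiar cost of per-step barrier synchronization under random task times: a synchronous machine waits for the slowest of N processors at every step, while an asynchronous one lets the delays average out across steps. A toy Monte Carlo sketch of this effect (my own illustration with exponentially distributed step times, not the authors' model):

```python
import random

def sync_vs_async(n_procs: int = 64, n_steps: int = 200, seed: int = 1) -> float:
    """Ratio of synchronous to (idealized) asynchronous execution time."""
    rng = random.Random(seed)
    # times[p][s]: time processor p spends on step s (exponential, mean 1)
    times = [[rng.expovariate(1.0) for _ in range(n_steps)]
             for _ in range(n_procs)]
    # Synchronous: every step ends only when the slowest processor finishes.
    sync_time = sum(max(times[p][s] for p in range(n_procs))
                    for s in range(n_steps))
    # Asynchronous (idealized, ignoring data dependences): each processor
    # runs at its own pace, so total time is the largest per-processor sum.
    async_time = max(sum(times[p]) for p in range(n_procs))
    return sync_time / async_time

# In this toy model the ratio comes out near ln(n_procs), matching the
# rough ln N advantage claimed for asynchronous execution.
```

The expected maximum of N unit-mean exponentials grows like ln N, which is what drives the per-step penalty of the synchronous barrier in this model.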