Information hiding in parallel programs: model and experimental evaluation on the Connection Machine
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234942
I. Yen, F. Bastani, T. Al-Marzooq
An approach for incorporating information hiding within parallel software components is developed. The loss of performance is overcome by having intracomponent encapsulation layers, massive state transition operations, multiple-entry data structures, and program transformation. The approach was experimentally evaluated for three types of objects and application programs on a Connection Machine (CM-2). The results indicate that the approach can reduce the performance loss due to information hiding, though some loss remains for the sorted-array implementation of the set object. The performance of the hash data structure was much worse than expected; hardware message queues would greatly improve it.
{"title":"Information hiding in parallel programs: model and experimental evaluation on the Connection Machine","authors":"I. Yen, F. Bastani, T. Al-Marzooq","doi":"10.1109/FMPC.1992.234942","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234942","url":null,"abstract":"An approach for incorporating information hiding within parallel software components is developed. The loss of performance is overcome by having intracomponent encapsulation layers, massive state transition operations, multiple-entry data structures, and program transformation. The approach was experimentally evaluated for three types of objects and application programs on a Connection Machine (CM-2). The results indicate that the approach can reduce the loss of performance due to information hiding. The results indicate that there is some loss of performance for the sorted-array implementation of the set object. Also, the performance of the hash data structure was much worse than expected. Hardware message queues would greatly improve the performance.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134208378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A routing algorithm for PEC networks
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234891
C.C.S. Lin, V. K. Prasanna
A routing algorithm is presented that routes in O(√(log N) · 2^√(2 log N)) steps on an N-node packed exponential connections (PEC) network. It is also shown that semigroup operations can be performed in O(log N · 2^√(2 log N)) parallel steps.
{"title":"A routing algorithm for PEC networks","authors":"C.C.S. Lin, V. K. Prasanna","doi":"10.1109/FMPC.1992.234891","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234891","url":null,"abstract":"A routing algorithm is shown which can route in O( square root log N*2/sup square root 2logN/) steps in an N node packed exponential connections (PEC) network. It is also shown that semigroup operations can be performed in O(log N*2/sup square root 2logN/) parallel steps.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122687205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Single source shortest path problem on processor arrays
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234924
P. Narayanan
Algorithms for computing the shortest paths to every vertex from a single source vertex in nonnegatively weighted graphs are examined. A conventional data parallel algorithm and a replicated data algorithm for the single-source shortest path problem are presented. Both algorithms have been implemented on a Connection Machine CM-2 and a MasPar MP-1. Analytical and experimental speedups using the data replication technique are presented.
{"title":"Single source shortest path problem on processor arrays","authors":"P. Narayanan","doi":"10.1109/FMPC.1992.234924","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234924","url":null,"abstract":"Algorithms for computing the shortest paths to every vertex from a single source vertex in nonnegatively weighted graphs are examined. A conventional data parallel algorithm and a replicated data algorithm for the single-source shortest path problem are presented. Both algorithms have been implemented on a Connection Machine CM-2 and a MasPar MP-1. Analytical and experimental speedups using the data replication technique are presented.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124091117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Computing parallel prefix and reduction using coterie structures
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234895
M. Herbordt, C. Weems
The efficient computation of region parameters in image understanding by a SIMD (single-instruction multiple-data) array requires that those regions be processed simultaneously. The difficulty is in orchestrating nonuniform data-dependent communication using only a single thread of control. The authors have found that, on reconfigurable broadcast meshes, coterie structures can be used to overcome this problem. They present a deterministic algorithm to compute parallel prefix in O(log N) communication steps for a number of real images and sketch a randomized reduction algorithm based on graph contraction that has O(log N) complexity for all images.
{"title":"Computing parallel prefix and reduction using coterie structures","authors":"M. Herbordt, C. Weems","doi":"10.1109/FMPC.1992.234895","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234895","url":null,"abstract":"The efficient computation of region parameters in image understanding by a SIMD (single-instruction multiple-data) array requires that those regions be processed simultaneously. The difficulty is in orchestrating nonuniform data-dependent communication using only a single thread of control. The authors have found that, on reconfigurable broadcast meshes, coterie structures can be used to overcome this problem. They present a deterministic algorithm to compute parallel prefix in O(log N) communication steps for a number of real images and sketch a randomized reduction algorithm based on graph contraction that has O(log N) complexity for all images.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127690140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A modulo merge sorting network
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234892
K. Liszka, K. Batcher
The odd-even merge is a widely used and generally accepted merging network that uses O(N log² N) comparators with O(log² N) delay. A novel merging network is presented that generalizes the technique used in the odd-even merge. It is based on the division of the input keys by a specified modulus, not limited to 2. A special comparator is used in the final merge step that accepts m input lines and produces m sorted items, where m is the modulus selected for the merge. Alternatives are discussed that apply to the bitonic merging network.
{"title":"A modulo merge sorting network","authors":"K. Liszka, K. Batcher","doi":"10.1109/FMPC.1992.234892","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234892","url":null,"abstract":"The odd-even merge is a widely used and generally accepted merging network that uses O(N log/sup 2/N) comparators with O(log/sup 2/N) delay. A novel merging network is presented that generalizes the technique used in the odd-even merge. It is based on the division of the input keys by a specified modulus, not limited to 2. A special comparator is used in the final merge step that accepts m input lines and produces m sorted items, where m is the modulus selected for the merge. Alternatives are discussed that apply to the bitonic merging network.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115364331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Superscalar SIMD architecture
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234917
D. Schimmel
Presents a parallel computer architecture which synthesizes the notions of instruction level parallelism and data parallelism. Extending the work of Siegel and others on reconfigurable SIMD/MIMD architecture, it attains most of the advantages of those machines, via selective execution of a superscalar instruction stream, while retaining most of the cost advantage of the SIMD architectural style. Furthermore, it preserves the single instruction stream framework which makes SIMD machines simpler to program. Finally, it admits the use of compiler techniques to schedule the superscalar instruction stream, allowing the automatic utilization of the latent instruction level parallelism.
{"title":"Superscalar SIMD architecture","authors":"D. Schimmel","doi":"10.1109/FMPC.1992.234917","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234917","url":null,"abstract":"Presents a parallel computer architecture which synthesizes the notions of instruction level parallelism and data parallelism. Extending the work of Siegel and others on reconfigurable SIMD/MIMD architecture, it attains most of the advantages of those machines, via selective execution of a superscalar instruction stream, while retaining most of the cost advantage of the SIMD architectural style. Furthermore, it preserves the single instruction stream framework which makes SIMD machines simpler to program. Finally, it admits the use of compiler techniques to schedule the superscalar instruction stream, allowing the automatic utilization of the latent instruction level parallelism.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126710423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Performance of data-parallel primitives on the EM-4 dataflow parallel supercomputer
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234945
Andrew Shaw, Yuetsu Kodama, Mitsuhisa Sato, Shuichi Sakai, Yoshinori Yamaguchi
The authors have implemented seven data-parallel primitives on the hybrid dataflow/von Neumann parallel computer EM-4. To evaluate the performance of these primitives, the authors compare them to identical primitives running on a CM-200 SIMD (single-instruction multiple-data) parallel computer. For integer arithmetic element-wise operations, EM-4 is faster than the CM-200 when two or more operations can be combined. For communications operations, EM-4 has significantly higher performance. EM-4's distinguishing feature in running data-parallel codes is its exceptional communications performance in terms of network bandwidth, latency, and processor/network interface. Additional special-purpose hardware for barrier synchronization and scan-like operations is not necessary. Dataflow-style token synchronization is helpful, but not necessary, in implementing data-parallel primitives.
{"title":"Performance of data-parallel primitives on the EM-4 dataflow parallel supercomputer","authors":"SupercomputerAndrew Shaw, Yuetsu Kodamaz, Mitsuhisa Satoz, Shuichi Sakaiz, Yoshinori YamaguchizyMIT","doi":"10.1109/FMPC.1992.234945","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234945","url":null,"abstract":"The authors have implemented seven data-parallel primitives on the hybrid dataflow/von Neumann parallel computer EM-4. To evaluate the performance of these primitives, the authors compare them to identical primitives running on a CM-200 SIMD (single-instruction multiple-data) parallel computer. For integer arithmetic element-wise operations, EM-4 is faster than the CM-200 when two or more operations can be combined. For communications operations, EM-4 has significantly higher performance. EM-4's distinguishing feature in running data-parallel codes is its exceptional communications performance in terms of network bandwidth and latency, and processor/network interface. Additional special-purpose hardware for barrier synchronization and scan-like operations is not necessary. Dataflow-style token synchronization is helpful, but not necessary in implementing data-parallel primitives.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A parallel software package for solving linear systems
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234934
C. D. Scarbnick, M. Chang, M. Schultz, A. B. Sherman
A problem arising in scientific computation is the solution of Ax=b, where A is a large, sparse matrix. One of the most robust algorithms for solving this equation is the conjugate gradient method, especially when combined with a preconditioner. The authors discuss a new software package, MP-PCGPAK2, that implements a parallel version of the conjugate gradient method for MIMD (multiple-instruction multiple-data) message-passing architectures. The parallel implementation is quite general and can be applied to algorithms for nonsymmetric or indefinite systems such as GMRES, Bi-CGSTAB, and QMR. The authors present results on a 1024-processor nCUBE 2 and a 128-processor iPSC/860 for symmetric, positive definite systems ranging from one million to over 11 million variables.
{"title":"A parallel software package for solving linear systems","authors":"C. D. Scarbnick, M. Chang, M. Schultz, A. B. Sherman","doi":"10.1109/FMPC.1992.234934","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234934","url":null,"abstract":"A problem arising in scientific computation is the solution of Ax=b, where A is a large, sparse matrix. One of the most robust algorithms for solving the above equation is the conjugate gradient method, especially when combined with a preconditioner. The authors discuss a new software package, MP-PCGPAK2, that implements a parallel version of the conjugate gradient method for MIMD (multiple-instruction multiple-data), message passing architectures. The parallel implementation is quite general and can be applied to algorithms for nonsymmetric or indefinite systems such as GMRES, Bi-CGSTAB, and QMR. The authors present results on a 1024 processor nCUBE 2, and a 128 processor iPSC/860, for positive definite, symmetric systems ranging from one million to over 11 million variables.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130301565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

A large scale comparison of option pricing models with historical market data
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234885
Kim Mills, Michael Vinson, Gang Cheng
A set of stock option pricing models is implemented on the Connection Machine-2 and the DECmpp-12000 to compare model prices with historical market data. Improved models that incorporate stochastic volatility and American-style exercise generally have smaller pricing errors than simpler models based on constant volatility and European-style exercise. In a refinement of the comparison between model and market prices, a figure of merit based on the market bid/ask spread and the use of optimization techniques for model parameter estimation are evaluated. Optimization appears to hold great promise for improving the accuracy of existing pricing models, especially for stocks that are difficult to price with conventional models.
{"title":"A large scale comparison of option pricing models with historical market data","authors":"Kim Mills, Michael Vinson, Gang Cheng","doi":"10.1109/FMPC.1992.234885","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234885","url":null,"abstract":"A set of stock option pricing models is implemented on the Connection Machine-2 and the DECmpp-12000 to compare model prices and historical market data. Improved models which incorporate stochastic volatility with American call generally have smaller pricing errors than simpler models which are based on constant volatility and European call. In a refinement of the comparison between model and market prices, a figure of merit based on the bid/ask spread in the market and the use of optimization techniques for model parameter estimation, are evaluated. Optimization appears to hold great promise for improving the accuracy of existing pricing models, especially for stocks which are difficult to price with conventional models.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133069340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers
Pub Date: 1992-10-19 | DOI: 10.1109/FMPC.1992.234898
Jaeyoung Choi, J. Dongarra, R. Pozo, D. Walker
The authors describe ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-oriented interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.
{"title":"ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers","authors":"Jaeyoung Choi, J. Dongarra, R. Pozo, D. Walker","doi":"10.1109/FMPC.1992.234898","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234898","url":null,"abstract":"The authors describe ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-oriented interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130184556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}