
[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation: Latest Papers

Information hiding in parallel programs: model and experimental evaluation on the Connection Machine
I. Yen, F. Bastani, T. Al-Marzooq
An approach for incorporating information hiding within parallel software components is developed. The loss of performance is overcome by having intracomponent encapsulation layers, massive state transition operations, multiple-entry data structures, and program transformation. The approach was experimentally evaluated for three types of objects and application programs on a Connection Machine (CM-2). The results indicate that the approach can reduce the loss of performance due to information hiding, although some loss remains for the sorted-array implementation of the set object. Also, the performance of the hash data structure was much worse than expected. Hardware message queues would greatly improve the performance.
DOI: https://doi.org/10.1109/FMPC.1992.234942 (published 1992-10-19)
Citations: 0
A routing algorithm for PEC networks
C.C.S. Lin, V. K. Prasanna
A routing algorithm is shown which can route in $O(\sqrt{\log N}\, 2^{\sqrt{2 \log N}})$ steps in an $N$-node packed exponential connections (PEC) network. It is also shown that semigroup operations can be performed in $O(\log N \cdot 2^{\sqrt{2 \log N}})$ parallel steps.
DOI: https://doi.org/10.1109/FMPC.1992.234891 (published 1992-10-19)
Citations: 5
Single source shortest path problem on processor arrays
P. Narayanan
Algorithms for computing the shortest paths to every vertex from a single source vertex in nonnegatively weighted graphs are examined. A conventional data parallel algorithm and a replicated data algorithm for the single-source shortest path problem are presented. Both algorithms have been implemented on a Connection Machine CM-2 and a MasPar MP-1. Analytical and experimental speedups using the data replication technique are presented.
DOI: https://doi.org/10.1109/FMPC.1992.234924 (published 1992-10-19)
Citations: 13
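The shortest-path abstract above does not spell out its data-parallel algorithm. As a rough illustration only, the sketch below uses synchronous edge relaxation (a Bellman-Ford style scheme), in which every edge is relaxed against the previous round's distances, so one round corresponds to one step that a processor array could apply to all edges at once. The function name `sssp_synchronous` and the edge-list layout are assumptions for this example, not taken from the paper, and the replicated-data variant is not shown.

```python
from math import inf

def sssp_synchronous(num_vertices, edges, source):
    """Single-source shortest paths by synchronous relaxation of all edges.

    edges is a list of (u, v, weight) triples with weight >= 0.
    Each round reads only the previous round's distances, which is what
    lets a data-parallel machine relax every edge in the same step.
    """
    dist = [inf] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        new_dist = list(dist)
        for u, v, w in edges:              # conceptually one parallel step
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
        if new_dist == dist:               # nothing changed: converged early
            break
        dist = new_dist
    return dist

edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1), (2, 3, 5)]
print(sssp_synchronous(4, edges, 0))       # [0, 3, 1, 4]
```

The sequential inner loop stands in for the parallel step; on a real SIMD machine the minimum over incoming edges would be computed with a combining send or a segmented scan.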
Computing parallel prefix and reduction using coterie structures
M. Herbordt, C. Weems
The efficient computation of region parameters in image understanding by a SIMD (single-instruction multiple-data) array requires that those regions be processed simultaneously. The difficulty is in orchestrating nonuniform data-dependent communication using only a single thread of control. The authors have found that, on reconfigurable broadcast meshes, coterie structures can be used to overcome this problem. They present a deterministic algorithm to compute parallel prefix in O(log N) communication steps for a number of real images and sketch a randomized reduction algorithm based on graph contraction that has O(log N) complexity for all images.
DOI: https://doi.org/10.1109/FMPC.1992.234895 (published 1992-10-19)
Citations: 1
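For readers unfamiliar with parallel prefix, the sketch below simulates the generic doubling scan on a flat array, which finishes in ceil(log2 N) synchronous steps. It is not the coterie-structure algorithm from the abstract above, which works on image regions over a reconfigurable broadcast mesh; the name `inclusive_scan` is an assumption for this example.

```python
import operator

def inclusive_scan(values, op=operator.add):
    """Inclusive prefix scan in ceil(log2 N) synchronous doubling steps."""
    result = list(values)
    offset = 1
    while offset < len(result):
        # Every position reads the previous step's array, as on a SIMD machine.
        prev = result
        result = [
            op(prev[i - offset], prev[i]) if i >= offset else prev[i]
            for i in range(len(prev))
        ]
        offset *= 2
    return result

print(inclusive_scan([3, 1, 4, 1, 5, 9, 2, 6]))   # [3, 4, 8, 9, 14, 23, 25, 31]
print(inclusive_scan([2, 7, 1, 8], max))          # [2, 7, 7, 8]
```

Any associative operator works; the left operand always covers the earlier positions, so the operator does not need to be commutative.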
A modulo merge sorting network
K. Liszka, K. Batcher
The odd-even merge is a widely used and generally accepted merging network that uses $O(N \log^2 N)$ comparators with $O(\log^2 N)$ delay. A novel merging network is presented that generalizes the technique used in the odd-even merge. It is based on the division of the input keys by a specified modulus, not limited to 2. A special comparator is used in the final merge step that accepts m input lines and produces m sorted items, where m is the modulus selected for the merge. Alternatives are discussed that apply to the bitonic merging network.
DOI: https://doi.org/10.1109/FMPC.1992.234892 (published 1992-10-19)
Citations: 11
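As background for the abstract above, the following sketch shows the classic odd-even merge sort, i.e. the modulus-2 case that the paper generalizes. The m-way modulo merge and its special m-input comparator are not reproduced here. The input length is assumed to be a power of two, and the function names are illustrative.

```python
def compare_exchange(a, i, j):
    """One comparator: put the smaller key on line i, the larger on line j."""
    if a[i] > a[j]:
        a[i], a[j] = a[j], a[i]

def odd_even_merge(a, lo, n, r):
    """Merge the two sorted halves of a[lo:lo+n], visiting every r-th line."""
    step = r * 2
    if step < n:
        odd_even_merge(a, lo, n, step)        # even-indexed subsequence
        odd_even_merge(a, lo + r, n, step)    # odd-indexed subsequence
        for i in range(lo + r, lo + n - r, step):
            compare_exchange(a, i, i + r)     # final clean-up comparators
    else:
        compare_exchange(a, lo, lo + r)

def odd_even_merge_sort(a, lo=0, n=None):
    """Sort a[lo:lo+n] in place; n must be a power of two."""
    if n is None:
        n = len(a)
    if n > 1:
        half = n // 2
        odd_even_merge_sort(a, lo, half)
        odd_even_merge_sort(a, lo + half, half)
        odd_even_merge(a, lo, n, 1)

keys = [7, 3, 6, 1, 8, 2, 5, 4]
odd_even_merge_sort(keys)
print(keys)                                   # [1, 2, 3, 4, 5, 6, 7, 8]
```

Because the sequence of comparators does not depend on the data, the same code describes a fixed network, which is why the zero-one principle can be used to check it on binary inputs.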
Superscalar SIMD architecture
D. Schimmel
This paper presents a parallel computer architecture that synthesizes the notions of instruction-level parallelism and data parallelism. Extending the work of Siegel and others on reconfigurable SIMD/MIMD architecture, it attains most of the advantages of those machines, via selective execution of a superscalar instruction stream, while retaining most of the cost advantage of the SIMD architectural style. Furthermore, it preserves the single-instruction-stream framework, which makes SIMD machines simpler to program. Finally, it admits the use of compiler techniques to schedule the superscalar instruction stream, allowing the automatic utilization of the latent instruction-level parallelism.
DOI: https://doi.org/10.1109/FMPC.1992.234917 (published 1992-10-19)
Citations: 9
Performance of data-parallel primitives on the EM-4 dataflow parallel supercomputer
Andrew Shaw, Yuetsu Kodama, Mitsuhisa Sato, Shuichi Sakai, Yoshinori Yamaguchi
The authors have implemented seven data-parallel primitives on the hybrid dataflow/von Neumann parallel computer EM-4. To evaluate the performance of these primitives, the authors compare them to identical primitives running on a CM-200 SIMD (single-instruction multiple-data) parallel computer. For integer arithmetic element-wise operations, EM-4 is faster than the CM-200 when two or more operations can be combined. For communications operations, EM-4 has significantly higher performance. EM-4's distinguishing feature in running data-parallel codes is its exceptional communications performance in terms of network bandwidth and latency, and processor/network interface. Additional special-purpose hardware for barrier synchronization and scan-like operations is not necessary. Dataflow-style token synchronization is helpful, but not necessary in implementing data-parallel primitives.
DOI: https://doi.org/10.1109/FMPC.1992.234945 (published 1992-10-19)
Citations: 5
A parallel software package for solving linear systems
C. D. Scarbnick, M. Chang, M. Schultz, A. B. Sherman
A problem arising in scientific computation is the solution of Ax=b, where A is a large, sparse matrix. One of the most robust algorithms for solving the above equation is the conjugate gradient method, especially when combined with a preconditioner. The authors discuss a new software package, MP-PCGPAK2, that implements a parallel version of the conjugate gradient method for MIMD (multiple-instruction multiple-data) message-passing architectures. The parallel implementation is quite general and can be applied to algorithms for nonsymmetric or indefinite systems such as GMRES, Bi-CGSTAB, and QMR. The authors present results on a 1024-processor nCUBE 2 and a 128-processor iPSC/860 for positive definite, symmetric systems ranging from one million to over 11 million variables.
DOI: https://doi.org/10.1109/FMPC.1992.234934 (published 1992-10-19)
Citations: 2
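To make the method named in the abstract above concrete, here is a minimal serial sketch of the preconditioned conjugate gradient iteration for a symmetric positive definite system. It is not the MP-PCGPAK2 code or interface: the Jacobi (diagonal) preconditioner, the dense list-of-lists matrix, and the name `pcg_jacobi` are all assumptions chosen to keep the example short, whereas the package targets large sparse matrices on message-passing machines.

```python
def matvec(A, x):
    """Dense matrix-vector product; a real solver would use a sparse format."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg_jacobi(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A with Jacobi-preconditioned CG."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                    # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]   # M^-1 for M = diag(A)
    z = [d * ri for d, ri in zip(inv_diag, r)]     # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                 # stop on small residual norm
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(pcg_jacobi(A, b))                            # close to [1/11, 7/11]
```

In a distributed version, the matrix-vector product and the two dot products per iteration are the operations that require communication, which is why they dominate the design of parallel implementations.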
A large scale comparison of option pricing models with historical market data
Kim Mills, Michael Vinson, Gang Cheng
A set of stock option pricing models is implemented on the Connection Machine-2 and the DECmpp-12000 to compare model prices and historical market data. Improved models which incorporate stochastic volatility with American call generally have smaller pricing errors than simpler models which are based on constant volatility and European call. In a refinement of the comparison between model and market prices, a figure of merit based on the bid/ask spread in the market and the use of optimization techniques for model parameter estimation are evaluated. Optimization appears to hold great promise for improving the accuracy of existing pricing models, especially for stocks which are difficult to price with conventional models.
DOI: https://doi.org/10.1109/FMPC.1992.234885 (published 1992-10-19)
Citations: 18
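The abstract above contrasts improved models with "simpler models which are based on constant volatility and European call" without naming them. The standard model of that kind is Black-Scholes, so the sketch below prices a European call with it as an assumed stand-in; the paper's own models, its figure of merit, and its parameter estimation are not reproduced. The function and parameter names are illustrative.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, rate, volatility, maturity):
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    rate and volatility are annualized; maturity is in years.
    """
    vol_sqrt_t = volatility * sqrt(maturity)
    d1 = (log(spot / strike) + (rate + 0.5 * volatility ** 2) * maturity) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

# At-the-money call, 5% rate, 20% volatility, one year to expiry.
print(round(black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0), 4))   # 10.4506
```

Each option is priced independently of the others, which is what makes a large historical data set a natural fit for data-parallel machines of the kind used in the study.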
ScaLAPACK: a scalable linear algebra library for distributed memory concurrent computers
Jaeyoung Choi, J. Dongarra, R. Pozo, D. Walker
The authors describe ScaLAPACK, a distributed memory version of the LAPACK software package for dense and banded matrix computations. Key design features are the use of distributed versions of the Level 3 BLAS as building blocks, and an object-oriented interface to the library routines. The square block scattered decomposition is described. The implementation of a distributed memory version of the right-looking LU factorization algorithm on the Intel Delta multicomputer is discussed, and performance results are presented that demonstrate the scalability of the algorithm.
DOI: https://doi.org/10.1109/FMPC.1992.234898 (published 1992-10-19)
Citations: 403
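The square block scattered decomposition mentioned in the ScaLAPACK abstract above is commonly described as a two-dimensional block-cyclic layout: the matrix is cut into square blocks, and blocks are dealt out cyclically over a grid of processes. The sketch below maps a global matrix index to its owning process and local index under that reading. It is a plain illustration, not a ScaLAPACK routine, and the function name and zero-based indexing are assumptions.

```python
def block_cyclic_owner(i, j, block, grid_rows, grid_cols):
    """Map global entry (i, j) to ((p, q), (li, lj)).

    (p, q) are the owning process coordinates in a grid_rows x grid_cols grid;
    (li, lj) are the indices of the entry inside that process's local array.
    block is the side length of the square blocks.
    """
    bi, bj = i // block, j // block              # global block coordinates
    p, q = bi % grid_rows, bj % grid_cols        # blocks are dealt out cyclically
    li = (bi // grid_rows) * block + i % block   # local block offset + offset in block
    lj = (bj // grid_cols) * block + j % block
    return (p, q), (li, lj)

# Owner map of an 8 x 8 matrix with 2 x 2 blocks on a 2 x 2 process grid.
for i in range(8):
    row = []
    for j in range(8):
        (p, q), _ = block_cyclic_owner(i, j, 2, 2, 2)
        row.append(str(p * 2 + q))
    print(" ".join(row))
```

The cyclic dealing keeps every process busy as a right-looking LU factorization shrinks the active submatrix, which a plain contiguous block layout would not.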