
[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation — Latest Publications

A compiler for a massively parallel distributed memory MIMD computer
G. Sabot
The author describes the techniques that are used by the CM Compiler Engine to map the fine-grained array parallelism of languages such as Fortran 90 and C onto the Connection Machine (CM) architectures. The same compiler is used for node-level programming of the CM-5, for global programming of the CM-5, and for global programming of the SIMD (single-instruction multiple-data) CM-2. A new compiler phase is used to generate two classes of output code: code for a scalar control processor, which executes SPARC assembler, and code aimed at a model of the CM-5's parallel-processing elements. The model is embodied in a new RISC (reduced instruction set computer)-like vector instruction set called PEAC. The control program distributes parallel data at runtime among the processor nodes of the target machine. Each of these nodes is itself superpipelined and superscalar. An innovative scheduler overlaps the execution of multiple PEAC operations, while conventional vector processing techniques keep the pipelines filled.
DOI: 10.1109/FMPC.1992.234910 · Published 1992-10-19
Citations: 21
A Grimm collection of MIMD fairy tales
T. Blank, J. Nickolls
The authors present two tales about massively parallel processors: 'Who is Fairest of Us All?' and 'The SPMD Path.' With a twist of humor, the tales discuss single-instruction multiple-data (SIMD) systems, multiple-instruction multiple-data (MIMD) systems, their differences, and the single program multiple data (SPMD) programming model. The first tale introduces autonomous SIMD (ASIMD), and then looks at the flexibility, programmability, cost, and effectiveness of MIMD and ASIMD systems. It is shown that ASIMD systems have the flexibility to solve real applications cost-effectively. The second tale describes the simple path that SPMD provides for programming, and why an ASIMD machine works well.
DOI: 10.1109/FMPC.1992.234881 · Published 1992-10-19
Citations: 23
Architecture independent analysis of sorting and list ranking on the hierarchical PRAM model
T. Heywood, S. Ranka
The authors consider the performance of sorting and list ranking on the hierarchical parallel random access machine (H-PRAM), a model of computation which represents general degrees of locality (neighborhoods of activity), considering communication and synchronization simultaneously. The sorting result gives a significant improvement over that for the LPRAM (local-memory PRAM, i.e. unit-size neighborhoods), matches the best known hypercube algorithms when the H-PRAM's latency parameter l(P) is set to log P, and matches the best possible mesh algorithm when l(P) = sqrt(P). The list ranking algorithm demonstrates fundamental limitations of the H-PRAM for nonoblivious problems which have linear-time sequential algorithms.
DOI: 10.1109/FMPC.1992.234932 · Published 1992-10-19
Citations: 3
Parallel pulse correlation and geolocation
D.K. Krecker, W. Mitchell
The identification and location of ground-based radars via orbiting receivers require the correlation of pulses, the determination of time differences of arrival, and geolocation. Data rates in emitter-rich environments would swamp single-CPU processors performing this operation. The authors present an innovative parallel algorithm developed specifically for this application on massively parallel computers. The algorithm is based on the parallel computation and analysis of a matrix containing the differences in the time of arrival of all pulses received in a time window, and on the parallel proof/disproof of hypothesized emitter locations. Output contains the number of emitters and their location and PRI (pulse repetition interval) sequence. The algorithm was tested on a 16K-processor Connection Machine.
DOI: 10.1109/FMPC.1992.234929 · Published 1992-10-19
Citations: 0
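The central data structure in the abstract above — the matrix of all pairwise time differences of arrival (TDOA) within a window — can be sketched in a few lines of NumPy. This is a toy illustration, not the authors' Connection Machine code; the pulse times and PRIs below are invented for the example.

```python
import numpy as np

# Hypothetical pulse arrival times (seconds) from two emitters
# with distinct pulse repetition intervals (PRIs).
arrivals = np.sort(np.concatenate([
    0.000 + 0.013 * np.arange(5),   # emitter A: 5 pulses, PRI 13 ms
    0.004 + 0.021 * np.arange(4),   # emitter B: 4 pulses, PRI 21 ms
]))

# Matrix of all pairwise time differences within the window.
tdoa = arrivals[:, None] - arrivals[None, :]

# Entries equal to a candidate PRI reveal that emitter's pulse train:
# the 4 consecutive pairs of emitter A's 5 pulses differ by exactly 13 ms.
pri_a_hits = int(np.isclose(tdoa, 0.013).sum())
```

On an SIMD machine, the rows of such a matrix can be computed and scanned concurrently, which is where the data parallelism described in the abstract comes from.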
Massively parallel computers: why not parallel computers for the masses?
G. Bell
The developments in high-performance computers towards achieving the goal of a teraflops supercomputer that would operate at a peak speed of 10^12 floating-point operations per second are reviewed. The net result of the quest for parallelism, as chronicled by the Gordon Bell Prize, is that application performance has improved 115% per year and will most likely reach 1 teraflop in 1995. The physical characteristics of supercomputing alternatives available in 1992 are described. The progress of CMOS microprocessor technology to teraflop speeds is discussed. It is argued that mainline general-purpose computers will continue to be microprocessors in three forms: supercomputers, mainframes, and scalable MPs. The current scalable multicomputers will all evolve into multiprocessors, but with limited coherent memories in their next generation. It is also argued that the cost and time to rewrite major applications for one-of-a-kind machines is sufficiently large to make them uneconomical.
DOI: 10.1109/FMPC.1992.234946 · Published 1992-10-19
Citations: 4
Hyperbanyan networks: a new class of networks for distributed-memory multiprocessors
Clayton Ferner, K. Y. Lee
A new class of connection topologies for distributed-memory multiprocessors, hyperbanyan networks, is introduced. A hyperbanyan is a combination of the topological designs of the banyan and the hypertree networks. Since the hypertree combines the advantages of the binary tree and the hypercube, a hyperbanyan has the features of a binary tree, a hypercube, and a banyan. Hyperbanyans have a fixed degree of five, and the diameter of an n-stage hyperbanyan (2^(n-1) nodes per stage) is 2(n-1). A routing algorithm which is close to optimal is presented.
DOI: 10.1109/FMPC.1992.234951 · Published 1992-10-19
Citations: 11
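As a quick sanity check of the parameters quoted in the abstract (n stages of 2^(n-1) nodes each, diameter 2(n-1)), here is a small helper; the function name is an invention for this sketch, not from the paper.

```python
def hyperbanyan_size_and_diameter(n):
    """Node count and diameter of an n-stage hyperbanyan, per the
    figures in the abstract: n stages of 2**(n-1) nodes each,
    fixed degree 5, diameter 2*(n-1)."""
    nodes = n * 2 ** (n - 1)
    diameter = 2 * (n - 1)
    return nodes, diameter

# e.g. a 4-stage hyperbanyan: 4 * 8 = 32 nodes, diameter 6
size4 = hyperbanyan_size_and_diameter(4)
```

Note how the diameter grows linearly in the number of stages while the node count grows exponentially — the scaling property that motivates such tree/hypercube hybrids.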
Dynamic precision iterative algorithms
D. Kramer, I. Scherson
The authors address the use of DP (dynamic precision) in fixed-point iterative numerical algorithms. These algorithms are used in a wide range of numerically intensive scientific applications. One such algorithm, Muller's method, detects complex roots of an arbitrary function. This algorithm was implemented in DP on various architectures, including a MasPar MP-1 massively parallel processor and a Cray Y-MP vector processor. The results show that the use of DP can lead to a significant speedup of iterative algorithms on multiple-range architectures.
DOI: 10.1109/FMPC.1992.234930 · Published 1992-10-19
Citations: 4
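Muller's method, the abstract's test case, fits a parabola through three iterates and steps to the nearer root of that parabola; because the discriminant is evaluated in complex arithmetic, it can reach complex roots even from real starting points. A minimal serial sketch follows — it omits the dynamic-precision machinery that is the paper's actual contribution.

```python
import cmath

def muller(f, x0, x1, x2, tol=1e-12, max_iter=100):
    """Muller's method: fit a quadratic through (x0, x1, x2) and step
    to its root nearest x2, using complex arithmetic throughout."""
    for _ in range(max_iter):
        h1, h2 = x1 - x0, x2 - x1
        d1 = (f(x1) - f(x0)) / h1
        d2 = (f(x2) - f(x1)) / h2
        a = (d2 - d1) / (h2 + h1)
        b = a * h2 + d2
        c = f(x2)
        disc = cmath.sqrt(b * b - 4 * a * c)
        # Pick the larger-magnitude denominator for numerical stability.
        den = b + disc if abs(b + disc) > abs(b - disc) else b - disc
        dx = -2 * c / den
        x0, x1, x2 = x1, x2, x2 + dx
        if abs(dx) < tol:
            return x2
    return x2

# f(x) = x^2 + 1 has only the complex roots +/- i; starting from the
# real points 0.5, 1.0, 1.5 the iteration still converges to one of them.
root = muller(lambda x: x * x + 1, 0.5, 1.0, 1.5)
```

The complex square root in the discriminant is what distinguishes Muller's method from secant-style iterations and makes it suitable for the complex-root detection the abstract describes.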
A fast algorithm for computing histograms on a reconfigurable mesh
J. Jang, H. Park, V. Prasanna
The authors present fast parallel algorithms for computing the histogram on the PARBUS and RMESH models. Compared with the approach of J. Jeng and S. Sahni (1992), the proposed algorithm improves the time complexity by using a constant amount of memory in each processing element. In the histogram modification algorithm, the entire range of h is considered. The connections used by the proposed algorithm on the PARBUS model are the same as those allowed in the MRN model, so the algorithm runs on that model as well. The results obtained imply that the number of 1's in an N*N 0/1 table can be counted in O(log* N) time on an N*N reconfigurable mesh and in O(log log N) time on an N*N RMESH.
DOI: 10.1109/FMPC.1992.234952 · Published 1992-10-19
Citations: 32
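For orientation, the quantities the mesh algorithms compute can be stated in a few lines of NumPy. This is a serial sketch of the problem, not the reconfigurable-mesh algorithm itself — the O(log* N) bound comes from bus reconfiguration, which has no serial analogue here.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
table = rng.integers(0, 2, size=(N, N))          # an N*N 0/1 table

# The counting problem: number of 1's in the table.
ones = int(table.sum())

# The histogram problem over the value range {0, 1}.
hist = np.bincount(table.ravel(), minlength=2)
```

On the mesh, each processing element holds one table entry and the counts are combined across reconfigurable buses rather than by a serial reduction.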
Hardware support for the Seamless programming model
S. Fineberg, T. Casavant, B. H. Pease
The communication latency problem is presented with special emphasis on RISC (reduced instruction set computer) based multiprocessors. An interprocessor communication model for parallel programs based on locality is presented. This model enables the programmer to manipulate locality at the language level and to take advantage of currently available system hardware to reduce latency. A hardware node architecture for a latency-tolerant RISC-based multiprocessor, called Seamless, that supports this model is presented. The Seamless architecture includes the addition of a hardware locality manager to each processing element, as well as an integral runtime environment and compiler.
DOI: 10.1109/FMPC.1992.234939 · Published 1992-10-19
Citations: 3
ALFA: a static data flow architecture
L. Verdoscia, R. Vaccaro
The authors present the ALFA architecture, a data flow machine with 16384 functional units (FUs) grouped in 128 clusters. ALFA is based on the Backus FFP computational model and uses the static data flow execution model. The machine's behavior is deterministic and asynchronous; consequently, after compile time, instructions and data are no longer related. Even though its behavior is deterministic, no control tokens are generated during the computation, only data tokens. Furthermore, during the execution phase, no memory is required to hold the partial results exchanged among FUs. A cluster with 128 FUs has been simulated, and some results are presented.
DOI: 10.1109/FMPC.1992.234943 · Published 1992-10-19
Citations: 9