在BBN TC2000并行超级计算机上实现完美的ARC2D基准

The Sixth Distributed Memory Computing Conference, 1991. Proceedings Pub Date : 1991-04-28 DOI:10.1109/DMCC.1991.633200

S. Breit

{"title":"在BBN TC2000并行超级计算机上实现完美的ARC2D基准","authors":"S. Breit","doi":"10.1109/DMCC.1991.633200","DOIUrl":null,"url":null,"abstract":"The TC.2000 is a MIMD parallel processor wi,th memory that is physically distributed memory, but logically shared. Interprocessor covnmunication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC.2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC'2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes ofthe data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC.2000 Fortran language. Thi:F approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only (because of the TC.2000'~ highspeed interprocessor communications network. References to shared memory take about 25% of the totai execution time for the parallel version of ARC2D. an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer\",\"authors\":\"S. Breit\",\"doi\":\"10.1109/DMCC.1991.633200\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The TC.2000 is a MIMD parallel processor wi,th memory that is physically distributed memory, but logically shared. Interprocessor covnmunication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC.2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC'2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes ofthe data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC.2000 Fortran language. Thi:F approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only (because of the TC.2000'~ highspeed interprocessor communications network. References to shared memory take about 25% of the totai execution time for the parallel version of ARC2D. an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up\",\"PeriodicalId\":313314,\"journal\":{\"name\":\"The Sixth Distributed Memory Computing Conference, 1991. Proceedings\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1991-04-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The Sixth Distributed Memory Computing Conference, 1991. Proceedings\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DMCC.1991.633200\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DMCC.1991.633200","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

TC.2000是一种MIMD并行处理器，其内存在物理上是分布式的，但在逻辑上是共享的。处理器间通信以及对共享内存的访问速度足够快，因此大多数应用程序都可以移植到TC.2000上，而无需从头重写代码。本文将展示如何在《Perfect ARC》的2D基准测试中实现这一点。代码首先通过改变子程序调用的顺序进行重组，这样处理器间的通信将减少到相当于每次迭代三次完整的数据转置。然后通过插入共享数据声明和由TC.2000 Fortran语言提供的并行扩展来完成并行实现。这种方法比域分解技术更容易实现，但需要更多的处理器间通信。由于有TC.2000的高速处理器间通信网络，这是可行的。对于并行版本的ARC2D，对共享内存的引用大约占用总执行时间的25%。考虑到代码不必完全重写，这是一个可接受的数量。利用up获得了较高的并行效率

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer

The TC.2000 is a MIMD parallel processor wi,th memory that is physically distributed memory, but logically shared. Interprocessor covnmunication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC.2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC'2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes ofthe data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC.2000 Fortran language. Thi:F approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only (because of the TC.2000'~ highspeed interprocessor communications network. References to shared memory take about 25% of the totai execution time for the parallel version of ARC2D. an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The Sixth Distributed Memory Computing Conference, 1991. Proceedings

自引率

0.00%

发文量

期刊最新文献

Scalable Performance Environments for Parallel Systems Using Spanning-Trees for Balancing Dynamic Load on Multiprocessors Optimal Total Exchange on an SIMD Distributed-Memory Hypercube Structured Parallel Programming on Multicomputers Parallel Solutions to the Phase Problem in X-Ray Crystallography: An Update