Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer

S. Breit
Published in: The Sixth Distributed Memory Computing Conference, 1991. Proceedings
Publication date: 1991-04-28
DOI: https://doi.org/10.1109/DMCC.1991.633200
Citations: 1

Abstract

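The TC2000 is a MIMD parallel processor whose memory is physically distributed but logically shared. Interprocessor communication, and therefore access to shared memory, is fast enough that most applications can be ported to the TC2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication was reduced to the equivalent of three full transposes of the data per iteration. The parallel implementation was then completed by inserting shared data declarations and the parallel extensions provided by TC2000 Fortran. This approach was easier to implement than a domain decomposition technique, but it requires more interprocessor communication. It is feasible only because of the TC2000's high-speed interprocessor communications network. References to shared memory take about 25% of the total execution time for the parallel version of ARC2D, an acceptable amount considering that the code did not have to be completely rewritten. High parallel efficiency was obtained using up …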
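The abstract measures communication in units of full data transposes. The sketch below is not taken from the paper; it only illustrates, in Python, why a transpose can stand in for remote access. A sweep along columns gives the same result as a transpose, a sweep along rows (where each row could be held locally by one processor), and a transpose back. The running-sum operation and all function names are hypothetical placeholders, and the abstract does not say what the actual ARC2D sweeps compute.

```python
def transpose(grid):
    """Return the transpose of a list-of-rows grid.

    In a parallel setting this is the step that moves data between
    processors; here it is just a local rearrangement.
    """
    return [list(col) for col in zip(*grid)]


def sweep_rows(grid):
    """Hypothetical sweep: running sum along each row.

    Each row is processed independently, so rows could be assigned
    to different processors with no data shared between them.
    """
    out = []
    for row in grid:
        acc = 0
        new_row = []
        for value in row:
            acc += value
            new_row.append(acc)
        out.append(new_row)
    return out


def sweep_columns_direct(grid):
    """Same running sum along columns, reading across rows.

    With rows distributed over processors, every step here would
    touch data owned by a different processor.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        acc = 0
        for i in range(rows):
            acc += grid[i][j]
            out[i][j] = acc
    return out


def sweep_columns_via_transpose(grid):
    """Column sweep expressed as transpose, row sweep, transpose."""
    return transpose(sweep_rows(transpose(grid)))


if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6]]
    print(sweep_columns_direct(sample))
    print(sweep_columns_via_transpose(sample))
```

Both calls print the same grid. The transposes in this formulation are the only steps that would move data between processors, which is why reordering the work to need fewer of them reduces communication.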