Communication Avoiding Algorithms: Analysis and Code Generation for Parallel Systems

K. Murthy, J. Mellor-Crummey
2015 International Conference on Parallel Architecture and Compilation (PACT)
Published: 2015-10-18
DOI: 10.1109/PACT.2015.41 (https://doi.org/10.1109/PACT.2015.41)
Citations: 5

Abstract

Data movement is a critical bottleneck for future generations of parallel systems. The class of .5D communication-avoiding algorithms was developed to address this bottleneck. These algorithms reduce communication and provide strong scaling in both time and energy. As a first step towards automating the development of communication-avoiding libraries, we developed the Maunam compiler. Maunam generates efficient parallel code from a high-level, global-view sketch of .5D algorithms that are expressed using symbolic data sizes and numbers of processors. It supports the expression of data movement and communication through high-level global operations such as TILT and CSHIFT as well as through element-wise copy operations. With the latter, wrap-around communication patterns can also be achieved using subscripts based on modulo operations. Maunam employs polyhedral analysis to reason about the communication and computation present in a .5D algorithm. After partitioning data and computation, it inserts point-to-point and collective communication as needed. Maunam also analyzes data dependence patterns and data layouts to identify reductions over processor subsets. Maunam-generated Fortran+MPI code for 2.5D matrix multiplication running on 4096 cores of a Cray XC30 supercomputer achieves 59 TFlops/s (76% of the machine peak). Our generated parallel code achieves 91% of the performance of a hand-coded version.
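The wrap-around pattern mentioned in the abstract can be illustrated with a small sketch. This is not Maunam's input language or its generated code. It is a plain Python illustration, under the assumption that a circular shift over a 1-D array follows the convention of Fortran's CSHIFT intrinsic, in which a positive shift moves elements toward lower indices. The function names `cshift` and `source_rank` are hypothetical and chosen only for this example.

```python
def cshift(values, shift):
    """Circular shift written as an element-wise copy with a modulo subscript.

    Follows the 1-D convention of Fortran's CSHIFT intrinsic: element i of
    the result is taken from position (i + shift) mod n of the input.
    """
    n = len(values)
    # Python's % always returns a non-negative result for positive n,
    # so negative shifts wrap around correctly as well.
    return [values[(i + shift) % n] for i in range(n)]


def source_rank(rank, shift, nprocs):
    """Hypothetical helper: on a ring of nprocs processes, the rank whose
    block ends up on `rank` after the shift above."""
    return (rank + shift) % nprocs


if __name__ == "__main__":
    blocks = ["A0", "A1", "A2", "A3"]  # one block per process on a ring
    print(cshift(blocks, 1))           # each process takes its right neighbour's block
    print(source_rank(3, 1, 4))        # the last process wraps around to rank 0
```

When the array is distributed with one block per process, the same modulo subscript identifies which neighbour each process receives from, which is why a global-view copy of this form corresponds to point-to-point communication around a ring.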