FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures

J. Hashmi, S. Chakraborty, Mohammadreza Bayatpour, H. Subramoni, D. Panda
Published in: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Publication date: 2019-05-20
DOI: 10.1109/IPDPS.2019.00045 (https://doi.org/10.1109/IPDPS.2019.00045)
Citations: 9

Abstract

Derived datatypes are commonly used in MPI applications to exchange non-contiguous data among processes. However, state-of-the-art MPI libraries do not process derived datatypes efficiently and often rely on packing and unpacking the data at the sender and receiver processes. This approach incurs the cost of extra copies and increases overall communication latency. While zero-copy communication schemes have been proposed for contiguous data, applying such techniques to non-contiguous data transfers brings forth several new challenges. In this work, we address these challenges and propose FALCON: Fast and Low-overhead Communication designs for intra-node processing of MPI derived datatypes. We show that translating the memory layouts of derived datatypes introduces significant overhead in the communication path, and we propose novel solutions to mitigate such bottlenecks. We also find that the current MPI datatype routines cannot take full advantage of zero-copy mechanisms, and we propose enhancements to the MPI standard to address these limitations. The experimental evaluations show that our proposed designs improve intra-node communication latency and bandwidth by up to 3 times over state-of-the-art MPI libraries. We also evaluate our designs with communication kernels of popular scientific applications such as MILC, WRF, NAS MG, and 3D-Stencil on three different multi-/many-core architectures and show up to 5.5 times improvement over state-of-the-art designs employed by production MPI libraries.
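To make the cost of packing concrete, the following is a minimal sketch in plain Python that counts element copies for a strided (vector-like) layout. It is not FALCON's implementation and does not use any MPI library: the names `VectorLayout`, `copy_via_pack`, and `copy_direct` are hypothetical, the three-stage pack, transfer, unpack breakdown is an assumption made for illustration, and both buffers live in one process, whereas a real zero-copy path would read the sender's memory from another process.

```python
# Illustrative emulation only: plain Python, no MPI library involved.
# VectorLayout, copy_via_pack and copy_direct are hypothetical names.
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorLayout:
    """Strided layout: `count` blocks of `blocklen` elements, `stride` elements apart."""
    count: int
    blocklen: int
    stride: int

    def offsets(self):
        """Yield the starting element offset of every block."""
        for i in range(self.count):
            yield i * self.stride


def _check_compatible(a, b):
    # Simplifying assumption: both sides describe the same number of
    # equally sized blocks; only the stride may differ.
    if (a.count, a.blocklen) != (b.count, b.blocklen):
        raise ValueError("layouts must agree on count and blocklen")


def copy_via_pack(src, src_layout, dst, dst_layout):
    """Pack-based path. Returns the number of element copies performed."""
    _check_compatible(src_layout, dst_layout)
    n = src_layout.blocklen
    copies = 0
    packed = []
    for off in src_layout.offsets():      # stage 1: sender packs into a temporary
        packed.extend(src[off:off + n])
        copies += n
    staged = list(packed)                 # stage 2: packed buffer is transferred
    copies += len(staged)
    pos = 0
    for off in dst_layout.offsets():      # stage 3: receiver unpacks into its layout
        dst[off:off + n] = staged[pos:pos + n]
        pos += n
        copies += n
    return copies


def copy_direct(src, src_layout, dst, dst_layout):
    """Direct path: each block goes from source layout to destination layout once."""
    _check_compatible(src_layout, dst_layout)
    n = src_layout.blocklen
    copies = 0
    for s_off, d_off in zip(src_layout.offsets(), dst_layout.offsets()):
        dst[d_off:d_off + n] = src[s_off:s_off + n]
        copies += n
    return copies


if __name__ == "__main__":
    src = list(range(12))
    strided = VectorLayout(count=3, blocklen=2, stride=4)   # picks 0,1 | 4,5 | 8,9
    dense = VectorLayout(count=3, blocklen=2, stride=2)     # contiguous destination
    out_a, out_b = [None] * 6, [None] * 6
    print(copy_via_pack(src, strided, out_a, dense))  # 18 element copies
    print(copy_direct(src, strided, out_b, dense))    # 6 element copies
    print(out_a == out_b)                             # True
```

Under these assumptions the direct path moves each element once while the pack-based path moves it three times. The sketch deliberately ignores the cost of translating the layout description itself, which the abstract identifies as a separate source of overhead.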