Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications

A. Shafi, J. Hashmi, H. Subramoni, D. Panda
DOI: 10.1109/HiPC50609.2020.00025
Published in: 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2020
Citation count: 1

Abstract

Python is emerging as a popular language in the data science community due to its ease of use, vibrant community, and rich set of libraries. Dask is a popular Python-based distributed computing framework that allows users to process large amounts of data on parallel hardware. The Dask distributed package is a non-blocking, asynchronous, and concurrent library that offers support for distributed execution of tasks in datacenter and HPC environments. A key requirement in designing high-performance communication backends for Dask distributed is to provide scalable support for coroutines, which, unlike regular Python functions, can only be invoked from asynchronous applications. In this paper, we present Blink, a high-performance communication library for Dask on high-performance RDMA networks such as InfiniBand. Blink offers a multi-layered architecture that matches the communication requirements of Dask and exploits high-performance interconnects through a Cython wrapper layer over the C backend. We evaluate the performance of Blink against other counterparts using various micro-benchmarks and application kernels on three different cluster testbeds with varying interconnect speeds. Our micro-benchmark evaluation reveals that Blink outperforms other communication backends by more than 3× for message sizes ranging from 1 Byte to 64 KByte, and by a factor of 2× for message sizes ranging from 128 KByte to 8 MByte. Using various application-level evaluations, we demonstrate that Dask achieves up to 7% improvement in application throughput (e.g., total worker throughput).
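The abstract notes that coroutines, unlike regular Python functions, can only be invoked from asynchronous applications, which is what makes them a special case for a communication backend. A minimal sketch (using only the standard `asyncio` module, not Blink's actual API) illustrates the distinction: calling a coroutine function merely creates a coroutine object, and its body runs only when it is awaited inside a running event loop.

```python
import asyncio

def regular_send(data: bytes) -> int:
    # A regular function: callable from anywhere, runs immediately.
    return len(data)

async def coroutine_send(data: bytes) -> int:
    # A coroutine: invoking it returns a coroutine object; the body
    # executes only when driven by an event loop. The await point is
    # where a non-blocking backend would yield while I/O completes.
    await asyncio.sleep(0)
    return len(data)

# A regular call works directly:
n_regular = regular_send(b"hello")

# A coroutine must be scheduled on an event loop:
n_coroutine = asyncio.run(coroutine_send(b"hello"))
```

This is why a backend for Dask distributed must expose its send/receive operations as awaitables that cooperate with the event loop, rather than as blocking calls that would stall every other concurrent task.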