Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications

A. Shafi, J. Hashmi, H. Subramoni, D. Panda
DOI: 10.1109/HiPC50609.2020.00025
Published in: 2020 IEEE 27th International Conference on High Performance Computing, Data, and Analytics (HiPC), December 2020
Citation count: 1

Abstract

Python is emerging as a popular language in the data science community due to its ease of use, vibrant community, and rich set of libraries. Dask is a popular Python-based distributed computing framework that allows users to process large amounts of data on parallel hardware. The Dask distributed package is a non-blocking, asynchronous, and concurrent library that offers support for distributed execution of tasks in datacenter and HPC environments. A key requirement in designing high-performance communication backends for Dask distributed is to provide scalable support for coroutines, which, unlike regular Python functions, can only be invoked from asynchronous applications. In this paper, we present Blink, a high-performance communication library for Dask on high-performance RDMA networks such as InfiniBand. Blink offers a multi-layered architecture that matches the communication requirements of Dask and exploits high-performance interconnects through a Cython wrapper layer over the C backend. We evaluate the performance of Blink against other counterparts using various micro-benchmarks and application kernels on three different cluster testbeds with varying interconnect speeds. Our micro-benchmark evaluation reveals that Blink outperforms other communication backends by more than 3× for message sizes ranging from 1 Byte to 64 KByte, and by a factor of 2× for message sizes ranging from 128 KByte to 8 MByte. Using various application-level evaluations, we demonstrate that Dask achieves up to 7% improvement in application throughput (e.g., total worker throughput).
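The abstract notes that coroutines, unlike regular Python functions, can only be invoked from asynchronous applications, which is what makes them a special case for a communication backend. A minimal sketch (using only the standard `asyncio` module, not Blink's actual API) illustrates the distinction: calling a coroutine function merely creates a coroutine object, and its body runs only when it is awaited inside a running event loop.

```python
import asyncio

def regular_send(data: bytes) -> int:
    # A regular function: callable from anywhere, runs immediately.
    return len(data)

async def coroutine_send(data: bytes) -> int:
    # A coroutine: invoking it returns a coroutine object; the body
    # executes only when driven by an event loop. The await point is
    # where a non-blocking backend would yield while I/O completes.
    await asyncio.sleep(0)
    return len(data)

# A regular call works directly:
n_regular = regular_send(b"hello")

# A coroutine must be scheduled on an event loop:
n_coroutine = asyncio.run(coroutine_send(b"hello"))
```

This is why a backend for Dask distributed must expose its send/receive operations as awaitables that cooperate with the event loop, rather than as blocking calls that would stall every other concurrent task.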