Speeding up the communications on a cluster using MPI by means of Software Defined Networks

IF 6.2 | Zone 2 (Computer Science) | JCR Q1, COMPUTER SCIENCE, THEORY & METHODS | Future Generation Computer Systems-The International Journal of Escience | Pub Date: 2024-07-31 | DOI: 10.1016/j.future.2024.07.047


Citation count: 0

Abstract

The Open MPI library is widely used to implement the message-passing programming model in parallel applications running on distributed-memory computer systems, such as large data centers. These applications aim to make the fullest use of the resources that High Performance Computing (HPC) demands. The interconnection network is an essential part of the HPC environment, because the processes of a parallel application constantly communicate and share data. Software Defined Networking (SDN) is a networking approach that separates the control plane from the data forwarding plane, so that forwarding can be configured according to the network status or the specific requirements of parallel application communications. Given that communication time contributes significantly to the overall execution time of a parallel program, and considering the time Open MPI spends initializing TCP connections between processes in Ethernet networks, this paper proposes integrating an SDN environment into the Open MPI library. The primary objective of our contribution is to provide the network controller with information about Open MPI processes, so that the network can be configured during the initialization of the Open MPI library. This may facilitate the development of SDN-based routing techniques that use application information, such as the Open MPI endpoints participating in a parallel program execution, to reduce communication times and thus execution times. To demonstrate the utility of the information provided by Open MPI processes, we have implemented a routing algorithm that calculates the optimal paths between processes with the weighted Dijkstra algorithm, using the number of flows traversing the topology links.
The evaluation of the proposed mechanism, using a 2-stage fat tree topology and two parallel applications (a matrix product and the Model for Prediction Across Scales, MPAS), showed significant improvements in execution time: reductions of up to 2.5 times for a 4096 × 4096 matrix product and 1.3 times for an 8192 × 8192 matrix product, as well as a 1.5 times reduction for MPAS in the worst network occupancy scenario. These results show that improved communication translates into shorter execution times.
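
The flow-aware routing idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not show. It assumes that each link costs 1 plus the number of flows already routed over it, and the function names (`least_loaded_path`, `route_flow`) and the toy topology are hypothetical, chosen only for illustration.

```python
import heapq
from collections import defaultdict


def least_loaded_path(links, flows, src, dst):
    """Dijkstra over an undirected topology.

    Assumed weight (not taken from the paper): 1 + flows already on the link.
    """
    graph = defaultdict(list)
    for a, b in links:
        graph[a].append(b)
        graph[b].append(a)

    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in graph[node]:
            cost = d + 1 + flows[frozenset((node, nxt))]
            if cost < dist.get(nxt, float("inf")):
                dist[nxt] = cost
                prev[nxt] = node
                heapq.heappush(heap, (cost, nxt))

    if dst not in dist:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path


def route_flow(links, flows, src, dst):
    """Pick a path for one flow, then record it on every link it uses."""
    path = least_loaded_path(links, flows, src, dst)
    if path is not None:
        for a, b in zip(path, path[1:]):
            flows[frozenset((a, b))] += 1
    return path


# Toy 2-stage fat tree: two spine switches (s1, s2), two leaf switches
# (l1, l2), and two hosts per leaf. Hosts stand in for MPI endpoints.
links = [("s1", "l1"), ("s1", "l2"), ("s2", "l1"), ("s2", "l2"),
         ("l1", "h1"), ("l1", "h2"), ("l2", "h3"), ("l2", "h4")]
flows = defaultdict(int)  # flows per link, keyed by the unordered link
first = route_flow(links, flows, "h1", "h3")
second = route_flow(links, flows, "h2", "h4")
print(first, second)
```

In this toy topology the second flow avoids the spine switch already carrying the first one, because the loaded links have become more expensive. A real controller would presumably also need the endpoint information that Open MPI reports during initialization, which this sketch replaces with hard-coded host names.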




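Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology; Computer Science: Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months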

About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.