Probabilistic scheduling of dynamic I/O requests via application clustering for burst-buffers equipped high-performance computing

Concurrency and Computation: Practice and Experience, Volume 36, Issue 19. Pub Date: 2024-06-27. DOI: 10.1002/cpe.8142. IF 1.5; Zone 4 (Computer Science); JCR Q3 (Computer Science, Software Engineering).

Benbo Zha, Hong Shen


Citations: 0

Abstract

Burst-buffering is a promising storage solution that introduces an intermediate, high-throughput storage buffer layer to mitigate the I/O bottleneck from which current high-performance computing (HPC) platforms suffer. Existing Markov-chain-based probabilistic I/O scheduling uses the load state of the burst-buffers and the periodic characteristics of applications to reduce the I/O congestion caused by the limited capacity of the burst-buffers. However, to obtain an accurate I/O load estimate, this probabilistic approach requires applications to have consistent I/O characteristics, including similar I/O durations and long application lengths. These consistency conditions often do not hold in realistic settings. In this paper, we propose a generic framework for dynamic probabilistic I/O scheduling based on application clustering (DPSAC) that makes applications meet the consistency requirements. Our scheme first applies a one-dimensional K-means algorithm to group the applications into clusters according to the I/O phase length of each application. Next, it calculates the expected workload of each cluster through the probabilistic model of the applications and partitions the burst-buffers among the clusters proportionally. Then, to handle dynamic changes (applications joining and exiting), it updates the clusters using a heuristic strategy. Finally, it applies probabilistic I/O scheduling, based on the distribution of application workload and the state of the burst-buffers, to schedule I/O for all concurrent applications and mitigate I/O congestion. Simulation results on synthetic data show that DPSAC is effective and efficient.
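The first two steps described in the abstract (one-dimensional K-means over I/O phase lengths, then a proportional split of the burst-buffers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `App` fields, the deterministic centroid initialization, and the workload formula (I/O bandwidth weighted by the fraction of each period spent in I/O) are all assumptions made for illustration, since the abstract does not specify them. Dynamic cluster updates and the final probabilistic scheduling step are not covered.

```python
from dataclasses import dataclass


@dataclass
class App:
    """Hypothetical periodic application model (not taken from the paper)."""
    name: str
    io_len: float     # length of one I/O phase, in seconds (clustering feature)
    period: float     # length of one compute + I/O period, in seconds
    bandwidth: float  # bandwidth demanded while in the I/O phase, in GB/s


def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars; returns one cluster label per value."""
    srt = sorted(values)
    # Assumed initialization: spread the centroids evenly over the sorted data.
    centroids = [srt[(2 * j + 1) * len(srt) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members
                           else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


def expected_workload(app):
    # Assumption: the app is in its I/O phase with probability io_len / period.
    return app.bandwidth * app.io_len / app.period


def partition_buffer(apps, labels, k, capacity):
    """Split `capacity` among k clusters in proportion to expected workload."""
    loads = [0.0] * k
    for app, lab in zip(apps, labels):
        loads[lab] += expected_workload(app)
    total = sum(loads)
    return [capacity * load / total for load in loads]


# Synthetic example: three apps with short I/O phases, three with long ones.
apps = [
    App("A", 1.0, 10.0, 2.0), App("B", 2.0, 10.0, 1.0),
    App("C", 1.5, 15.0, 4.0), App("D", 20.0, 100.0, 1.0),
    App("E", 22.0, 110.0, 2.0), App("F", 21.0, 105.0, 3.0),
]
labels = kmeans_1d([a.io_len for a in apps], k=2)
shares = partition_buffer(apps, labels, k=2, capacity=100.0)
```

In this synthetic example the short-I/O applications A, B, and C land in one cluster and D, E, and F in the other, and the buffer capacity is divided according to each cluster's summed expected load. Clustering on a single feature keeps the grouping cheap to recompute when applications join or exit.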

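Source journal

Concurrency and Computation: Practice and Experience (Engineering and Technology, Computer Science: Theory and Methods)

CiteScore: 5.00
Self-citation rate: 10.00%
Articles published: 664
Review time: 9.6 months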

Journal description: Concurrency and Computation: Practice and Experience (CCPE) publishes high-quality, original research papers and authoritative research review papers in the overlapping fields of: Parallel and distributed computing; High-performance computing; Computational and data science; Artificial intelligence and machine learning; Big data applications, algorithms, and systems; Network science; Ontologies and semantics; Security and privacy; Cloud/edge/fog computing; Green computing; and Quantum computing.