RuYi: Optimizing Burst Buffer Through Automated, Fine-Grained Process-to-BB Mapping

IEEE Transactions on Computers (IF 3.8; Zone 2, Computer Science; JCR Q2, Computer Science, Hardware & Architecture). Published: 2024-12-02. DOI: 10.1109/TC.2024.3510624
Yusheng Hua, Xuanhua Shi, Ligang He, Kang He, Teng Zhang, Hai Jin, Yong Chen
Journal article, IEEE Transactions on Computers, vol. 74, no. 3, pp. 955-967.

Abstract

Current supercomputers use an SSD-based storage layer called Burst Buffer (BB) to provide I/O-intensive applications with accelerated storage access. However, using this limited and expensive storage efficiently remains a critical issue, creating an urgent need for Quality of Service (QoS) in BB. To address this, we propose RuYi, a QoS-aware method that provides applications with bandwidth guarantees in the BB file system. RuYi tackles two main issues. First, it quantitatively profiles the bandwidth resources available in BB to ensure reliable QoS, a crucial aspect seldom studied in the literature. Second, RuYi offers fine-grained, process-level QoS through a novel process-to-BB mapping that maximizes resource utilization, which is not achievable with conventional coarse-grained compute-to-BB mapping. We evaluated RuYi on a subsystem of the leading exascale supercomputer Sunway; the subsystem consists of 4,000 compute nodes and 200 BB nodes. The experimental results show that RuYi achieves 97% end-to-end bandwidth control accuracy while improving BB utilization by up to 116% compared to conventional coarse-grained compute-to-BB mapping.
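The abstract does not describe how RuYi's process-to-BB mapping actually works. The toy sketch below only illustrates why mapping at process granularity can raise BB utilization compared with mapping whole compute nodes. It assumes a simple greedy worst-fit admission policy, per-BB-node bandwidth capacities that are already profiled, and hypothetical names (`place`, `coarse_grained`, `fine_grained`) and numbers; it is not the authors' algorithm.

```python
# Toy comparison of coarse-grained (compute-to-BB) vs fine-grained
# (process-to-BB) mapping. All names and numbers are hypothetical;
# this is NOT RuYi's algorithm, only an illustration of the granularity constraint.
from typing import Dict, List, Optional, Tuple


def place(demands: List[float], capacities: List[float]) -> Tuple[float, List[Tuple[float, Optional[int]]]]:
    """Greedy worst-fit decreasing admission: each bandwidth demand (largest first)
    goes to the BB node with the most remaining profiled bandwidth, or is
    rejected if even that node cannot guarantee it."""
    remaining = list(capacities)
    placement: List[Tuple[float, Optional[int]]] = []
    for d in sorted(demands, reverse=True):
        best = max(range(len(remaining)), key=lambda i: remaining[i])
        if remaining[best] >= d:
            remaining[best] -= d
            placement.append((d, best))
        else:
            placement.append((d, None))  # rejected: no BB node can guarantee d
    admitted = sum(d for d, bb in placement if bb is not None)
    return admitted, placement


def coarse_grained(compute_nodes: Dict[str, List[float]], capacities: List[float]) -> float:
    # All processes of a compute node must share one BB node,
    # so the node's aggregate demand is placed as a single unit.
    admitted, _ = place([sum(p) for p in compute_nodes.values()], capacities)
    return admitted / sum(capacities)


def fine_grained(compute_nodes: Dict[str, List[float]], capacities: List[float]) -> float:
    # Each process is placed independently, so one compute node's
    # demand can be spread across several BB nodes.
    procs = [d for p in compute_nodes.values() for d in p]
    admitted, _ = place(procs, capacities)
    return admitted / sum(capacities)


if __name__ == "__main__":
    bb_capacity = [10.0, 10.0]  # hypothetical profiled bandwidth per BB node (GB/s)
    jobs = {"cn0": [4.0, 4.0, 4.0], "cn1": [3.0, 3.0]}  # hypothetical per-process demands
    print(f"coarse-grained utilization: {coarse_grained(jobs, bb_capacity):.0%}")
    print(f"fine-grained utilization:   {fine_grained(jobs, bb_capacity):.0%}")
```

In this toy input, compute node `cn0` needs 12 GB/s in aggregate, which no single 10 GB/s BB node can guarantee, so the coarse-grained mapping rejects it and admits only `cn1` (30% utilization). The fine-grained mapping spreads `cn0`'s three processes across both BB nodes and admits everything (90% utilization). Real systems would also have to handle profiling error, dynamic arrivals, and data locality, none of which this sketch models.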
Source journal
IEEE Transactions on Computers
Subject category: Engineering & Technology - Engineering: Electrical & Electronic
CiteScore: 6.60
Self-citation rate: 5.40%
Articles published: 199
Review time: 6.0 months
Journal description: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.
Latest articles in this journal
GRASP: Accelerating Hash-Based PQC Performance on GPU Parallel Architecture
FlexClave: An Extensible and Secure Trusted Execution Environment Framework
Collaborative Prediction of Cloud DRAM Failures With Rules and Machine Learning
Hardware-Efficient Taylor Series-Based Optimal Unsigned Square Rooter for Fast and Low Power Computation
MalPDT: Backdoor Attack Against Static Malware Detection With Plug-and-Play Dynamic Triggers