iez: Resource Contention Aware Load Balancing for Large-Scale Parallel File Systems

Bharti Wadhwa, A. Paul, Sarah Neuwirth, Feiyi Wang, S. Oral, A. Butt, Jon Bernard, K. Cameron
2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2019. DOI: 10.1109/IPDPS.2019.00070
Citations: 8

Abstract

Parallel I/O performance is crucial to sustaining scientific applications on large-scale High-Performance Computing (HPC) systems. However, I/O load imbalance in the underlying distributed and shared storage systems can significantly reduce overall application performance. There are two conflicting challenges to mitigate this load imbalance: (i) optimizing system-wide data placement to maximize the bandwidth advantages of distributed storage servers, i.e., allocating I/O resources efficiently across applications and job runs; and (ii) optimizing client-centric data movement to minimize I/O load request latency between clients and servers, i.e., allocating I/O resources efficiently in service to a single application and job run. Moreover, existing approaches that require application changes limit widespread adoption in commercial or proprietary deployments. We propose iez, an "end-to-end control plane" where clients transparently and adaptively write to a set of selected I/O servers to achieve balanced data placement. Our control plane leverages real-time load information for distributed storage server global data placement, while our design model leverages trace-based optimization techniques to minimize I/O load request latency between clients and servers. We evaluate our proposed system on an experimental cluster for two common use cases: the synthetic I/O benchmark IOR for large sequential writes and a scientific application I/O kernel, HACC-I/O. Results show read and write performance improvements of up to 34% and 32%, respectively, compared to the state of the art.
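
To make the placement idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the kind of contention-aware selection the abstract describes: a coordinator keeps a real-time load estimate per storage server and steers each new stripe of a write to the currently least-loaded servers. All identifiers here (Server, pick_targets, the unit stripe cost) are illustrative assumptions, not names from iez.

```python
# Hypothetical sketch of contention-aware stripe placement, loosely
# modeled on the abstract's description: clients write to a set of
# selected I/O servers chosen from real-time load information.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Server:
    load: float                       # real-time load estimate (e.g., queued bytes)
    name: str = field(compare=False)  # server id, excluded from ordering

def pick_targets(servers: list[Server], stripes: int) -> list[str]:
    """Greedily assign `stripes` stripes to the least-loaded servers,
    updating each server's load estimate as stripes are placed."""
    heap = list(servers)
    heapq.heapify(heap)               # min-heap keyed on current load
    targets = []
    for _ in range(stripes):
        s = heapq.heappop(heap)       # least-loaded server right now
        targets.append(s.name)
        s.load += 1.0                 # assumed unit cost per stripe
        heapq.heappush(heap, s)
    return targets

if __name__ == "__main__":
    servers = [Server(0.2, "ost0"), Server(0.9, "ost1"), Server(0.4, "ost2")]
    print(pick_targets(servers, 4))   # -> ['ost0', 'ost2', 'ost1', 'ost0']
```

In the real system, load information would come from server-side monitoring and placement would be coordinated globally across clients and jobs; the greedy min-heap selection above only illustrates the balancing principle.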