Taming parallel I/O complexity with auto-tuning

2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC). Pub Date: 2013-11-17. DOI: 10.1145/2503210.2503278

Babak Behzad, Huong Vu, Thanh Luu, Joseph Huchette, S. Byna, R. Aydt, Q. Koziol, M. Snir


Citations: 114

Abstract

We present an auto-tuning system for optimizing I/O performance of HDF5 applications and demonstrate its value across platforms, applications, and at scale. The system uses a genetic algorithm to search a large space of tunable parameters and to identify effective settings at all layers of the parallel I/O stack. The parameter settings are applied transparently by the auto-tuning system via dynamically intercepted HDF5 calls. To validate our auto-tuning system, we applied it to three I/O benchmarks (VPIC, VORPAL, and GCRM) that replicate the I/O activity of their respective applications. We tested the system with different weak-scaling configurations (128, 2048, and 4096 CPU cores) that generate 30 GB to 1 TB of data, and executed these configurations on diverse HPC platforms (Cray XE6, IBM BG/P, and Dell Cluster). In all cases, the auto-tuning framework identified tunable parameters that substantially improved write performance over default system settings. We consistently demonstrate I/O write speedups between 2× and 100× for test configurations.
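The genetic-algorithm search mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the parameter names and value ranges in `SPACE` are hypothetical, and `toy_write_time` is a synthetic cost function standing in for the measured write time of a real benchmark run. The sketch only shows the general pattern of selection, crossover, mutation, and elitism over a discrete space of I/O settings.

```python
import random

# Hypothetical search space: names and value ranges are illustrative
# assumptions, not the parameter set used in the paper.
SPACE = {
    "stripe_count": [4, 8, 16, 32, 64, 128],
    "stripe_size_mb": [1, 2, 4, 8, 16, 32],
    "cb_nodes": [1, 2, 4, 8, 16, 32],
    "alignment_kb": [1, 64, 256, 1024],
}


def toy_write_time(cfg):
    """Synthetic stand-in for a measured benchmark run (lower is better)."""
    return (
        abs(cfg["stripe_count"] - 64) / 64
        + abs(cfg["stripe_size_mb"] - 8) / 8
        + abs(cfg["cb_nodes"] - 16) / 16
        + abs(cfg["alignment_kb"] - 1024) / 1024
    )


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from one of two parents."""
    return {name: rng.choice((a[name], b[name])) for name in SPACE}


def mutate(cfg, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    return {
        name: rng.choice(SPACE[name]) if rng.random() < rate else value
        for name, value in cfg.items()
    }


def tune(cost, generations=20, pop_size=12, elite=2, seed=0):
    """Return the best configuration found by a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)               # best (lowest cost) first
        parents = population[: pop_size // 2]   # truncation selection
        children = []
        while len(children) < pop_size - elite:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        # Elitism: the best configurations survive unchanged.
        population = population[:elite] + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = tune(toy_write_time)
    print(best, toy_write_time(best))
```

In a real tuner, the cost function would launch the I/O benchmark with the candidate settings applied and return the measured write time, so each evaluation is expensive. Under that assumption, a small population with elitism is a reasonable design choice for keeping the number of runs low.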
