Improving Collective I/O Performance with Machine Learning Supported Auto-tuning

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) Pub Date : 2020-05-01 DOI:10.1109/IPDPSW50202.2020.00138

Ayse Bagbaba

{"title":"Improving Collective I/O Performance with Machine Learning Supported Auto-tuning","authors":"Ayse Bagbaba","doi":"10.1109/IPDPSW50202.2020.00138","DOIUrl":null,"url":null,"abstract":"Collective Input and output (I/O) is an essential approach in high performance computing (HPC) applications. The achievement of effective collective I/O is a nontrivial job due to the complex interdependencies between the layers of I/O stack. These layers provide the best possible I/O performance through a number of tunable parameters. Sadly, the correct combination of parameters depends on diverse applications and HPC platforms. When a configuration space gets larger, it becomes difficult for humans to monitor the interactions between the configuration options. Engineers has no time or experience for exploring good configuration parameters for each problem because of long benchmarking phase. In most cases, the default settings are implemented, often leading to poor I/O efficiency. I/O profiling tools can not tell the optimal default setups without too much effort to analyzing the tracing results. In this case, an auto-tuning solution for optimizing collective I/O requests and providing system administrators or engineers the statistic information is strongly required. In this paper, a study of the machine learning supported collective I/O auto-tuning including the architecture and software stack is performed. Random forest regression model is used to develop a performance predictor model that can capture parallel I/O behavior as a function of application and file system characteristics. The modeling approach can provide insights into the metrics that impact I/O performance significantly.","PeriodicalId":398819,"journal":{"name":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW50202.2020.00138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

Abstract

Collective Input and output (I/O) is an essential approach in high performance computing (HPC) applications. The achievement of effective collective I/O is a nontrivial job due to the complex interdependencies between the layers of I/O stack. These layers provide the best possible I/O performance through a number of tunable parameters. Sadly, the correct combination of parameters depends on diverse applications and HPC platforms. When a configuration space gets larger, it becomes difficult for humans to monitor the interactions between the configuration options. Engineers has no time or experience for exploring good configuration parameters for each problem because of long benchmarking phase. In most cases, the default settings are implemented, often leading to poor I/O efficiency. I/O profiling tools can not tell the optimal default setups without too much effort to analyzing the tracing results. In this case, an auto-tuning solution for optimizing collective I/O requests and providing system administrators or engineers the statistic information is strongly required. In this paper, a study of the machine learning supported collective I/O auto-tuning including the architecture and software stack is performed. Random forest regression model is used to develop a performance predictor model that can capture parallel I/O behavior as a function of application and file system characteristics. The modeling approach can provide insights into the metrics that impact I/O performance significantly.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

通过机器学习支持的自动调优提高集体I/O性能

集合输入和输出(I/O)是高性能计算(HPC)应用程序中必不可少的方法。由于I/O堆栈层之间复杂的相互依赖关系，实现有效的集体I/O是一项艰巨的任务。这些层通过许多可调参数提供最佳的I/O性能。遗憾的是，参数的正确组合取决于不同的应用程序和HPC平台。当配置空间变大时，人们就很难监控配置选项之间的交互。由于长时间的基准测试阶段，工程师没有时间或经验为每个问题探索良好的配置参数。在大多数情况下，会实现默认设置，这通常会导致较差的I/O效率。如果不花太多精力分析跟踪结果，I/O分析工具无法告诉您最佳的默认设置。在这种情况下，迫切需要一种自动调优解决方案，用于优化集体I/O请求并向系统管理员或工程师提供统计信息。本文对机器学习支持的集体I/O自动调优进行了研究，包括体系结构和软件堆栈。随机森林回归模型用于开发性能预测器模型，该模型可以捕获作为应用程序和文件系统特征函数的并行I/O行为。建模方法可以深入了解显著影响I/O性能的指标。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

自引率

0.00%

发文量

期刊最新文献

PDCunplugged: A Free Repository of Unplugged Parallel Distributed Computing Activities Competitive Evolution of a UAV Swarm for Improving Intruder Detection Rates Workshop 7: HPBDC High-Performance Big Data and Cloud Computing Teaching Cloud Computing: Motivations, Challenges and Tools Exploring Chapel Productivity Using Some Graph Algorithms