Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application
Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020. DOI: 10.1109/IPDPSW50202.2020.00135
Performance modeling is an important problem in high-performance computing (HPC). Machine learning (ML) is a powerful approach to HPC performance modeling: it can learn complex relations between application parameters and application performance from historical execution data. However, extrapolating large-scale performance from only small-scale execution data with ML is difficult, because the independent-and-identically-distributed assumption underlying most ML algorithms does not hold in this setting. To solve the extrapolation problem, we propose a two-level model consisting of an interpolation level and an extrapolation level. The interpolation level predicts small-scale performance from small-scale execution data. The extrapolation level predicts the large-scale performance of a fixed input parameter from its small-scale performance predictions. At the interpolation level, we use random forests to build interpolation models that predict small-scale performance. At the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multitask lasso with clustering to construct scalability models that predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform and build models for two HPC applications. Compared with existing ML methods, our method achieves higher prediction accuracy.
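To make the two-level structure concrete, the sketch below illustrates the general shape of such a pipeline with scikit-learn. It is not the authors' implementation: the input parameters, scalability basis terms, hyperparameters, and synthetic data are all hypothetical, and the clustering step the paper applies before the multitask lasso is omitted for brevity (here a single MultiTaskLasso is fit jointly over all configurations).

```python
# Illustrative sketch only; all names, data, and basis functions are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)

# ---- Interpolation level: (input parameters, small core count) -> runtime ----
small_scales = np.array([2, 4, 8, 16, 32])            # scales cheap enough to execute
X_params = rng.uniform(1.0, 10.0, size=(200, 2))      # hypothetical application parameters
scales = rng.choice(small_scales, size=200)
runtime = X_params.prod(axis=1) / scales + 0.05 * rng.standard_normal(200)  # toy runtimes

interp = RandomForestRegressor(n_estimators=200, random_state=0)
interp.fit(np.column_stack([X_params, scales]), runtime)

# ---- Extrapolation level: fit scalability curves to the small-scale predictions ----
# Each fixed input configuration is one "task": regress its predicted small-scale
# runtimes on simple scalability basis terms of the scale p, sharing a sparse
# support across tasks via MultiTaskLasso, then evaluate the curve at a large p.
def basis(p):
    p = np.asarray(p, dtype=float)
    return np.column_stack([1.0 / p, np.log2(p), p])  # hypothetical basis terms

configs = rng.uniform(1.0, 10.0, size=(10, 2))        # configurations to extrapolate
# Rows = small scales, columns = configurations (tasks).
Y_small = np.column_stack([
    interp.predict(np.column_stack([np.tile(c, (len(small_scales), 1)), small_scales]))
    for c in configs
])

extrap = MultiTaskLasso(alpha=0.001, max_iter=10000)
extrap.fit(basis(small_scales), Y_small)

large_scale = 256                                      # never executed at this scale
pred_large = extrap.predict(basis([large_scale]))      # shape (1, n_configs)
print(f"predicted runtimes at p={large_scale}:", pred_large.round(3))
```

The point of the second level in this sketch is that extrapolation errors come from the shared, sparsely-selected scalability terms rather than from the random forest alone, which cannot predict outside the range of scales it was trained on.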