Using Small-Scale History Data to Predict Large-Scale Performance of HPC Application
Wenju Zhou, Jiepeng Zhang, Jingwei Sun, Guangzhong Sun
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2020. DOI: 10.1109/IPDPSW50202.2020.00135
Performance modeling is an important problem in high-performance computing (HPC). Machine learning (ML) is a powerful approach to HPC performance modeling: it can learn complex relations between application parameters and application performance from historical execution data. However, extrapolating large-scale performance from only small-scale execution data with ML is difficult, because the independent-and-identically-distributed assumption underlying most ML algorithms does not hold in this setting. To solve the extrapolation problem, we propose a two-level model consisting of an interpolation level and an extrapolation level. The interpolation level predicts small-scale performance from small-scale execution data. The extrapolation level predicts the large-scale performance of a fixed input parameter from its small-scale performance predictions. At the interpolation level, we use random forests to build interpolation models that predict small-scale performance. At the extrapolation level, to reduce the negative influence of interpolation errors, we employ the multitask lasso with clustering to construct scalability models that predict large-scale performance. To validate the utility of our two-level model, we conduct experiments on a real HPC platform and build models for two HPC applications. Compared with existing ML methods, our method achieves higher prediction accuracy.
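To make the two-level structure concrete, the sketch below illustrates the general shape of such a pipeline with scikit-learn. It is not the authors' implementation: the input parameters, scalability basis terms, hyperparameters, and synthetic data are all hypothetical, and the clustering step the paper applies before the multitask lasso is omitted for brevity (here a single MultiTaskLasso is fit jointly over all configurations).

```python
# Illustrative sketch only; all names, data, and basis functions are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(0)

# ---- Interpolation level: (input parameters, small core count) -> runtime ----
small_scales = np.array([2, 4, 8, 16, 32])            # scales cheap enough to execute
X_params = rng.uniform(1.0, 10.0, size=(200, 2))      # hypothetical application parameters
scales = rng.choice(small_scales, size=200)
runtime = X_params.prod(axis=1) / scales + 0.05 * rng.standard_normal(200)  # toy runtimes

interp = RandomForestRegressor(n_estimators=200, random_state=0)
interp.fit(np.column_stack([X_params, scales]), runtime)

# ---- Extrapolation level: fit scalability curves to the small-scale predictions ----
# Each fixed input configuration is one "task": regress its predicted small-scale
# runtimes on simple scalability basis terms of the scale p, sharing a sparse
# support across tasks via MultiTaskLasso, then evaluate the curve at a large p.
def basis(p):
    p = np.asarray(p, dtype=float)
    return np.column_stack([1.0 / p, np.log2(p), p])  # hypothetical basis terms

configs = rng.uniform(1.0, 10.0, size=(10, 2))        # configurations to extrapolate
# Rows = small scales, columns = configurations (tasks).
Y_small = np.column_stack([
    interp.predict(np.column_stack([np.tile(c, (len(small_scales), 1)), small_scales]))
    for c in configs
])

extrap = MultiTaskLasso(alpha=0.001, max_iter=10000)
extrap.fit(basis(small_scales), Y_small)

large_scale = 256                                      # never executed at this scale
pred_large = extrap.predict(basis([large_scale]))      # shape (1, n_configs)
print(f"predicted runtimes at p={large_scale}:", pred_large.round(3))
```

The point of the second level in this sketch is that extrapolation errors come from the shared, sparsely-selected scalability terms rather than from the random forest alone, which cannot predict outside the range of scales it was trained on.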