Adaptive Learning for Concept Drift in Application Performance Modeling

Proceedings of the 48th International Conference on Parallel Processing

Pub Date: 2019-08-05

DOI: 10.1145/3337821.3337922

Sandeep Madireddy, Prasanna Balaprakash, P. Carns, R. Latham, Glenn K. Lockwood, R. Ross, S. Snyder, Stefan M. Wild


Citations: 7

Abstract

Supervised learning is a promising approach for modeling the performance of applications running on large HPC systems. A key assumption in supervised learning is that the training and testing data are obtained under the same conditions. In production HPC systems, however, this assumption might not hold, because the state of the platform can change over time as a result of hardware degradation, hardware replacement, software upgrades, and configuration updates. These changes can alter the data distribution in a way that reduces the accuracy of predictive performance models and renders them less useful; this phenomenon is referred to as concept drift. Ignoring concept drift can lead to suboptimal resource usage and decreased efficiency when those performance models are deployed for tuning and job scheduling in production systems. To address this issue, we propose a concept-drift-aware predictive modeling approach that comprises two components: (1) an online Bayesian changepoint detection method that automatically identifies, in near real time, the location of events that lead to concept drift and (2) a moment-matching transformation, inspired by transfer learning, that converts the training data collected before the drift so that it remains useful for retraining. We use application input/output performance data collected on Cori, a production supercomputing system at the National Energy Research Scientific Computing Center, to demonstrate the effectiveness of our approach. The results show that concept-drift-aware models obtain significant improvement in accuracy; the median absolute error of the best-performing Gaussian process regression model improved by 58.8% when the proposed approaches were used.
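The abstract does not spell out the form of the moment-matching transformation, so the following is only a minimal sketch under stated assumptions: it matches the first two moments (mean and standard deviation) of a single performance variable, shifting and rescaling pre-drift observations to agree with a post-drift sample. The function name `moment_match` and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import fmean, pstdev


def moment_match(pre_drift, post_drift):
    """Shift and rescale pre-drift values so that their mean and
    standard deviation equal those of the post-drift sample.

    Assumption: only the first two moments are matched; the paper's
    actual transformation may differ.
    """
    mu_old, sd_old = fmean(pre_drift), pstdev(pre_drift)
    mu_new, sd_new = fmean(post_drift), pstdev(post_drift)
    if sd_old == 0:
        # A constant pre-drift sample has no spread to rescale,
        # so every value maps to the post-drift mean.
        return [mu_new for _ in pre_drift]
    scale = sd_new / sd_old
    return [mu_new + scale * (y - mu_old) for y in pre_drift]


# Hypothetical I/O throughput values (arbitrary units), not data from the paper.
pre_drift = [10.0, 12.0, 14.0, 16.0]
post_drift = [5.0, 6.0, 7.0]

# The adjusted values keep the relative ordering of the pre-drift data
# but follow the post-drift mean and spread.
adjusted = moment_match(pre_drift, post_drift)
```

In a retraining workflow of the kind the abstract describes, the adjusted pre-drift observations could be pooled with the (typically few) post-drift observations before refitting the performance model; how the authors combine them is not stated here.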


