HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.

Avishek Bose, Huichen Yang, William H Hsu, Daniel Andresen
{"title":"HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.","authors":"Avishek Bose,&nbsp;Huichen Yang,&nbsp;William H Hsu,&nbsp;Daniel Andresen","doi":"10.1109/bigdata52589.2021.9671370","DOIUrl":null,"url":null,"abstract":"<p><p>This paper presents a novel use case of Graph Convolutional Network (GCN) learning representations for predictive data mining, specifically from user/task data in the domain of high-performance computing (HPC). It outlines an approach based on a coalesced data set: logs from the Slurm workload manager, joined with user experience survey data from computational cluster users. We introduce a new method of constructing a heterogeneous unweighted HPC graph consisting of multiple typed nodes after revealing the manifold relations between the nodes. The GCN structure used here supports two tasks: i) determining whether a job will complete or fail and ii) predicting memory and CPU requirements by training the GCN semi-supervised classification model and regression models on the generated graph. The graph is partitioned into partitions using graph clustering. We conducted classification and regression experiments using the proposed framework on our HPC log dataset and evaluated predictions by our trained models against baselines using test_score, F1-score, precision, recall for classification, and R1 score for regression, showing that our framework achieves significant improvements.</p>","PeriodicalId":74501,"journal":{"name":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","volume":"2021 ","pages":"4113-4118"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9893918/pdf/nihms-1831840.pdf","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/bigdata52589.2021.9671370","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

This paper presents a novel use case of Graph Convolutional Network (GCN) learning representations for predictive data mining, specifically from user/task data in the domain of high-performance computing (HPC). It outlines an approach based on a coalesced data set: logs from the Slurm workload manager, joined with user experience survey data from computational cluster users. We introduce a new method of constructing a heterogeneous unweighted HPC graph consisting of multiple typed nodes after revealing the manifold relations between the nodes. The GCN structure used here supports two tasks: i) determining whether a job will complete or fail and ii) predicting memory and CPU requirements by training the GCN semi-supervised classification model and regression models on the generated graph. The graph is partitioned into partitions using graph clustering. We conducted classification and regression experiments using the proposed framework on our HPC log dataset and evaluated predictions by our trained models against baselines using test_score, F1-score, precision, recall for classification, and R1 score for regression, showing that our framework achieves significant improvements.

Abstract Image

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
HPCGCN:基于图卷积网络的高性能计算集群日志数据预测框架。
本文提出了图卷积网络(GCN)学习表示用于预测数据挖掘的新用例,特别是高性能计算(HPC)领域的用户/任务数据。它概述了一种基于合并数据集的方法:来自Slurm工作负载管理器的日志,与来自计算集群用户的用户体验调查数据相结合。通过揭示节点间的流形关系,提出了一种构造由多个类型节点组成的异构无加权HPC图的新方法。这里使用的GCN结构支持两个任务:i)确定作业是完成还是失败;ii)通过在生成的图上训练GCN半监督分类模型和回归模型来预测内存和CPU需求。使用图聚类将图划分为多个分区。我们在HPC日志数据集上使用提出的框架进行了分类和回归实验,并使用test_score、F1-score、precision、recall(分类召回)和R1 score(回归)对基线进行了训练模型的预测,表明我们的框架取得了显著的改进。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Private Continuous Survival Analysis with Distributed Multi-Site Data. Zero-shot Learning with Minimum Instruction to Extract Social Determinants and Family History from Clinical Notes using GPT Model. Doctors vs. Nurses: Understanding the Great Divide in Vaccine Hesitancy among Healthcare Workers. Multi-Query Optimization Revisited: A Full-Query Algebraic Method. HPCGCN: A Predictive Framework on High Performance Computing Cluster Log Data Using Graph Convolutional Networks.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1