An Analysis of Node Sharing on HPC Clusters using XDMoD/TACC_Stats

Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.). Pub Date: 2014-07-13. DOI: 10.1145/2616498.2616533

Joseph P. White, R. L. Deleon, T. Furlani, S. Gallo, Matthew D. Jones, Amin Ghadersohi, Cynthia D. Cornelius, A. Patra, J. Browne, W. Barth, John L. Hammond

Pages: 31:1-31:8

Citations: 11

Abstract

When a user requests less than a full node for a job on XSEDE's large resources, Stampede and Lonestar4 (that is, fewer than 16 cores on Stampede or 12 cores on Lonestar4), the job is assigned a full node by policy. Although the CPU hours consumed by these jobs are small compared with the total CPU hours delivered by these resources, the jobs represent a substantial fraction of the total number of jobs (~18% for Stampede and ~15% for Lonestar4 between January and February 2014). Academic HPC centers, such as the Center for Computational Research (CCR) at the University at Buffalo, SUNY, typically have a much larger proportion of small jobs than the large XSEDE systems. For CCR's production cluster, Rush, the decision was made to allow simultaneous jobs to be allocated on the same node. This greatly increases overall throughput but also raises the question of whether jobs that share a node will interfere with one another. We present an analysis that explores this issue using data from Rush, Stampede, and Lonestar4. The usage data indicate little interference.
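The allocation arithmetic behind the two policies can be illustrated with a minimal sketch. It is not taken from the paper or from XDMoD/TACC_Stats: the function names and the example job size are hypothetical, and only the per-node core counts (16 for Stampede, 12 for Lonestar4) come from the abstract. It compares how many cores sit reserved but unrequested when a small job receives whole nodes versus when it shares a node.

```python
import math

# Cores per node as stated in the abstract.
CORES_PER_NODE = {"Stampede": 16, "Lonestar4": 12}


def cores_allocated(requested_cores: int, cores_per_node: int, exclusive: bool) -> int:
    """Cores reserved for a job (hypothetical helper, for illustration only)."""
    if requested_cores < 1:
        raise ValueError("requested_cores must be positive")
    if exclusive:
        # Exclusive policy: the request is rounded up to whole nodes.
        nodes = math.ceil(requested_cores / cores_per_node)
        return nodes * cores_per_node
    # Shared policy: only the requested cores are reserved, so other
    # jobs may occupy the remaining cores on the same node.
    return requested_cores


def idle_cores(requested_cores: int, cores_per_node: int, exclusive: bool) -> int:
    """Cores reserved for the job beyond what it requested."""
    return cores_allocated(requested_cores, cores_per_node, exclusive) - requested_cores


if __name__ == "__main__":
    job_size = 4  # hypothetical small job
    for system, per_node in CORES_PER_NODE.items():
        print(system, "exclusive:", idle_cores(job_size, per_node, True),
              "shared:", idle_cores(job_size, per_node, False))
```

Under these assumptions a 4-core job leaves 12 cores unrequested on a 16-core node when the node is exclusive and none when it is shared. The sketch captures only the throughput side of the trade-off; whether co-located jobs interfere with one another is the empirical question the analysis addresses.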


