Performance metrics and auditing framework for high performance computer systems

T. Furlani, Matthew D. Jones, S. Gallo, Andrew E. Bruno, Charng-Da Lu, Amin Ghadersohi, Ryan J. Gentner, A. Patra, R. L. Deleon, G. Laszewski, Lizhe Wang, Ann Zimmerman
DOI: 10.1145/2016741.2016759
URL: https://doi.org/10.1145/2016741.2016759
Venue: TeraGrid Conference
Published: 2011-07-18
Citations: 3

Abstract

This paper describes XDMoD, a comprehensive auditing framework that high performance computing (HPC) centers can use to readily provide metrics on resource utilization (CPU hours, job size, wait time, etc.), resource performance, and the center's impact in terms of scholarship and research. This role-based auditing framework is designed to meet the following objectives: (1) provide the user community with an easy-to-use tool to oversee their allocations and optimize their use of resources, (2) provide staff with easy access to performance metrics and diagnostics to monitor and tune resource performance for the benefit of the users, (3) provide senior management with a tool to easily monitor utilization, user base, and performance of resources, and (4) help ensure that the resources are effectively enabling research and scholarship. XDMoD is initially focused on the NSF TeraGrid (TG) and follow-on XSEDE (XD) program, where it will become a key component of the TG/XSEDE User Portal. However, the auditing system is intended to be generally applicable to any HPC system or center.

The XDMoD auditing system is architected as a set of modular components that facilitate the use of community-contributed components and information. It includes an active and reactive (as opposed to passive) service set accessible through a variety of endpoints, such as a web-based user interface, RESTful web services, and the provided development tools. One component also provides a computationally lightweight and flexible application kernel auditing system that uses best-in-class performance kernels to measure overall system performance with respect to the applications that users actually run. This allows continuous resource auditing to monitor all aspects of system performance, most critically from a completely user-centric point of view.
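To make the utilization metrics named above concrete, the following is a minimal sketch of how CPU hours, job size, and wait time might be derived from job accounting records. It is an illustration under stated assumptions, not XDMoD's implementation: the `JobRecord` type, its field names, the `summarize` helper, and the sample jobs are all hypothetical, and CPU hours are assumed here to mean allocated cores multiplied by wall-clock run time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class JobRecord:
    """One finished job. Field names are hypothetical, not XDMoD's schema."""
    user: str
    cores: int
    submitted: datetime
    started: datetime
    ended: datetime


def cpu_hours(job: JobRecord) -> float:
    """Allocated cores multiplied by wall-clock run time, in hours (assumed definition)."""
    return job.cores * (job.ended - job.started).total_seconds() / 3600.0


def wait_hours(job: JobRecord) -> float:
    """Time spent queued between submission and start, in hours."""
    return (job.started - job.submitted).total_seconds() / 3600.0


def summarize(jobs: list[JobRecord]) -> dict[str, float]:
    """Aggregate the three utilization metrics over a collection of jobs."""
    if not jobs:
        return {
            "job_count": 0,
            "total_cpu_hours": 0.0,
            "mean_job_size_cores": 0.0,
            "mean_wait_hours": 0.0,
        }
    n = len(jobs)
    return {
        "job_count": n,
        "total_cpu_hours": sum(cpu_hours(j) for j in jobs),
        "mean_job_size_cores": sum(j.cores for j in jobs) / n,
        "mean_wait_hours": sum(wait_hours(j) for j in jobs) / n,
    }


# Made-up sample records, for illustration only.
jobs = [
    JobRecord("alice", 64,
              datetime(2011, 7, 18, 8, 0),
              datetime(2011, 7, 18, 9, 0),
              datetime(2011, 7, 18, 11, 0)),
    JobRecord("bob", 16,
              datetime(2011, 7, 18, 8, 30),
              datetime(2011, 7, 18, 8, 45),
              datetime(2011, 7, 18, 9, 45)),
]

summary = summarize(jobs)
print(summary)
```

In a role-based setting such as the one described, the same aggregation could be restricted to one user's jobs for the user view or applied to a whole resource for the management view; that filtering step is left out of the sketch.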