Using XDMoD to facilitate XSEDE operations, planning and analysis

T. Furlani, Barry L. Schneider, Matthew D. Jones, John Towns, David L. Hart, S. Gallo, R. L. Deleon, Charng-Da Lu, Amin Ghadersohi, Ryan J. Gentner, A. Patra, G. Laszewski, Fugang Wang, Jeffrey T. Palmer, N. Simakov
Proceedings of the Conference on Extreme Science and Engineering Discovery Environment: Gateway to Discovery, published 2013-07-22
DOI: 10.1145/2484762.2484763 (https://doi.org/10.1145/2484762.2484763)
Citations: 37

Abstract

XDMoD is, for the first time, a comprehensive auditing tool for measuring both the utilization and the performance of high-end cyberinfrastructure (CI), with an initial focus on XSEDE. Through several case studies, we demonstrate its utility in providing metrics on the resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as for improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE gives a detailed understanding of how a given CI resource is being used and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics. In the case studies considered here, a detailed historical analysis of XSEDE usage data with XDMoD demonstrates the tremendous growth in the number of users, in overall usage, and in the scale of the simulations routinely carried out. Not surprisingly, physics, chemistry, and the engineering disciplines are shown to be heavy users of the resources. However, as the data clearly show, the molecular biosciences are now a significant and growing user of XSEDE resources, accounting for more than 20 percent of all service units (SUs) consumed in 2012. XDMoD also shows that the resources required by the various scientific disciplines are very different. Physics, astronomical sciences, and atmospheric sciences tend to solve large problems requiring many cores. Molecular biosciences applications, on the other hand, require many cycles but do not employ core counts that are as large. Such distinctions are important in guiding future cyberinfrastructure design decisions. XDMoD's implementation of a novel application kernel-based auditing system for measuring overall CI system performance and quality of service is shown, through several examples, to provide a useful means of automatically detecting underperforming hardware and software. This capability is especially critical given the complex composition of today's advanced CI. In one example, an application kernel based on a widely used quantum chemistry program uncovered a software bug in the I/O stack of a commercial parallel file system, which the vendor subsequently fixed with a software patch that is now part of its standard release. This error, which resulted in dramatically increased execution times as well as outright job failure, would likely have gone unnoticed for some time and was uncovered only as a result of implementing XDMoD's suite of application kernels.
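The idea behind application kernel-based auditing can be illustrated with a minimal sketch: a fixed benchmark is run repeatedly on a resource, and a run is flagged when its wall time falls far outside the historical baseline. This is not XDMoD's actual implementation. The function name `flag_slow_runs`, the three-standard-deviation threshold, and the timing data are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def flag_slow_runs(baseline, recent, threshold=3.0):
    """Return the recent wall times that exceed the baseline mean
    by more than `threshold` sample standard deviations.

    The threshold rule is an assumption for this sketch, not the
    detection method used by XDMoD.
    """
    limit = mean(baseline) + threshold * stdev(baseline)
    return [t for t in recent if t > limit]


# Hypothetical wall-clock times (seconds) for one kernel on one resource.
baseline_times = [100.0, 102.0, 98.0, 101.0, 99.0]
recent_times = [100.5, 180.0, 99.2]

# The 180-second run lies far above the baseline and is reported.
print(flag_slow_runs(baseline_times, recent_times))
```

A regression like the file system bug described above would show up in such a check as a sustained jump in execution time for an otherwise unchanged kernel. A fuller version would also need to record outright job failures, which this sketch does not handle.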