Lessons learned at 208K: Towards debugging millions of cores

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis. Pub Date: 2008-11-15. DOI: 10.1109/SC.2008.5218557

Gregory L. Lee, D. Ahn, D. Arnold, B. Supinski, M. LeGendre, B. Miller, M. Schulz, B. Liblit


Citations: 53

Abstract

Petascale systems will present several new challenges to performance and correctness tools. Such machines may contain millions of cores, so tools must use scalable data structures and analysis algorithms to collect and process application data. At such scales, each tool will itself become a large parallel application: debugging the full BlueGene/L (BG/L) installation at Lawrence Livermore National Laboratory already requires 1,664 tool daemons. To reach such sizes and beyond, tools must use a scalable communication infrastructure and manage their own tool processes efficiently. Some system resources, such as the file system, may also become tool bottlenecks. In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered at thousands of tasks on an InfiniBand cluster and at up to 208K processes on BG/L to identify current scalability issues as well as challenges that will arise at the petascale. We then present implemented solutions to these challenges and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
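The idea of merging stack traces to find process equivalence classes can be illustrated with a minimal sketch. This is not STAT's implementation: the function names, the dictionary-based tree, and the sample traces are all hypothetical, and the sketch runs in a single process rather than across tool daemons. It assumes each task's trace is a list of frame names ordered from the outermost call inward.

```python
from collections import defaultdict


def group_by_stack_trace(traces):
    """Group task ranks whose full call paths are identical.

    `traces` maps a rank (int) to a list of frame names, outermost first.
    Returns a dict mapping each distinct call path (as a tuple) to the
    sorted list of ranks that share it.
    """
    classes = defaultdict(list)
    for rank in sorted(traces):
        classes[tuple(traces[rank])].append(rank)
    return dict(classes)


def merge_into_prefix_tree(traces):
    """Merge all traces into one call-prefix tree.

    Each node records the set of ranks whose trace passes through it,
    so common prefixes are stored once no matter how many tasks share them.
    """
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


# Hypothetical sample: three tasks wait in a barrier, one is still computing.
traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}

classes = group_by_stack_trace(traces)
tree = merge_into_prefix_tree(traces)

for path, ranks in classes.items():
    print(" > ".join(path), "ranks:", ranks)
```

In this sample, the four tasks collapse into two classes, so a user would need to attach a full debugger to only one representative of each class, for example rank 2 as the outlier. The prefix tree keeps the shared `main > solve` path once, which hints at why a merged representation can stay compact as the task count grows.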


