{"title":"HPC System Software Enhanced by Source Code Analysis","authors":"Jidong Zhai","doi":"10.1145/3322789.3328741","DOIUrl":null,"url":null,"abstract":"Building efficient and scalable system software, especially performance analysis and monitoring, for large-scale systems, is increasingly important both for the developers of parallel applications and the designers of next-generation HPC systems. However, conventional performance tools suffer from significant time/space overhead due to the ever-increasing problem size and system scale. For instance, memory monitoring is of critical use in understanding applications and evaluating systems. Due to the dynamic nature in programs' memory accesses, common practice today leaves large amounts of address examination and data recording at runtime, at the cost of substantial performance overhead. On the other hand, the cost of source code analysis is independent of the problem size and system scale, making it very appealing for large-scale performance analysis. Inspired by this observation, we have designed a series of light-weight system software for HPC systems, such as a memory access monitoring tool, a performance variance detection tool , and a communication trace compression tool. 
In this talk, I will share our experience on building these tools through combining static analysis and runtime analysis and also point out main challenges in this direction.","PeriodicalId":365438,"journal":{"name":"Proceedings of the 9th International Workshop on Runtime and Operating Systems for Supercomputers","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 9th International Workshop on Runtime and Operating Systems for Supercomputers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3322789.3328741","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Building efficient and scalable system software for large-scale systems, especially for performance analysis and monitoring, is increasingly important both for developers of parallel applications and for designers of next-generation HPC systems. However, conventional performance tools incur significant time and space overhead as problem sizes and system scales keep growing. Memory monitoring, for instance, is critical for understanding applications and evaluating systems. Because programs' memory accesses are dynamic, common practice today defers large amounts of address examination and data recording to runtime, at the cost of substantial performance overhead. The cost of source code analysis, by contrast, is independent of problem size and system scale, which makes it very appealing for large-scale performance analysis. Inspired by this observation, we have designed a series of lightweight system software tools for HPC systems, such as a memory access monitoring tool, a performance variance detection tool, and a communication trace compression tool. In this talk, I will share our experience in building these tools by combining static analysis with runtime analysis, and point out the main challenges in this direction.