Pub Date : 1900-01-01 DOI: 10.1109/ESCIW.2009.5407992
G. Ostrouchov, T. Naughton, C. Engelmann, G. Vallee, S. L. Scott
Large-scale computing systems provide great potential for scientific exploration. However, the complexity that accompanies these enormous machines raises challenges for both users and operators. The effective use of such systems is often hampered by failures encountered when running applications on systems containing tens of thousands of nodes and hundreds of thousands of compute cores capable of yielding petaflops of performance. In systems of this size, failure detection is complicated and root-cause diagnosis is difficult. This paper describes our recent work on identifying anomalies in monitoring data and system logs to provide further insight into machine status, runtime behavior, failure modes, and failure root causes. It discusses the details of an initial prototype that gathers the data and uses statistical techniques for analysis.
Title: Nonparametric multivariate anomaly analysis in support of HPC resilience. Published in: 2009 5th IEEE International Conference on E-Science Workshops. DOI URL: https://doi.org/10.1109/ESCIW.2009.5407992
Pub Date : 1900-01-01 DOI: 10.1109/ESCIW.2009.5407962
Scott Loynton, D. Sloan, Jean-Marie Burel, C. MacAulay
Academic scientific software development projects aim to provide valuable research tools to aid the scientific community in the process of discovery. For those tools to succeed, they must meet wider expectations of usability and good user experience design; however, the nature of academic e-science software development projects means they are often constrained by too narrow an understanding of the potential user base and by a focus on development at the expense of broad user research. We introduce the concept of the Project Community as a way of addressing these limitations in capacity. It supports effective “user research” that informs and influences longer-term marketing and outreach strategy, as well as the development of usable and useful scientific software.
Title: Towards a project community approach to academic scientific software development. Published in: 2009 5th IEEE International Conference on E-Science Workshops. DOI URL: https://doi.org/10.1109/ESCIW.2009.5407962