{"title":"领导力规模超级计算机上机器学习工作负载的I/O性能分析","authors":"Ahmad Maroof Karimi , Arnab K. Paul , Feiyi Wang","doi":"10.1016/j.peva.2022.102318","DOIUrl":null,"url":null,"abstract":"<div><p>The popularity of machine learning<span> technologies and frameworks has led to an increasingly large number of machine learning workloads running on high-performance computing (HPC) clusters. The ML workflows are readily being adopted in diverse computational fields such as Biology, Physics, Materials, and Computer Science. The I/O behavior of the emerging ML workloads distinctly differs from the traditional HPC workloads, such as simulation or checkpoint/restart-based HPC I/O behavior. Additionally, the ML workloads have also pushed for the utilization of GPUs or a combination of CPUs and GPUs in addition to using only CPUs for computational tasks. The diverse and complex I/O behavior of ML workloads requires extensive study and is critical for the efficient performance of various layers of the I/O stack and the overall performance of HPC workloads. This work aims to fill the gap in understanding the I/O behavior of emerging ML workloads by providing an in-depth analysis of ML jobs running on large-scale leadership HPC systems. In particular, we have analyzed the behavior of jobs based on the scale of the jobs, the science domains, and the processing units used by the ML jobs. The analysis was performed on 23,000 ML jobs collected from one year of Darshan logs running on Summit, which is one of the fastest supercomputers<span>. We also collect the CPU and GPU usage of 15,165 ML jobs by merging the Darshan dataset with the power usage of the processing units on Summit. Therefore, this paper is able to provide a systematic I/O characterization of ML workloads on a leadership scale HPC machine to understand how the I/O behavior differs for workloads across various science domains, the scale of workloads, and processing units and analyze the usage of parallel file system and burst buffer by ML I/O workloads. We have made several observations regarding I/O performances and access patterns through various analytical studies and discuss the important lessons learnt from the perspective of a ML user and a storage architect for emerging ML workloads running on large-scale supercomputers.</span></span></p></div>","PeriodicalId":19964,"journal":{"name":"Performance Evaluation","volume":"157 ","pages":"Article 102318"},"PeriodicalIF":1.0000,"publicationDate":"2022-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"I/O performance analysis of machine learning workloads on leadership scale supercomputer\",\"authors\":\"Ahmad Maroof Karimi , Arnab K. Paul , Feiyi Wang\",\"doi\":\"10.1016/j.peva.2022.102318\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>The popularity of machine learning<span> technologies and frameworks has led to an increasingly large number of machine learning workloads running on high-performance computing (HPC) clusters. The ML workflows are readily being adopted in diverse computational fields such as Biology, Physics, Materials, and Computer Science. The I/O behavior of the emerging ML workloads distinctly differs from the traditional HPC workloads, such as simulation or checkpoint/restart-based HPC I/O behavior. Additionally, the ML workloads have also pushed for the utilization of GPUs or a combination of CPUs and GPUs in addition to using only CPUs for computational tasks. 
The diverse and complex I/O behavior of ML workloads requires extensive study and is critical for the efficient performance of various layers of the I/O stack and the overall performance of HPC workloads. This work aims to fill the gap in understanding the I/O behavior of emerging ML workloads by providing an in-depth analysis of ML jobs running on large-scale leadership HPC systems. In particular, we have analyzed the behavior of jobs based on the scale of the jobs, the science domains, and the processing units used by the ML jobs. The analysis was performed on 23,000 ML jobs collected from one year of Darshan logs running on Summit, which is one of the fastest supercomputers<span>. We also collect the CPU and GPU usage of 15,165 ML jobs by merging the Darshan dataset with the power usage of the processing units on Summit. Therefore, this paper is able to provide a systematic I/O characterization of ML workloads on a leadership scale HPC machine to understand how the I/O behavior differs for workloads across various science domains, the scale of workloads, and processing units and analyze the usage of parallel file system and burst buffer by ML I/O workloads. We have made several observations regarding I/O performances and access patterns through various analytical studies and discuss the important lessons learnt from the perspective of a ML user and a storage architect for emerging ML workloads running on large-scale supercomputers.</span></span></p></div>\",\"PeriodicalId\":19964,\"journal\":{\"name\":\"Performance Evaluation\",\"volume\":\"157 \",\"pages\":\"Article 102318\"},\"PeriodicalIF\":1.0000,\"publicationDate\":\"2022-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Performance Evaluation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0166531622000268\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Performance Evaluation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0166531622000268","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
I/O performance analysis of machine learning workloads on leadership scale supercomputer
The popularity of machine learning technologies and frameworks has led to an increasingly large number of machine learning (ML) workloads running on high-performance computing (HPC) clusters. ML workflows are being adopted in diverse computational fields such as Biology, Physics, Materials, and Computer Science. The I/O behavior of these emerging ML workloads differs distinctly from that of traditional HPC workloads, such as simulation or checkpoint/restart-based I/O. ML workloads have also driven the use of GPUs, or combinations of CPUs and GPUs, rather than CPUs alone for computational tasks. The diverse and complex I/O behavior of ML workloads requires extensive study and is critical to the efficient performance of the various layers of the I/O stack and to the overall performance of HPC workloads. This work aims to fill the gap in understanding the I/O behavior of emerging ML workloads by providing an in-depth analysis of ML jobs running on large-scale leadership HPC systems. In particular, we analyze job behavior by job scale, science domain, and the processing units used by the ML jobs. The analysis covers 23,000 ML jobs collected from one year of Darshan logs on Summit, one of the fastest supercomputers. We also obtain the CPU and GPU usage of 15,165 ML jobs by merging the Darshan dataset with the power usage of the processing units on Summit. This paper therefore provides a systematic I/O characterization of ML workloads on a leadership-scale HPC machine, showing how I/O behavior differs across science domains, workload scales, and processing units, and analyzes how ML I/O workloads use the parallel file system and the burst buffer. Through various analytical studies we make several observations about I/O performance and access patterns, and we discuss the important lessons learned, from the perspectives of an ML user and a storage architect, for emerging ML workloads running on large-scale supercomputers.
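To make the merging step mentioned in the abstract concrete, the following Python sketch illustrates, under assumed file names, column names, and an arbitrary GPU-power threshold (none of which come from the paper), how per-job Darshan I/O records could be joined with processing-unit power measurements to separate CPU-only jobs from jobs that also use GPUs.

    # Illustrative sketch only: file names, column names, and the GPU-power
    # threshold are assumptions, not the authors' actual pipeline.
    import pandas as pd

    # Per-job I/O summary extracted from Darshan logs (one row per job).
    darshan = pd.read_csv("darshan_ml_jobs.csv")    # job_id, bytes_read, bytes_written, ...

    # Mean power draw of the CPU and GPU processing units during each job.
    power = pd.read_csv("summit_power_usage.csv")   # job_id, mean_cpu_power_w, mean_gpu_power_w

    merged = darshan.merge(power, on="job_id", how="inner")

    # Label a job as GPU-using if its mean GPU power exceeds an idle baseline
    # (threshold chosen arbitrarily here for illustration).
    GPU_IDLE_W = 50.0
    merged["uses_gpu"] = merged["mean_gpu_power_w"] > GPU_IDLE_W

    # Compare I/O volumes of CPU-only jobs against jobs that also use GPUs.
    print(merged.groupby("uses_gpu")[["bytes_read", "bytes_written"]].median())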
Journal introduction:
Performance Evaluation functions as a leading journal in the area of modeling, measurement, and evaluation of performance aspects of computing and communication systems. As such, it aims to present a balanced and complete view of the entire Performance Evaluation profession. Hence, the journal is interested in papers that focus on one or more of the following dimensions:
-Define new performance evaluation tools, including measurement and monitoring tools as well as modeling and analytic techniques
-Provide new insights into the performance of computing and communication systems
-Introduce new application areas where performance evaluation tools can play an important role, and identify creative new uses for performance evaluation tools.
More specifically, common application areas of interest include the performance of:
-Resource allocation and control methods and algorithms (e.g. routing and flow control in networks, bandwidth allocation, processor scheduling, memory management)
-System architecture, design and implementation
-Cognitive radio
-VANETs
-Social networks and media
-Energy-efficient ICT
-Energy harvesting
-Data centers
-Data-centric networks
-System reliability
-System tuning and capacity planning
-Wireless and sensor networks
-Autonomic and self-organizing systems
-Embedded systems
-Network science