Automatic Generation of I/O Kernels for HPC Applications
Babak Behzad, Hoang-Vu Dang, Farah Hariri, Weizhe Zhang, M. Snir
DOI: https://doi.org/10.1109/PDSW.2014.6

The study of the I/O performance of a parallel application can be facilitated by the use of an I/O kernel -- a program that generates the same I/O calls as the original application but can be executed much faster. Such I/O kernels are especially important when the programs under study are proprietary or classified and are available only in binary form. In this paper, we show how to create such an I/O kernel automatically: the target application is executed with an instrumented I/O library, and the resulting I/O traces are then "compressed" into a compact C program that regenerates those traces.
Using Property Graphs for Rich Metadata Management in HPC Systems
Dong Dai, R. Ross, P. Carns, D. Kimpe, Yong Chen
DOI: https://doi.org/10.1109/PDSW.2014.11

HPC platforms are capable of generating huge amounts of metadata about different entities, including jobs, users, and files. Simple metadata, which describes the attributes of these entities (e.g., file size, name, and permission mode), is well recorded and used in current systems. However, only a limited amount of rich metadata, which records not only the attributes of entities but also the relationships between them, is captured in current HPC systems. Rich metadata may include information from many sources, including users and applications, and must be integrated into a unified framework. Collecting, integrating, processing, and querying such a large volume of metadata pose considerable challenges for HPC systems. In this paper, we propose a rich metadata management approach that unifies metadata into one generic property graph. We argue that this approach supports not only simple metadata operations such as directory traversal and permission validation but also rich metadata operations such as provenance queries and security auditing. The property graph approach provides an extensible way to store diverse metadata and presents an opportunity to leverage rapidly evolving graph storage and processing techniques.
Feign: In-Silico Laboratory for Researching I/O Strategies
Jakob Lüttgau, J. Kunkel
DOI: https://doi.org/10.1109/PDSW.2014.9

Evaluating the I/O performance of an application across different systems is a daunting task because it requires preparing the software dependencies and the required input data. Feign aims to be an extensible trace-replay solution for parallel applications that supports arbitrary software and library layers. The tool abstracts and streamlines the replay process while allowing plug-ins to provide, manipulate, and interpret trace data. The application's behavior can thus be evaluated without potentially proprietary or confidential software and input data. Even more interesting is Feign's potential as a virtual laboratory for I/O research: by manipulating trace data, experiments can be conducted; for example, it becomes possible to evaluate the benefit of optimization strategies. Since a plug-in can determine "future" activities, this not only enables us to develop optimal strategies as baselines for run-time heuristics but also eases testing of a developed strategy on many applications without modifying them. The paper proposes and evaluates a workflow to automatically apply optimization candidates to application traces and approximate potential performance gains. Using Feign's reporting facilities, an automatic optimization engine can then independently conduct experiments by feeding in traces and strategies and comparing the results.
VSFS: A Searchable Distributed File System
Lei Xu, Ziling Huang, Hong Jiang, Lei Tian, D. Swanson
DOI: https://doi.org/10.1109/PDSW.2014.10

In this paper, we propose a Versatile Searchable File System, VSFS, which builds a POSIX-compatible namespace using a novel Namespace-based File Query Language (NFQL). This enables analytics applications to use VSFS's high-performance file-search service without changing their data model. VSFS's versatile file-indexing mechanism is designed to give applications great flexibility in controlling indices to satisfy their analytics needs. Evaluations driven by two real-world analytics applications demonstrate VSFS's high scalability and powerful data-filtering functionality.
Evaluating Lustre's Metadata Server on a Multi-Socket Platform
Konstantinos Chasapis, M. F. Dolz, Michael Kuhn, T. Ludwig
DOI: https://doi.org/10.1109/PDSW.2014.5

With the emergence of multi-core and multi-socket non-uniform memory access (NUMA) platforms in recent years, new software challenges have arisen to use them efficiently. In the field of high performance computing (HPC), parallel programming has always been the key factor in improving application performance. However, the implications of parallel architectures for system software have been overlooked until recently. In this work, we examine the implications of such platforms for the performance scalability of the Lustre parallel distributed file system's metadata server (MDS). We run our experiments on a four-socket NUMA platform with 48 cores. We use the mdtest benchmark to generate appropriate metadata workloads and include configurations with varying numbers of active cores and mount points. Additionally, we compare Lustre's metadata scalability with that of the local file systems ext4 and XFS. The results demonstrate that Lustre's metadata performance is limited to a single socket and decreases when more sockets are used. We also observe that the MDS's back-end device is not a limiting factor for performance.
Alleviating I/O Interference via Caching and Rate-Controlled Prefetching without Degrading Migration Performance
Morgan Stuart, Tao Lu, Xubin He
DOI: https://doi.org/10.1109/PDSW.2014.8

The process of migrating a virtual machine and its virtual storage can greatly degrade the performance of other guests and applications running on the same host, including the migrating machine itself. Through experimental evaluation, we investigate the I/O performance degradation imposed by storage migration on co-located machines. We examine naive approaches for mitigating this interference by adjusting host system settings and migration parameters. While effective in some contexts, our analysis demonstrates that performing a migration using these I/O constraining techniques will increase migration latency and limit its ability to converge. Therefore, we present a design and analysis of Storage Migration Offloading, a migration method that reduces I/O interference, maintains lower migration latency, and converges under higher dirty rates. Storage Migration Offloading utilizes a buffer store populated during migration using a dynamic cache policy and rate-controlled prefetching. Data is transferred to the destination host from both the buffer and primary disks in a way that minimizes interference on the primary disk while attempting to maintain the desired migration speed.
HPIS3: Towards a High-Performance Simulator for Hybrid Parallel I/O and Storage Systems
Bo Feng, Ning Liu, Shuibing He, Xian-He Sun
DOI: https://doi.org/10.1109/PDSW.2014.12

The performance gap between processors and storage devices has continuously widened over the past few decades. The gap has been further exacerbated recently because applications are becoming more data-intensive in both industry and academia. Traditional storage devices, such as hard disk drives (HDDs), fail to keep up with the pace of this growth. A known solution is to use solid state drives (SSDs) as fast storage. Due to the high cost of SSDs, data and supercomputing centers usually adopt a hybrid storage system consisting of a combination of HDD and SSD I/O servers. However, hybrid I/O and storage systems increase complexity, often leaving SSDs underutilized. The need to configure and utilize HDD/SSD hybrid systems effectively is therefore likely to persist. In this study, we propose a high-performance hybrid parallel I/O and storage simulator, HPIS3. As a co-design tool, HPIS3 is capable of simulating a variety of parallel storage systems, especially hybrid ones. The experimental results show that the simulator's prediction error is as low as 2%, with an average of 11.98%.
BatchFS: Scaling the File System Control Plane with Client-Funded Metadata Servers
Qing Zheng, Kai Ren, Garth A. Gibson
DOI: https://doi.org/10.1109/PDSW.2014.7

Parallel file systems are often characterized by a layered architecture that decouples metadata management from I/O operations, allowing file systems to facilitate fast concurrent access to file contents. However, metadata-intensive workloads are still likely to bottleneck at the file system control plane due to namespace synchronization, which taxes application performance through lock contention on directories, transaction serialization, and RPC overheads. In this paper, we propose a client-driven file system metadata architecture, BatchFS, that is optimized for noninteractive, or batch, workloads. To avoid metadata bottlenecks, BatchFS features a relaxed consistency model marked by lazy namespace synchronization and optimistic metadata verification. Capable of executing namespace operations on client-provisioned resources without contacting any metadata server, BatchFS clients are able to delay namespace synchronization until it is really needed. Our goal in this vision paper is to handle these delayed operations securely and efficiently with metadata verification and bulk insertion. Preliminary experiments demonstrate that our client-funded metadata architecture outperforms a traditional synchronous file system by orders of magnitude.