2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW): Latest Publications

GPU Direct I/O with HDF5
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00010
J. Ravi, S. Byna, Q. Koziol
Exascale HPC systems are being designed with accelerators, such as GPUs, to accelerate parts of applications. In machine learning workloads as well as large-scale simulations that use GPUs as accelerators, the CPU (or host) memory is currently used as a buffer for data transfers between GPU (or device) memory and the file system. If the CPU does not need to operate on the data, this is sub-optimal because it wastes host memory by reserving space for duplicated data. Furthermore, this "bounce buffer" approach wastes CPU cycles on transferring data. A new technique, NVIDIA GPUDirect Storage (GDS), can eliminate the need to use the host memory as a bounce buffer, making it possible to transfer data directly between the device memory and the file system. This direct data path shortens latency by omitting the extra copy and enables higher bandwidth. To take full advantage of GDS in existing applications, it is necessary to provide support in existing I/O libraries, such as HDF5 and MPI-IO, which are heavily used in applications. In this paper, we describe our effort to integrate GDS with HDF5, the top I/O library at NERSC and at DOE leadership computing facilities. We design and implement this integration using an HDF5 Virtual File Driver (VFD). The GDS VFD provides a file system abstraction to the application that allows HDF5 applications to perform I/O without needing to move data between CPUs and GPUs explicitly. We compare the performance of the HDF5 GDS VFD with explicit data movement approaches and demonstrate superior performance with the GDS method.
Citations: 3
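To make the bounce-buffer overhead concrete, here is a minimal sketch of the conventional path the paper seeks to eliminate, using cupy and h5py as stand-ins (an assumption on my part; the paper's GDS VFD operates inside the HDF5 C library, not through these Python bindings): device memory is first staged into a host buffer, and only then does HDF5 write it to the file system.

```python
# Sketch of the "bounce buffer" path: GPU data is copied to host memory
# before HDF5 writes it out. cupy/h5py are illustrative assumptions; the
# paper's GDS VFD works at the C VFD layer of HDF5.
import cupy as cp
import h5py

def write_via_bounce_buffer(path, gpu_array):
    """Copy device memory to a host buffer, then write it with HDF5."""
    host_buffer = cp.asnumpy(gpu_array)             # device -> host copy (the extra hop)
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=host_buffer)  # host -> file system

if __name__ == "__main__":
    write_via_bounce_buffer("sample.h5", cp.arange(1 << 20, dtype=cp.float32))
```

With the GDS VFD, the application would instead hand the device buffer to HDF5 directly, so the host copy and its memory footprint disappear.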
Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00011
Fahim Chowdhury, Yue Zhu, F. Natale, A. Moody, Elsa Gonsiorowski, K. Mohror, Weikuan Yu
Scientific application workflows leverage the capabilities of cutting-edge high-performance computing (HPC) facilities to enable complex applications for academia, research, and industry communities. Data transfer and I/O dependencies among different modules of modern HPC workflows can increase complexity and hamper the overall performance of workflows. Understanding this complexity due to data dependency and dataflow is an essential prerequisite for developing optimization strategies to improve I/O performance and, eventually, the entire workflow. In this paper, we discuss dataflow patterns for workflow applications on HPC systems. As existing I/O benchmarking tools fall short in identifying and representing the dataflow in modern HPC workflows, we have implemented Wemul, an open-source workflow I/O emulation framework, to mimic the different types of I/O behavior exhibited by common and complex HPC application workflows for deeper analysis. We elaborate on the features and usage of Wemul, demonstrate its application to HPC workflows, and discuss the insights from performance analysis results on the Lassen supercomputing cluster at Lawrence Livermore National Laboratory (LLNL).
Citations: 3
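As a rough illustration of the kind of dataflow such an emulator parameterizes, the sketch below mimics a two-stage workflow in which a producer stage writes a file that a consumer stage then reads back, timing each phase. The file name, transfer size, and timing scheme are illustrative assumptions, not Wemul's actual interface.

```python
# Toy emulation of a producer/consumer dataflow between two workflow stages.
import os
import time

def emulate_stage(path, size_bytes, mode):
    """Write or read `size_bytes` of data and return the elapsed time."""
    start = time.perf_counter()
    if mode == "write":
        with open(path, "wb") as f:
            f.write(os.urandom(size_bytes))
            f.flush()
            os.fsync(f.fileno())        # make sure the producer's output is on disk
    else:
        with open(path, "rb") as f:
            f.read()
    return time.perf_counter() - start

if __name__ == "__main__":
    size = 64 * 1024 * 1024             # 64 MiB exchanged between the two stages
    t_write = emulate_stage("stage_output.dat", size, "write")
    t_read = emulate_stage("stage_output.dat", size, "read")
    print(f"producer write: {t_write:.3f}s, consumer read: {t_read:.3f}s")
```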
[Title page]
Pub Date : 2020-11-01 DOI: 10.1109/pdsw51947.2020.00001
{"title":"[Title page]","authors":"","doi":"10.1109/pdsw51947.2020.00001","DOIUrl":"https://doi.org/10.1109/pdsw51947.2020.00001","url":null,"abstract":"","PeriodicalId":142923,"journal":{"name":"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121530435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gauge: An Interactive Data-Driven Visualization Tool for HPC Application I/O Performance Analysis
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00008
Eliakin del Rosario, Mikaela Currier, Mihailo Isakov, S. Madireddy, Prasanna Balaprakash, P. Carns, R. Ross, K. Harms, S. Snyder, M. Kinsy
Understanding and alleviating I/O bottlenecks in HPC system workloads is difficult due to the complex, multilayered nature of HPC I/O subsystems. Even with full visibility into the jobs executed on the system, the lack of tooling makes debugging I/O problems difficult. In this work, we introduce Gauge, an interactive, data-driven, web-based visualization tool for HPC I/O performance analysis. Gauge aids in the process of visualizing and analyzing, in an interactive fashion, large sets of HPC application execution logs. It performs a number of functions meant to significantly reduce the cognitive load of navigating these sets, some of which span many years of HPC logs. For instance, as its first step in many processing chains, it arranges unordered sets of collected HPC logs into a hierarchy of clusters for later analysis. This clustering step allows application developers to quickly navigate logs, find how their jobs compare to those of their peers in terms of I/O utilization, and learn how to improve their future runs. Similarly, facility operators can use Gauge to 'get a pulse' on the workloads running on their HPC systems, find clusters of underperforming applications, and diagnose the reasons for poor I/O throughput. In this work, we describe how Gauge arrives at the HPC job clustering, how it presents data about the jobs, and how it can be used to further narrow down and understand the behavior of sets of jobs. We also provide a case study on using Gauge from the perspective of a facility operator.
Citations: 6
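The hierarchical clustering step can be illustrated with a small synthetic stand-in: each job is reduced to a feature vector, and the jobs are grouped into a cluster tree that can later be cut at any depth for drill-down. The feature columns and data below are invented for illustration; they are not Gauge's actual Darshan-derived feature set.

```python
# Hierarchical clustering of per-job I/O feature vectors (synthetic data).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# columns: log10(bytes moved), fraction of small (<64 KiB) accesses, file count
jobs = np.vstack([
    rng.normal([11.0, 0.1, 2.0], 0.3, size=(50, 3)),    # large, sequential jobs
    rng.normal([8.0, 0.9, 500.0], 0.3, size=(50, 3)),   # many-small-file jobs
])

tree = linkage(jobs, method="ward")                 # build the cluster hierarchy
labels = fcluster(tree, t=2, criterion="maxclust")  # cut it into two clusters
print("jobs per cluster:", np.bincount(labels)[1:])
```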
Fingerprinting the Checker Policies of Parallel File Systems
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00013
Runzhou Han, Duo Zhang, Mai Zheng
Parallel file systems (PFSes) play an essential role in high performance computing. To ensure integrity, many PFSes are designed with a checker component, which serves as the last line of defense to bring a corrupted PFS back to a healthy state. Motivated by real-world incidents of PFS corruption, we perform a fine-grained study of the capabilities of PFS checkers in this paper. We apply type-aware fault injection to specific PFS structures, and examine the detection and repair policies of PFS checkers meticulously via a well-defined taxonomy. The study results on two representative PFS checkers show that they are able to handle a wide range of corruptions on important data structures. On the other hand, neither of them is perfect: there are multiple cases where the checkers may behave sub-optimally, leading to kernel panics, wrong repairs, etc. Our work has led to a new patch for Lustre. We hope to develop our methodology into a generic framework for analyzing the checkers of diverse PFSes, and to enable more elegant designs of PFS checkers for reliable high-performance computing.
Citations: 9
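Reduced to its generic core, the study's loop is: corrupt a chosen location in a file-system image, run the checker, and observe how it detects and repairs the damage. The sketch below illustrates that loop only; the image path, corruption offset, and checker command are placeholders, whereas the paper's injector is type-aware and targets specific on-disk structures of parallel file systems such as Lustre.

```python
# Generic fault-injection loop: corrupt an image, then run a checker on it.
import subprocess

def inject_fault(image_path, offset, length=1):
    """Flip the bits of `length` bytes at `offset` in a file-system image."""
    with open(image_path, "r+b") as img:
        img.seek(offset)
        original = img.read(length)
        img.seek(offset)
        img.write(bytes(b ^ 0xFF for b in original))

def run_checker(cmd):
    """Run the file-system checker and report how it reacted."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout

if __name__ == "__main__":
    inject_fault("pfs_target.img", offset=4096)              # corrupt one metadata byte
    code, output = run_checker(["fsck.ext4", "-n", "pfs_target.img"])  # placeholder checker
    print("checker exit code:", code)
```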
Fractional-Overlap Declustered Parity: Evaluating Reliability for Storage Systems
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00009
Huan Ke, Haryadi S. Gunawi, Dominic Manno, David Bonnie, B. Settlemyer
In this paper, we propose a flexible and practical data protection scheme, fractional-overlap declustered parity (FODP), to explore the trade-offs between fault tolerance and rebuild performance. Our experiments show that FODP reduces the probability of data loss by up to 99% in the presence of various failure regimes. Furthermore, by adding one additional spare drive's worth of capacity within each server, FODP reduces the granularity of data loss by up to 99%.
Citations: 0
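The reliability side of this trade-off can be illustrated with a small Monte Carlo model: given a stripe placement and a number of simultaneous drive failures, estimate the probability that some stripe loses more members than its parity can cover. The toy placement below (fixed-width overlapping groups on 20 drives) is an assumption for illustration, not the paper's fractional-overlap layout.

```python
# Monte Carlo estimate of data-loss probability for a toy declustered layout.
import random

NUM_DRIVES, STRIPE_WIDTH, PARITY = 20, 6, 2   # toy configuration
# one overlapping placement group starting at each drive
stripes = [tuple((start + i) % NUM_DRIVES for i in range(STRIPE_WIDTH))
           for start in range(NUM_DRIVES)]

def loss_probability(failed_drives, trials=50_000):
    """Estimate P(some stripe loses more drives than its parity can cover)."""
    losses = 0
    for _ in range(trials):
        failed = set(random.sample(range(NUM_DRIVES), failed_drives))
        if any(len(failed.intersection(s)) > PARITY for s in stripes):
            losses += 1
    return losses / trials

for f in (3, 4, 5):
    print(f"{f} concurrent failures -> estimated data-loss probability {loss_probability(f):.4f}")
```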
Keeping It Real: Why HPC Data Services Don't Achieve I/O Microbenchmark Performance
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00006
P. Carns, K. Harms, B. Settlemyer, Brian Atkinson, R. Ross
HPC storage software developers rely on benchmarks as reference points for performance evaluation. Low-level synthetic microbenchmarks are particularly valuable for isolating performance bottlenecks in complex systems and identifying optimization opportunities. The use of low-level microbenchmarks also entails risk, however, especially if the benchmark behavior does not reflect the nuances of production data services or applications. In those cases, microbenchmark measurements can lead to unrealistic expectations or misdiagnosis of performance problems. Neither benchmark creators nor software developers are necessarily at fault in this scenario, however. The underlying problem is more often a subtle disconnect between the objective of the benchmark and the objective of the developer. In this paper we investigate examples of discrepancies between microbenchmark behavior and software developer expectations. Our goal is to draw attention to these pitfalls and initiate a discussion within the community about how to improve the state of the practice in performance engineering for HPC data services.
Citations: 0
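One common example of such a disconnect, sketched below, is a write microbenchmark that effectively measures page-cache bandwidth because it never forces data to stable storage; adding an fsync brings the number closer to what a durable data service would observe. This is a generic illustration of the pitfall, not one of the paper's specific case studies.

```python
# Naive vs. durable write-bandwidth measurement.
import os
import time

def write_bandwidth(path, total_mib=256, block_mib=4, durable=False):
    """Write `total_mib` MiB in `block_mib` chunks and return MiB/s."""
    block = b"\0" * (block_mib << 20)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mib // block_mib):
            f.write(block)
        if durable:
            f.flush()
            os.fsync(f.fileno())   # force data to stable storage
    return total_mib / (time.perf_counter() - start)

print(f"buffered : {write_bandwidth('bench.dat'):8.1f} MiB/s")
print(f"durable  : {write_bandwidth('bench.dat', durable=True):8.1f} MiB/s")
```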
Towards On-Demand I/O Forwarding in HPC Platforms
Pub Date : 2020-11-01 DOI: 10.1109/PDSW51947.2020.00007
J. L. Bez, F. Boito, Alberto Miranda, Ramon Nou, Toni Cortes, P. Navaux
I/O forwarding is an established and widely-adopted technique in HPC to reduce contention and improve I/O performance in the access to shared storage infrastructure. On such machines, this layer is often physically deployed on dedicated nodes, and their connection to the clients is static. Furthermore, the increasingly heterogeneous workloads entering HPC installations stress the I/O stack, requiring tuning and reconfiguration based on the applications' characteristics. Nonetheless, it is not always feasible in a production system to explore the potential benefits of this layer under different configurations without impacting clients. In this paper, we investigate the effects of I/O forwarding on performance by considering the application's I/O access patterns and system characteristics. We aim to explore when forwarding is the best choice for an application, how many I/O nodes it would benefit from, and whether not using forwarding at all might be the correct decision. To gather performance metrics, explore, and understand the impact of forwarding I/O requests of different access patterns, we implemented FORGE, a lightweight I/O forwarding layer in user-space. Using FORGE, we evaluated the optimal forwarding configurations for several access patterns on MareNostrum 4 (Spain) and Santos Dumont (Brazil) supercomputers. Our results demonstrate that shifting the focus from a static system-wide deployment to an on-demand reconfigurable I/O forwarding layer dictated by application demands can improve I/O performance on future machines.
Citations: 2
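A core benefit of a forwarding layer is that it can coalesce many small client requests into fewer, larger operations before they reach the shared file system. The sketch below shows only that aggregation idea in a self-contained form; it is not FORGE's implementation or API.

```python
# Request coalescing, the aggregation idea behind an I/O forwarding node.
from dataclasses import dataclass

@dataclass
class WriteRequest:
    offset: int
    data: bytes

def coalesce(requests):
    """Merge contiguous write requests into larger ones (the forwarder's job)."""
    merged = []
    for req in sorted(requests, key=lambda r: r.offset):
        if merged and merged[-1].offset + len(merged[-1].data) == req.offset:
            merged[-1] = WriteRequest(merged[-1].offset, merged[-1].data + req.data)
        else:
            merged.append(WriteRequest(req.offset, req.data))
    return merged

# Four 1 MiB client requests, three of them contiguous.
reqs = [WriteRequest(o << 20, b"x" * (1 << 20)) for o in (0, 1, 2, 8)]
print([len(r.data) >> 20 for r in coalesce(reqs)])   # -> [3, 1] MiB operations
```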
Pangeo Benchmarking Analysis: Object Storage vs. POSIX File System
Pub Date : 2020-10-21 DOI: 10.1109/PDSW51947.2020.00012
Haiying Xu, Kevin Paul, Anderson Banihirwe
Pangeo is a community of scientists and software developers collaborating to enable Big Data Geoscience analysis interactively in the public cloud and on high-performance computing (HPC) systems. At the core of the Pangeo software stack are (1) Xarray, which adds metadata labels such as dimensions, coordinates, and attributes to raw array-oriented data, (2) Dask, which provides parallel computation and out-of-core memory capabilities, and (3) Jupyter Lab, which offers the web-based interactive environment of the Pangeo platform. Geoscientists now have a strong candidate software stack to analyze large datasets, and they are very curious about performance differences between the Zarr and NetCDF4 data formats on both traditional file storage systems and object storage. We have written a benchmarking suite for the Pangeo stack that can measure scalability and performance information for both input/output (I/O) throughput and computation. We describe how we performed these benchmarks and analyzed our results, and we discuss the pros and cons of the Pangeo software stack in terms of I/O scalability on both cloud and HPC storage systems.
Citations: 0
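The heart of the comparison can be sketched as opening the same dataset in Zarr and NetCDF4 form with Xarray and Dask and timing an identical reduction. The paths, variable name, and chunking below are placeholders rather than the suite's real datasets or configuration.

```python
# Time the same reduction on a Zarr store and a NetCDF4 file via Xarray/Dask.
import time
import xarray as xr

def time_mean(open_fn, source, var="temperature"):
    ds = open_fn(source)                       # lazy open; Dask builds the task graph
    start = time.perf_counter()
    result = float(ds[var].mean().compute())   # read + reduce in parallel
    return result, time.perf_counter() - start

zarr_mean, zarr_t = time_mean(xr.open_zarr, "dataset.zarr")
nc_mean, nc_t = time_mean(
    lambda p: xr.open_dataset(p, chunks={"time": 100}), "dataset.nc")
print(f"Zarr: {zarr_t:.2f}s   NetCDF4: {nc_t:.2f}s")
```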
Message from the Workshop Chairs
Pub Date : 2009-10-01 DOI: 10.1109/VIZSEC.2009.5375531
D. Frincke, C. Gates, J. Goodall
Welcome to VizSec 2009! The 6th International Workshop on Visualization for Cyber Security continues to provide a forum bringing researchers and practitioners in information visualization and security together to address the specific needs of the cyber security community through new and insightful visualization techniques. VizSec 2009 continues the established practice of alternating our meeting between research conferences focused on cybersecurity, and researchers focused on analytics. This provides a balance between “Viz” (visualization and analytics) and “Sec” (cybersecurity). This balance is important — as is the balance between practitioner goals and the interests of the long term researcher. While the immediate needs within the cybersecurity community are great, and visualization can provide much needed support, a focus only on the immediate analytical crisis will at best provide short bursts of improvement. Longer term research is also necessary, especially long term research that is undertaken with an eye towards improving the lot of the intended user. It is here that VizSec fills an important and unique niche.
Citations: 0