
International Journal of High Performance Computing Applications: Latest Publications

Black-box statistical prediction of lossy compression ratios for scientific data
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-05-15 | DOI: 10.1177/10943420231179417
Robert Underwood, J. Bessac, David Krasowska, Jon C. Calhoun, S. Di, F. Cappello
Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless compression, no theoretical or data-based quantification of lossy compressibility exists for scientific data. Users rely on trial and error to assess lossy compression performance. As a strong data-driven effort toward quantifying the lossy compressibility of scientific datasets, we provide a statistical framework to predict compression ratios of lossy compressors. Our method is a two-step framework where (i) compressor-agnostic predictors are computed and (ii) statistical prediction models relying on these predictors are trained on observed compression ratios. The proposed predictors exploit spatial correlations and notions of entropy and lossyness via the quantized entropy. We study 8+ compressors on 6 scientific datasets and achieve a median percentage prediction error of less than 12%, which is substantially smaller than that of other methods, while achieving at least an 8.8× speedup for searching for a specific compression ratio and a 7.8× speedup for determining the best compressor out of a collection.
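As a rough illustration of the two-step framework described in the abstract, the sketch below computes two compressor-agnostic predictors (a quantized entropy and a lag-1 spatial correlation) and fits a simple regression to observed compression ratios. The predictor definitions, synthetic fields, and placeholder ratios are assumptions for illustration only, not the paper's actual predictors or statistical models.

```python
# Sketch of the two-step idea: (i) compressor-agnostic predictors, here a
# quantized entropy and a lag-1 spatial correlation, and (ii) a regression
# trained on observed compression ratios. Everything below is illustrative.
import numpy as np

def quantized_entropy(field, abs_error):
    """Shannon entropy (bits/value) after uniform quantization with bin width 2*abs_error."""
    q = np.round(field / (2.0 * abs_error)).astype(np.int64)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def lag1_correlation(field):
    """Mean lag-1 autocorrelation along the last axis, a crude smoothness proxy."""
    a = field.reshape(-1, field.shape[-1])
    return float(np.corrcoef(a[:, :-1].ravel(), a[:, 1:].ravel())[0, 1])

def fit_ratio_model(predictors, observed_ratios):
    """Least-squares fit from predictors to log compression ratio."""
    X = np.column_stack([np.ones(len(predictors)), predictors])
    coeffs, *_ = np.linalg.lstsq(X, np.log(observed_ratios), rcond=None)
    return coeffs

def predict_ratio(coeffs, predictor_row):
    return float(np.exp(coeffs[0] + coeffs[1:] @ predictor_row))

# Hypothetical usage: synthetic 2D fields and made-up observed ratios.
rng = np.random.default_rng(0)
fields = [rng.standard_normal((64, 64)).cumsum(axis=1) for _ in range(8)]
feats = np.array([[quantized_entropy(f, 1e-2), lag1_correlation(f)] for f in fields])
ratios = np.array([4.1, 3.8, 5.0, 4.4, 3.9, 4.7, 4.2, 4.5])  # placeholder measurements
model = fit_ratio_model(feats, ratios)
print(predict_ratio(model, feats[0]))
```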
Volume 37, pages 412–433.
Citations: 2
Corrigendum to large-scale direct numerical simulations of turbulence using GPUs and modern Fortran
CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-05-05 | DOI: 10.1177/10943420231173573
Citations: 0
A study on the performance of distributed training of data-driven CFD simulations
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-05-04 | DOI: 10.1177/10943420231160557
Sergio Iserte, Alejandro González-Barberá, Paloma Barreda, K. Rojek
Data-driven methods for computer simulations are blooming in many scientific areas. The traditional approach to simulating physical behaviors relies on solving partial differential equations (PDEs). Since calculating these iterative equations is both highly computationally demanding and time-consuming, data-driven methods leverage artificial intelligence (AI) techniques to alleviate that workload. Data-driven methods have to be trained in advance to provide their subsequent fast predictions; however, the cost of the training stage is non-negligible. This article presents a predictive model for inferring future states of a specific fluid simulation that serves as a use case for evaluating different training alternatives. In particular, this study compares the performance of CPU-only, multi-GPU, and distributed approaches for training a time-series forecasting deep learning model. With some slight code adaptations, the results show and compare, across different implementations, the benefits of distributed GPU-enabled training for predicting high-accuracy states in a fraction of the time needed by the computational fluid dynamics solver.
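For readers unfamiliar with the distributed alternative this study evaluates, the sketch below shows a generic data-parallel training loop using PyTorch's DistributedDataParallel. The MLP forecaster, synthetic tensors, and hyperparameters are placeholders chosen for illustration; the paper's forecasting model and CFD dataset are not reproduced here.

```python
# Generic data-parallel training loop with PyTorch DistributedDataParallel.
# The MLP forecaster and random tensors are placeholders, not the paper's setup.
# Launch with: torchrun --nproc_per_node=<N> this_file.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank % torch.cuda.device_count()}") if use_cuda else torch.device("cpu")

    # Placeholder task: predict the next state (8 values) from a window of 16 states.
    ds = TensorDataset(torch.randn(4096, 16, 8), torch.randn(4096, 8))
    sampler = DistributedSampler(ds)                 # shards samples across ranks
    loader = DataLoader(ds, batch_size=64, sampler=sampler)

    model = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(16 * 8, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, 8)).to(device)
    model = DDP(model, device_ids=[device.index] if use_cuda else None)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(3):
        sampler.set_epoch(epoch)                     # reshuffle the shards each epoch
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()                          # gradients are all-reduced across ranks
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```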
Volume 37, pages 503–515.
Citations: 1
Orchestration of materials science workflows for heterogeneous resources at large scale
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-04-14 | DOI: 10.1177/10943420231167800
Naweiluo Zhou, G. Scorzelli, Jakob Luettgau, R. Kancharla, Joshua J. Kane, Robert Wheeler, B. Croom, P. Newell, Valerio Pascucci, M. Taufer
In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage and scales up compute resources on the cloud. The multi-layer software structure of our framework includes three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A low layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud's software stack from the user, who otherwise would be required to have expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.
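As a minimal sketch of the kind of heterogeneous scheduling the low layer performs, the snippet below submits a reconstruct-then-segment pipeline to a Dask cluster whose workers advertise GPU or CPU resources. The scheduler address, resource labels, and the two placeholder kernels are assumptions, not the framework's actual code.

```python
# Submit a reconstruct-then-segment pipeline to a Dask cluster whose workers
# advertise resources (e.g. started with `dask-worker ... --resources "GPU=1"`).
# The scheduler address and the two placeholder kernels are assumptions.
import numpy as np
from dask.distributed import Client

def reconstruct(projections):
    # Stand-in for synchrotron CT reconstruction (GPU-heavy in the real workflow).
    return np.fft.ifft2(np.fft.fft2(projections)).real

def segment(volume, threshold=0.5):
    # Stand-in for crack/void segmentation (simple thresholding here).
    return volume > threshold

client = Client("tcp://scheduler-address:8786")      # hypothetical scheduler endpoint

futures = []
for i in range(16):
    slab = np.random.rand(512, 512)                  # stand-in for one scan slab
    rec = client.submit(reconstruct, slab, resources={"GPU": 1}, key=f"rec-{i}")
    seg = client.submit(segment, rec, resources={"CPU": 1}, key=f"seg-{i}")
    futures.append(seg)

masks = client.gather(futures)                       # per-slab masks, computed in parallel
```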
Volume 37, pages 260–271.
Citations: 1
Versatile software-defined HPC and cloud clusters on Alps supercomputer for diverse workflows
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-04-11 | DOI: 10.1177/10943420231167811
S. Alam, M. Gila, Mark Klein, Maxime Martinasso, T. Schulthess
Supercomputers have been driving innovations in performance and scaling, benefiting several scientific applications over the past few decades. Yet their ecosystems remain virtually unchanged when it comes to integrating distributed data-driven workflows, primarily due to rather rigid access methods and restricted configuration management options. The X-as-a-Service model of the cloud has introduced, among other features, a developer-centric DevOps approach empowering developers of infrastructure, platform, and software artefacts, which, unfortunately, contemporary supercomputers still lack. We introduce vClusters (versatile software-defined clusters), which is based on Infrastructure-as-Code (IaC) technology. The vClusters approach is a unique fusion of HPC and cloud technologies resulting in a software-defined, multi-tenant cluster on a supercomputing ecosystem that, together with software-defined storage, enables DevOps for complex, data-driven workflows like grid middleware, alongside a classic HPC platform. IaC has been commonplace in cloud computing; however, it has lacked adoption within multi-petascale ecosystems due to concerns related to performance and interoperability with classic HPC data centres' ecosystems. We present an overview of the Swiss National Supercomputing Centre's flagship Alps ecosystem as an implementation target for vClusters for HPC and data-driven workflows. Alps is based on the Cray-HPE Shasta EX supercomputing platform, which includes an IaC-compliant, microservices architecture (MSA) management system that we leverage to demonstrate vClusters usage for our diverse operational workflows. We provide implementation details of two operational vClusters platforms: a classic HPC platform used predominantly by hundreds of users running thousands of large-scale numerical simulation batch jobs, and a widely used, data-intensive Grid computing middleware platform used for CERN Worldwide LHC Computing Grid (WLCG) operations. The resulting solution showcases the reuse and reduction of common configuration recipes across vCluster implementations, minimising operational change-management overheads while introducing flexibility for managing artefacts for the DevOps required by diverse workflows.
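To make the software-defined cluster idea concrete, here is a hypothetical sketch of what a declarative vCluster definition could look like, rendered from Python into IaC-style YAML. The field names, images, schedulers, and services are invented for illustration and do not reflect the actual Alps/CSCS schema.

```python
# Hypothetical declarative vCluster definition rendered as IaC-style YAML.
# Field names, images, and services are invented; this is not the Alps/CSCS schema.
from dataclasses import asdict, dataclass, field
import yaml

@dataclass
class VCluster:
    name: str
    tenant: str
    node_count: int
    node_image: str
    scheduler: str = "slurm"
    storage_class: str = "software-defined"
    services: list = field(default_factory=list)

hpc = VCluster(name="classic-hpc", tenant="user-lab", node_count=512,
               node_image="compute-node:latest", services=["batch", "monitoring"])
wlcg = VCluster(name="wlcg-grid", tenant="cern-tier2", node_count=128,
                node_image="grid-middleware:latest", scheduler="htcondor",
                services=["xrootd", "cvmfs"])

# Emit a single infrastructure-as-code document a deployment pipeline could consume.
print(yaml.safe_dump({"vclusters": [asdict(hpc), asdict(wlcg)]}, sort_keys=False))
```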
Volume 37, pages 288–305.
Citations: 0
A Survey of Graph Comparison Methods with Applications to Nondeterminism in High-Performance Computing
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-04-05 | DOI: 10.1177/10943420231166610
S. Bhowmick, Patrick Bell, M. Taufer
The convergence of extremely high levels of hardware concurrency and the effective overlap of computation and communication in asynchronous executions has resulted in increasing nondeterminism in High-Performance Computing (HPC) applications. Nondeterminism can manifest at multiple levels: from low-level communication primitives to libraries to application-level functions. No matter its source, nondeterminism can drastically increase the cost of result reproducibility, debugging workflows, testing parallel programs, or ensuring fault-tolerance. Nondeterministic executions of HPC applications can be modeled as event graphs, and the applications' nondeterministic behavior can be understood and, in some cases, mitigated using graph comparison algorithms. However, a connection between graph comparison algorithms and approaches to understanding nondeterminism in HPC still needs to be established. This survey article takes the first steps toward establishing a connection between graph comparison algorithms and nondeterminism in HPC with its three contributions: it provides a survey of different graph comparison algorithms and a timeline for each category's significant works; it discusses how existing graph comparison methods do not fully support properties needed to understand nondeterministic patterns in HPC applications; and it presents the open challenges that should be addressed to leverage the power of graph comparisons for the study of nondeterminism in HPC applications.
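As a toy example of the graph-comparison viewpoint, the snippet below builds two small event graphs from nondeterministic runs of the same program and compares them with networkx. The event labels and the use of graph edit distance are illustrative choices; real HPC traces would require the scalable methods the survey discusses.

```python
# Compare two event graphs from nondeterministic runs of the same 3-rank program.
# Labels and graph edit distance are illustrative; real traces need scalable methods.
import networkx as nx

def event_graph(edges):
    g = nx.DiGraph()
    for src, dst in edges:
        g.add_node(src, label=src)
        g.add_node(dst, label=dst)
        g.add_edge(src, dst)
    return g

# Run B receives the two messages in the opposite order from run A.
run_a = event_graph([("r0.send1", "r2.recv1"), ("r1.send1", "r2.recv2"),
                     ("r2.recv1", "r2.compute"), ("r2.recv2", "r2.compute")])
run_b = event_graph([("r1.send1", "r2.recv1"), ("r0.send1", "r2.recv2"),
                     ("r2.recv1", "r2.compute"), ("r2.recv2", "r2.compute")])

# As unlabeled DAGs the two runs look identical ...
print(nx.is_isomorphic(run_a, run_b))                           # True
# ... but a labeled comparison exposes the reordered message matching.
print(nx.graph_edit_distance(run_a, run_b,
                             node_match=lambda a, b: a["label"] == b["label"]))
```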
Volume 37, pages 306–327.
Citations: 0
Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-03-30 | DOI: 10.1177/10943420231166365
P. Luszczek, Wissam M. Sid-Lakhdar, J. Dongarra
We combine deep Gaussian processes (DGPs) with multitask and transfer learning for the performance modeling and optimization of HPC applications. Deep Gaussian processes merge the uncertainty quantification advantage of Gaussian processes (GPs) with the predictive power of deep learning. Multitask and transfer learning allow for improved learning efficiency when several similar tasks are to be learned simultaneously and when previously learned models are sought to help in the learning of new tasks, respectively. A comparison with state-of-the-art autotuners shows the advantage of our approach on two application problems. In this article, we combine DGPs with multitask and transfer learning to allow both improved tuning of an application's parameters on problems of interest and the prediction of parameters for any potential problem the application might encounter.
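As a simplified stand-in for the approach, the sketch below fits an ordinary single-task Gaussian process (via scikit-learn) as an autotuning surrogate over synthetic configurations and timings; it omits the deep GP layers and the multitask/transfer-learning components that are the paper's actual contribution.

```python
# Single-task, single-layer Gaussian process surrogate for autotuning (scikit-learn),
# a simplified stand-in for the paper's deep GPs with multitask/transfer learning.
# Tuning parameters (tile-size exponent, thread count) and timings are synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform([4, 1], [10, 64], size=(30, 2))          # observed configurations
y = 0.02 * (X[:, 0] - 7) ** 2 + 5.0 / X[:, 1] + 0.01 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(), normalize_y=True)
gp.fit(X, y)                                             # learn runtime vs. configuration

# Score a grid of candidates and pick the predicted-fastest configuration,
# keeping the GP's uncertainty alongside the prediction.
cand = np.array([[t, p] for t in range(4, 11) for p in (1, 2, 4, 8, 16, 32, 64)])
mean, std = gp.predict(cand, return_std=True)
best = int(np.argmin(mean))
print(cand[best], mean[best], std[best])
```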
Volume 37, pages 229–244.
Citations: 1
Automatizing the creation of specialized high-performance computing containers
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-03-29 | DOI: 10.1177/10943420231165729
J. Ejarque, Rosa M. Badia
With exascale computing already here, supercomputers are ever larger, more complex, and more heterogeneous systems. While expert system administrators can install and deploy applications in these systems correctly, this is something that general users usually cannot do. The eFlows4HPC project aims to provide methodologies and tools to enable the use and reuse of application workflows. One of the aspects that the project focuses on is simplifying application deployment in large and complex systems. The approach uses containers: not generic ones, but containers tailored for each target High-Performance Computing (HPC) system. This paper presents the Container Image Creation service developed in the framework of the project and experimentation based on project applications. We compare the performance of the specialized containers against generic containers and against a native installation. The results show that in almost all cases, the specialized containers outperform the generic ones (up to 2× faster), and in all cases, the performance is the same as with the native installation.
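A minimal sketch of the general idea, assuming invented base images, packages, and compiler flags: generate a container recipe specialized for a given target system rather than a single generic one. This is not the eFlows4HPC Container Image Creation service's actual template logic.

```python
# Generate a container recipe specialized for a target system. The base image,
# MPI flavor, and -march flags below are invented placeholders, not the project's templates.
TARGETS = {
    "generic":   {"base": "ubuntu:22.04", "march": "x86-64", "mpi": "mpich"},
    "cluster-a": {"base": "ubuntu:22.04", "march": "skylake-avx512", "mpi": "openmpi"},
}

def render_recipe(app_repo, target):
    t = TARGETS[target]
    return "\n".join([
        f"FROM {t['base']}",
        f"RUN apt-get update && apt-get install -y build-essential git lib{t['mpi']}-dev",
        f"RUN git clone {app_repo} /opt/app",
        # The architecture-specific build flags are what make the image "specialized".
        f"RUN cd /opt/app && make CFLAGS='-O3 -march={t['march']}'",
        'ENTRYPOINT ["/opt/app/run"]',
    ])

# Hypothetical usage: write one recipe per target system from the same application.
with open("Dockerfile.cluster-a", "w") as fh:
    fh.write(render_recipe("https://example.org/app.git", "cluster-a"))
```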
Volume 37, pages 272–287.
Citations: 1
Accelerating cluster dynamics simulation of fission gas behavior in nuclear fuel on deep computing unit–based heterogeneous architecture supercomputer
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-03-14 | DOI: 10.1177/10943420231162831
He Bai, Changjun Hu, Yuhan Zhu, Dandan Chen, Genshen Chu, Shuai Ren
High-fidelity simulation of fission gas behavior can help us understand and predict the performance of nuclear fuel under different irradiation conditions. Cluster dynamics (CD) is a mesoscale simulation method that has developed rapidly in the nuclear fuel research area in recent years, and it can effectively describe the microdynamic behavior of fission gas in nuclear fuel; however, the huge computational cost of solving CD models has limited the application scenarios of CD. Thus, how to design acceleration algorithms for the given computing resources to improve computing efficiency and simulation scale has become a key problem of CD simulation. In this work, we present an accelerated cluster dynamics model based on the spatially dependent cluster dynamics model, combined with multiple optimization methods, on a DCU (deep computing unit)-based heterogeneous-architecture supercomputer. The correctness of the model is verified by comparison with experimental data and with Xolotl, a software package from the SciDAC program of the U.S. Department of Energy's Office of Science. Furthermore, our model implementation has better computing performance than Xolotl's GPU version. Our code achieves strong/weak scaling with more than 72.75%/84.07% parallel efficiency on 1024 compute nodes. This work develops a new, efficient model for CD simulation of fission gas in nuclear fuel.
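For readers new to cluster dynamics, the following toy rate-equation system (made-up rate constants, no spatial dependence, solved with SciPy) illustrates the kind of stiff ODE systems a CD model produces; the paper's spatially dependent model and DCU acceleration go far beyond this sketch.

```python
# Toy cluster dynamics rate equations: concentrations C_n of gas clusters of size n
# grow by single-atom capture, with constant gas production. Rate constants are
# made up; there is no spatial dependence, unlike the paper's model.
import numpy as np
from scipy.integrate import solve_ivp

N = 50                                        # truncate cluster sizes at N
G = 1e-4                                      # gas-atom production rate
k = 1e-2 * np.sqrt(np.arange(1, N + 1))       # size-dependent capture coefficients

def rhs(t, C):
    capture = k * C[0] * C                    # rate of size-n clusters absorbing a monomer
    dC = np.zeros_like(C)
    dC[0] = G - capture.sum() - capture[0]    # monomers produced, then consumed by capture
    dC[1:] = capture[:-1] - capture[1:]       # size n gains from n-1 and loses to n+1
    return dC

sol = solve_ivp(rhs, (0.0, 1e4), y0=np.zeros(N), method="LSODA", rtol=1e-8)
print(sol.y[:5, -1])                          # five smallest cluster sizes at the final time
```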
Volume 37, pages 516–529.
Citations: 1
Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
IF 3.1 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2023-03-10 | DOI: 10.1177/10943420231157653
Rafael Rodríguez-Sánchez, Adrián Castelló, Sandra Catalán, Francisco D. Igual, E. S. Quintana‐Ortí
Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consisting of irregular workloads and/or divergent execution paths. The integration of malleability in high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is currently nonexistent, and, in consequence, applications relying on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be obtained by exploiting malleability in a framework designed to implement portable and high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library and provide an experimental evaluation of the result on three different practical use cases.
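As a Python-level proxy for the malleability idea, the snippet below varies the number of BLAS threads between a large-GEMM phase and a phase of many small GEMMs using threadpoolctl; the paper's malleable BLIS implements this inside the library itself, which this sketch does not reproduce.

```python
# Python-level proxy for malleability: change the BLAS thread count between a
# large-GEMM phase and a many-small-GEMMs phase with threadpoolctl. The paper's
# malleable BLIS does this inside the library; this sketch only mimics the idea.
import os
import numpy as np
from threadpoolctl import threadpool_limits

A = np.random.rand(4096, 4096)
B = np.random.rand(4096, 4096)

# Phase 1: a single large GEMM gets every core on the socket.
with threadpool_limits(limits=os.cpu_count(), user_api="blas"):
    big = A @ B

# Phase 2: many small, irregular GEMMs; shrink the BLAS pool so an outer
# task-parallel runtime could occupy the remaining cores.
with threadpool_limits(limits=2, user_api="blas"):
    small = [A[:256] @ B[:, :256] for _ in range(16)]

print(big.shape, len(small))
```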
Citations: 0