
Latest Publications: 2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)

Broad Performance Measurement Support for Asynchronous Multi-Tasking with APEX
K. Huck
APEX (Autonomic Performance Environment for eXascale) is a performance measurement library for distributed, asynchronous multitasking runtime systems. It provides support for both lightweight measurement and high concurrency. To support performance measurement in systems that employ user-level threading, APEX uses a dependency chain in addition to the call stack to produce traces and task dependency graphs. APEX also provides a runtime adaptation system based on the observed system performance. In this paper, we describe the evolution of APEX from its design for HPX to support an array of programming models and abstraction layers and describe some of the features that have evolved to help understand the asynchrony and high concurrency of asynchronous tasking models.
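Not part of the original abstract: a minimal, hypothetical C++ sketch of the general idea of recording a task dependency chain (parent task IDs) alongside per-task timings, as a measurement layer for a user-level-threading runtime might do. All names (TaskProfiler, TaskRecord, TaskId) are invented for illustration and are not the APEX API.

```cpp
// Hypothetical sketch: attribute timings to tasks via a dependency chain
// (parent task IDs) rather than the OS-thread call stack alone, so that
// user-level tasks migrating across worker threads remain attributable.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

using TaskId = std::uint64_t;

struct TaskRecord {
    std::string name;
    TaskId parent;  // dependency-chain link to the spawning task
    std::chrono::steady_clock::time_point start, stop;
};

class TaskProfiler {
    std::mutex mtx_;
    std::map<TaskId, TaskRecord> records_;
    std::atomic<TaskId> next_{1};
public:
    // Called when a user-level task is created; 'parent' is the task that
    // spawned it, which may differ from the OS thread that later runs it.
    TaskId task_created(const std::string& name, TaskId parent) {
        TaskId id = next_++;
        std::lock_guard<std::mutex> lock(mtx_);
        records_[id] = {name, parent, {}, {}};
        return id;
    }
    void task_started(TaskId id) {
        std::lock_guard<std::mutex> lock(mtx_);
        records_[id].start = std::chrono::steady_clock::now();
    }
    void task_stopped(TaskId id) {
        std::lock_guard<std::mutex> lock(mtx_);
        records_[id].stop = std::chrono::steady_clock::now();
    }
    // Emit a task dependency graph (parent -> child edges) with timings.
    void dump() {
        std::lock_guard<std::mutex> lock(mtx_);
        for (auto& [id, r] : records_) {
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                          r.stop - r.start).count();
            std::cout << r.parent << " -> " << id << "  [" << r.name
                      << ", " << us << " us]\n";
        }
    }
};

int main() {
    TaskProfiler prof;
    TaskId root = prof.task_created("root", 0);
    prof.task_started(root);
    TaskId child = prof.task_created("child_kernel", root);
    prof.task_started(child);
    prof.task_stopped(child);
    prof.task_stopped(root);
    prof.dump();
}
```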
{"title":"Broad Performance Measurement Support for Asynchronous Multi-Tasking with APEX","authors":"K. Huck","doi":"10.1109/ESPM256814.2022.00008","DOIUrl":"https://doi.org/10.1109/ESPM256814.2022.00008","url":null,"abstract":"APEX (Autonomic Performance Environment for eXascale) is a performance measurement library for distributed, asynchronous multitasking runtime systems. It provides support for both lightweight measurement and high concurrency. To support performance measurement in systems that employ user-level threading, APEX uses a dependency chain in addition to the call stack to produce traces and task dependency graphs. APEX also provides a runtime adaptation system based on the observed system performance. In this paper, we describe the evolution of APEX from its design for HPX to support an array of programming models and abstraction layers and describe some of the features that have evolved to help understand the asynchrony and high concurrency of asynchronous tasking models.","PeriodicalId":340754,"journal":{"name":"2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129143838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Selective Nesting Approach for the Sparse Multi-threaded Cholesky Factorization
Valentin Le Fèvre, Tetsuzo Usui, Marc Casas
Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which are methods for solving linear systems via the factorization of matrices into products of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric and positive definite matrices. This paper presents selective nesting, a method to determine the optimal task granularity for the parallel Cholesky factorization based on the structure of sparse matrices. We propose the Opt-D algorithm, which automatically and dynamically applies selective nesting. Opt-D leverages matrix sparsity to drive complex task-based parallel workloads in the context of direct solvers. We run an extensive evaluation campaign considering a heterogeneous set of 35 sparse matrices and a parallel machine featuring the A64FX processor. Opt-D delivers an average performance speedup of 1.75× with respect to the best state-of-the-art parallel methods to run direct solvers.
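Not part of the original abstract: a short C++ illustration of the general idea behind selective nesting, namely spawning nested parallel tasks only for supernodes whose size exceeds a threshold so that small blocks avoid tasking overhead. The threshold logic and all names are invented for illustration; this is not the authors' Opt-D algorithm.

```cpp
// Illustrative sketch only: choose task granularity per supernode.
// Large supernodes get nested tasking, small ones run sequentially.
#include <cstddef>
#include <iostream>
#include <vector>

struct Supernode {
    std::size_t ncols;  // columns in this supernode
    std::size_t nrows;  // rows at and below the diagonal block
};

// Decide whether a supernode is large enough to benefit from nested
// parallelism; a real scheme would derive the threshold from the sparse
// matrix structure, here it is just a placeholder constant.
bool use_nested_tasks(const Supernode& s, std::size_t threshold = 256) {
    return s.ncols * s.nrows >= threshold * threshold;
}

void factorize_sequential(const Supernode& s) {
    std::cout << "  sequential factorization (" << s.ncols << " cols)\n";
}

void factorize_nested(const Supernode& s) {
    // In a real task-based runtime this would spawn child tasks for the
    // panel factorization and the trailing-matrix updates.
    std::cout << "  nested-task factorization (" << s.ncols << " cols)\n";
}

int main() {
    std::vector<Supernode> elimination_order = {
        {8, 40}, {64, 512}, {300, 2000}  // toy supernode sizes
    };
    for (const auto& s : elimination_order) {
        std::cout << "supernode with " << s.ncols << " columns:\n";
        if (use_nested_tasks(s)) factorize_nested(s);
        else                     factorize_sequential(s);
    }
}
```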
{"title":"A Selective Nesting Approach for the Sparse Multi-threaded Cholesky Factorization","authors":"Valentin Le Fèvre, Tetsuzo Usui, Marc Casas","doi":"10.1109/ESPM256814.2022.00006","DOIUrl":"https://doi.org/10.1109/ESPM256814.2022.00006","url":null,"abstract":"Sparse linear algebra routines are fundamental building blocks of a large variety of scientific applications. Direct solvers, which are methods for solving linear systems via the factorization of matrices into products of triangular matrices, are commonly used in many contexts. The Cholesky factorization is the fastest direct method for symmetric and positive definite matrices. This paper presents selective nesting, a method to determine the optimal task granularity for the parallel Cholesky factorization based on the structure of sparse matrices. We propose the Opt-D algorithm, which automatically and dynamically applies selective nesting. Opt-D leverages matrix sparsity to drive complex task-based parallel workloads in the context of direct solvers. We run an extensive evaluation campaign considering a heterogeneous set of 35 sparse matrices and a parallel machine featuring the A64FX processor. Opt-D delivers an average performance speedup of 1.75× with respect to the best state-of-the-art parallel methods to run direct solvers.","PeriodicalId":340754,"journal":{"name":"2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)","volume":"25 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114037472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types
Gregor Daiß, Srinivas Yadav Singanaboina, Patrick Diehl, H. Kaiser, D. Pflüger
Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger’s Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) SIMD backend. Additionally, we amend missing SIMD implementations in the Kokkos kernels within Octo-Tiger’s hydro solver. We test our changes by running Octo-Tiger on three different CPUs: An A64FX, an Intel Icelake and an AMD EPYC CPU, evaluating SIMD speedup and node-level performance. We get a good SIMD speedup on the A64FX CPU, as well as noticeable speedups on the other two CPU platforms. However, we also experience a scaling issue on the EPYC CPU.
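Not part of the original abstract: a small, self-contained example of std::experimental::simd, the vocabulary type the paper adds as an option in Octo-Tiger's Kokkos kernels. It shows an AXPY-style loop over full SIMD lanes with a scalar remainder; it assumes a standard library that ships <experimental/simd> (e.g. recent GCC) and is unrelated to Octo-Tiger's actual kernels.

```cpp
// Minimal std::experimental::simd example: y = a*x + y.
#include <experimental/simd>
#include <iostream>
#include <vector>

namespace stdx = std::experimental;
using simd_t = stdx::native_simd<double>;  // lane count chosen by the target ISA

int main() {
    const std::size_t n = 1024;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    std::size_t i = 0;
    // Vectorized main loop: process simd_t::size() elements per iteration.
    for (; i + simd_t::size() <= n; i += simd_t::size()) {
        simd_t vx, vy;
        vx.copy_from(&x[i], stdx::element_aligned);
        vy.copy_from(&y[i], stdx::element_aligned);
        vy = a * vx + vy;  // operates on all lanes at once
        vy.copy_to(&y[i], stdx::element_aligned);
    }
    // Scalar remainder loop for the leftover elements.
    for (; i < n; ++i) y[i] = a * x[i] + y[i];

    std::cout << "y[0] = " << y[0] << "\n";  // expect 5
}
```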
{"title":"From Merging Frameworks to Merging Stars: Experiences using HPX, Kokkos and SIMD Types","authors":"Gregor Daiß, Srinivas Yadav Singanaboina, Patrick Diehl, H. Kaiser, D. Pflüger","doi":"10.1109/ESPM256814.2022.00007","DOIUrl":"https://doi.org/10.1109/ESPM256814.2022.00007","url":null,"abstract":"Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX, Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range of heterogeneous hardware. However, on A64FX CPUs, we encountered several missing pieces, hindering performance by causing problems with the SIMD vectorization. Therefore, we add std::experimental::simd as an option to use in Octo-Tiger’s Kokkos kernels alongside Kokkos SIMD, and further add a new SVE (Scalable Vector Extensions) SIMD backend. Additionally, we amend missing SIMD implementations in the Kokkos kernels within Octo-Tiger’s hydro solver. We test our changes by running Octo-Tiger on three different CPUs: An A64FX, an Intel Icelake and an AMD EPYC CPU, evaluating SIMD speedup and node-level performance. We get a good SIMD speedup on the A64FX CPU, as well as noticeable speedups on the other two CPU platforms. However, we also experience a scaling issue on the EPYC CPU.","PeriodicalId":340754,"journal":{"name":"2022 IEEE/ACM 7th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128881602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3