2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC): Latest Articles

[Copyright notice]

2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00002

Citations: 0

Facilitating CoDesign with Automatic Code Similarity Learning

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00011

T. Nguyen, E. Strohmaier, J. Shalf

Automating workload characterization is increasingly important in hardware design. Although compiler tools can automatically collect profiling data and predict performance behavior, the process has to be repeated for each potential design. This challenge is exacerbated by the fast-growing body of applications and input problems. We propose an alternative approach based on code similarity learning. The application is decomposed into small kernels that can be mapped to known patterns, and the behavior of a pattern on a given hardware setup can then be reused. To enable this technology, we propose a new code representation and similarity metric, and we automate the detection process using compiler and ML methods. Specifically, we reformulate an application's dataflow graphs so that they can be compared based on both compute and data movement. We show that this representation can distinguish kernels in the HPCG benchmark and help suggest optimal configurations for SpMV and GEMM hardware accelerators.
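As a rough illustration of comparing kernels on both compute and data movement, the sketch below summarizes each kernel as a histogram of operation kinds and compares histograms with cosine similarity. The operation names, the three toy kernels, and the choice of cosine similarity are assumptions made for this example; they are not the representation or the metric proposed in the paper.

```python
import math
from collections import Counter


def op_histogram(ops):
    """Count how often each operation kind appears in a kernel's dataflow graph."""
    return Counter(ops)


def cosine_similarity(a, b):
    """Cosine similarity between two operation histograms (1.0 means same mix)."""
    keys = set(a) | set(b)
    # Counter returns 0 for missing keys, so absent operations contribute nothing.
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical per-iteration node lists: memory operations and arithmetic together.
spmv = op_histogram(["load", "load", "indirect_load", "fmul", "fadd"])
dot_product = op_histogram(["load", "load", "fmul", "fadd"])
copy = op_histogram(["load", "store"])

sim_spmv_dot = cosine_similarity(spmv, dot_product)
sim_spmv_copy = cosine_similarity(spmv, copy)
```

Under these assumed histograms, the SpMV-like kernel scores closer to the dot product than to the copy kernel, which is the kind of ranking a similarity metric would need to produce before a known pattern's measured behavior could be reused.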

Citations: 0

Flacc: Towards OpenACC support for Fortran in the LLVM Ecosystem

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00007

Valentin Clement, J. Vetter

OpenACC is a directive-based programming model for heterogeneous accelerators, initially launched in 2010 to provide a portable solution at a level of abstraction above OpenCL, CUDA, and other lower-level programming models. Various implementations of OpenACC for C, C++, and Fortran exist; however, only one open-source, production implementation of OpenACC for Fortran exists. Moreover, most contemporary compiler tool chains for heterogeneous computing are based on LLVM. This lack of support poses a serious risk for high-performance computing application developers targeting GPUs and other accelerators, and it limits the ability of the community to experiment with, extend, and contribute to the OpenACC specification and open-source implementation itself. To address this gap, we have designed and begun implementing Flacc: an effort funded by the US Exascale Computing Project to develop production OpenACC compiler support for Fortran based on Flang within the LLVM ecosystem. In this paper, we describe the Flacc goals, initial design and prototype, and challenges that we have encountered so far in our prototyping efforts. Flacc is implemented as an MLIR dialect in the Flang Fortran front end in LLVM. The Flacc front end currently supports OpenACC version 3.1, and the Flacc run time is currently under development and relies on contributions from the Clacc project. Current contributions to Flacc are available in the main LLVM repository.

Citations: 2

A High Performance Sparse Tensor Algebra Compiler in MLIR

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00009

Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, Gokcen Kestor

Sparse tensor algebra is widely used in many applications, including scientific computing, machine learning, and data analytics. The performance of sparse tensor algebra kernels strongly depends on the intrinsic characteristics of the input tensors. Many storage formats have therefore been designed to achieve optimal performance for particular applications and architectures, which makes it challenging to implement and optimize every tensor operation of interest on a given architecture. We propose a tensor algebra domain-specific language (DSL) and compiler framework to automatically generate kernels for mixed sparse-dense tensor algebra operations. The proposed DSL provides high-level programming abstractions that resemble the familiar Einstein notation to represent tensor algebra operations. The compiler introduces a new Sparse Tensor Algebra dialect built on top of LLVM's extensible MLIR compiler infrastructure for efficient code generation while covering a wide range of tensor storage formats. Our compiler also leverages input-dependent code optimization to enhance data locality for better performance. Our results show that the automatically generated kernels outperform state-of-the-art sparse tensor algebra compilers, with up to 20.92x, 6.39x, and 13.9x performance improvement for parallel SpMV, SpMM, and TTM, respectively.
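To make the mixed sparse-dense setting concrete, the sketch below shows the kind of loop nest that a sparse matrix-vector product, written in Einstein-style notation as y[i] = A[i, j] * x[j], reduces to when A is stored in compressed sparse row (CSR) form. The function and variable names are assumptions for illustration; this is a hand-written reference loop, not output of the compiler described in the abstract.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y[i] = sum_j A[i, j] * x[j] for a matrix A stored in CSR form."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # The nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # col_idx[k] selects the dense vector entry: an indirect load.
            y[i] += values[k] * x[col_idx[k]]
    return y


# A = [[1, 0, 2],
#      [0, 0, 3],
#      [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]

y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

The indirect access through `col_idx` is what ties performance to the sparsity pattern of the input, which is why a different storage format can call for a differently shaped loop nest.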

Citations: 12

OpenMP aware MHP Analysis for Improved Static Data-Race Detection

Pub Date : 2021-11-01 DOI: 10.1109/LLVMHPC54804.2021.00006

Utpal Bora, Shraiysh Vaishay, Saurabh Joshi, Ramakrishna Upadrasta

Data races, a major source of bugs in concurrent programs, can result in loss of manpower and time as well as data loss due to system failures. OpenMP, the de facto shared memory parallelism framework used in the HPC community, also suffers from data races. To detect race conditions in OpenMP programs and improve turnaround time and/or developer productivity, we present a fast, static data race checker based on data flow analysis in the LLVM compiler framework. Our tool can detect races in the presence or absence of explicit barriers, with implicit or explicit synchronization. In addition, our tool works effectively for the OpenMP target offloading constructs and also supports the frequently used OpenMP constructs. We formalize and provide a data flow analysis framework to perform Phase Interval Analysis (PIA) of OpenMP programs. Phase intervals are then used to compute the MHP (and its complement NHP) sets for the programs, which, in turn, are used to detect data races statically. We evaluate our work using multiple OpenMP race detection benchmarks and real-world applications. Our experiments show that the checker is comparable to the state of the art in various performance metrics, with around 90% accuracy, almost perfect recall, and significantly lower runtime and memory footprint.
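The idea of deriving may-happen-in-parallel (MHP) pairs from phase intervals can be sketched as follows. The sketch assumes that every barrier starts a new phase, that each access carries an inclusive interval of phases in which it may execute, and that all listed accesses can be performed by different threads of one parallel region. The `Access` record and both helper functions are hypothetical. This is a deliberate simplification that ignores locks, atomics, and thread identity, and it is not the PIA formalization from the paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Access:
    label: str
    variable: str
    is_write: bool
    phases: tuple  # (first_phase, last_phase), inclusive


def may_happen_in_parallel(a, b):
    """Two accesses may run concurrently if their phase intervals overlap."""
    return a.phases[0] <= b.phases[1] and b.phases[0] <= a.phases[1]


def potential_races(accesses):
    """Report pairs touching the same variable, with a write, that are MHP."""
    races = []
    for a, b in combinations(accesses, 2):
        conflicting = a.variable == b.variable and (a.is_write or b.is_write)
        if conflicting and may_happen_in_parallel(a, b):
            races.append((a.label, b.label))
    return races


accesses = [
    Access("S1", "x", True, (0, 0)),   # write x before the barrier
    Access("S2", "x", False, (0, 0)),  # read x in the same phase
    Access("S3", "x", False, (1, 1)),  # read x after the barrier
    Access("S4", "y", True, (0, 1)),   # write y; phase not known precisely
]

races = potential_races(accesses)
```

In this toy input only S1 and S2 are reported: S3 is separated from the write by the barrier, and S4 touches a different variable.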

Citations: 1

Extending LLVM IR for DPC++ Matrix Support: A Case Study with Intel® Advanced Matrix Extensions (Intel® AMX)

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00008

Dounia Khaldi, Yuanke Luo, Bing Yu, A. Sotkin, B. Morais, M. Girkar

In this paper, we introduce a DPC++ matrix extension to unify different tensor hardware: Intel® Advanced Matrix Extensions (Intel® AMX) for CPUs, NVIDIA® TPUs, IBM® POWER® MMA, etc. These tensor hardware units are usually accessed through low-level intrinsics or assembly to perform matrix operations. It is hard for scientists to program these domain-specific devices without the kind of high-level abstractions and efficient implementations we introduce here. We also extend the existing LLVM matrix intrinsics to represent this DPC++ extension and yield efficient Intel AMX code generation. Based on our case study of implementing this interface on Intel AMX hardware, we discuss some of the limitations of the existing LLVM Intermediate Representation (IR) and how they can be overcome to exploit tensor hardware.
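The load, multiply-accumulate, store pattern that tile-based matrix hardware exposes can be sketched in plain Python. The helper names (`load_tile`, `tile_mad`, `store_tile`) and the tile size are assumptions for illustration only; they correspond neither to the DPC++ matrix extension API nor to Intel AMX intrinsics, and real hardware fixes its own tile shapes and element types.

```python
TILE = 2  # hypothetical tile edge length


def load_tile(m, row, col, size=TILE):
    """Copy a size x size block of matrix m starting at (row, col)."""
    return [r[col:col + size] for r in m[row:row + size]]


def tile_mad(acc, a, b):
    """Multiply-accumulate on small square tiles: acc += a @ b."""
    n = len(acc)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                acc[i][j] += a[i][k] * b[k][j]


def store_tile(m, row, col, tile):
    """Write a tile back into matrix m at (row, col)."""
    for i, tile_row in enumerate(tile):
        m[row + i][col:col + len(tile_row)] = tile_row


def tiled_matmul(a, b, size=TILE):
    """C = A @ B for square matrices whose dimension is a multiple of size."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(0, n, size):
        for j in range(0, n, size):
            acc = [[0] * size for _ in range(size)]  # accumulator tile
            for k in range(0, n, size):
                tile_mad(acc, load_tile(a, i, k, size), load_tile(b, k, j, size))
            store_tile(c, i, j, acc)
    return c
```

Keeping the accumulator tile live across the inner `k` loop is the property a compiler needs to preserve when it maps such code onto tile registers.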

Citations: 1

Toward an Automated Hardware Pipelining LLVM Pass Infrastructure

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00010

John D. Leidel, Ryan Kabrick, D. Donofrio

The many nuances associated with hardware development have fostered a development environment exclusive to those possessing extensive knowledge of the low-level implementation details necessary for an effective design. Allowing users to focus on the design aspects specific to their domain by abstracting away the low-level implementation details could prove invaluable to their success. This work describes the StoneCutter infrastructure, which, along with its encompassing OpenSoC System Architect suite of tools, provides users with a high-level, C-like syntax for rapidly designing ISAs. The compiler is responsible for ingesting instruction definitions and generating optimized Chisel HDL output as well as a target-specific, LLVM-linked compiler capable of executing binaries on the prototype ISA. During the codegen phase, the necessary control signals are generated and then used to automatically pipeline the entire ISA based on the design's I/O, arithmetic operations, and flow control.

Citations: 0

[Title page]

Pub Date : 2021-11-01 DOI: 10.1109/llvmhpc54804.2021.00001

Citations: 0
