首页 > 最新文献

Proceedings of the 26th International Conference on Compiler Construction最新文献

英文 中文
Optimized two-level parallelization for GPU accelerators using the polyhedral model 使用多面体模型优化GPU加速器的两级并行化
Pub Date : 2017-02-05 DOI: 10.1145/3033019.3033022
J. Shirako, Akihiro Hayashi, Vivek Sarkar
While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.
虽然GPU在当今的高性能计算机中扮演着越来越重要的角色,但优化GPU性能仍然给程序员带来了巨大的负担。优化gpu代码的主要挑战来自硬件并行性的两个层面,块和线程;每个级别都有显著不同的特征,需要不同的优化策略。本文提出了一种新的GPU并行编译优化算法。我们的方法基于多面体模型,与传统的基于ast的框架相比,它在程序分析和转换方面取得了重大进展。我们扩展了多面体调度,通过叠加的思想来实现两级并行,它集成了块级和线程级并行的单独调度。我们的实验结果表明,与最先进的gpgpu多面体并行代码生成器(PPCG)相比,我们提出的编译器优化框架可以在NVIDIA Tesla M2050和K80 gpu上提供1.8倍和2.1倍的几何平均改进。
{"title":"Optimized two-level parallelization for GPU accelerators using the polyhedral model","authors":"J. Shirako, Akihiro Hayashi, Vivek Sarkar","doi":"10.1145/3033019.3033022","DOIUrl":"https://doi.org/10.1145/3033019.3033022","url":null,"abstract":"While GPUs play an increasingly important role in today's high-performance computers, optimizing GPU performance continues to impose large burdens upon programmers. A major challenge in optimizing codes for GPUs stems from the two levels of hardware parallelism, blocks and threads; each of these levels has significantly different characteristics, requiring different optimization strategies. In this paper, we propose a novel compiler optimization algorithm for GPU parallelism. Our approach is based on the polyhedral model, which has enabled significant advances in program analysis and transformation compared to traditional AST-based frameworks. We extend polyhedral schedules to enable two-level parallelization through the idea of superposition, which integrates separate schedules for block-level and thread-level parallelism. Our experimental results demonstrate that our proposed compiler optimization framework can deliver 1.8x and 2.1x geometric mean improvements on NVIDIA Tesla M2050 and K80 GPUs, compared to a state-of-the-art polyhedral parallel code generator (PPCG) for GPGPUs.","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123489158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
Partially redundant fence elimination for x86, ARM, and power processors 部分冗余栅栏消除x86, ARM,和电源处理器
Pub Date : 2017-02-05 DOI: 10.1145/3033019.3033021
Robin Morisset, Francesco Zappa Nardelli
We show how partial redundancy elimination (PRE) can be instantiated to perform provably correct fence elimination for multi-threaded programs running on top of the x86, ARM and IBM Power relaxed memory models. We have implemented our algorithm in the backends of the LLVM compiler infrastructure. The optimisation does not induce an observable overhead at compile-time and can result in up-to 10% speedup on some benchmarks.
我们将展示如何实例化部分冗余消除(PRE),以便为运行在x86、ARM和IBM Power宽松内存模型之上的多线程程序执行可证明正确的围栏消除。我们在LLVM编译器基础架构的后端实现了我们的算法。该优化不会在编译时产生可观察到的开销,并且可以在某些基准测试中导致高达10%的加速。
{"title":"Partially redundant fence elimination for x86, ARM, and power processors","authors":"Robin Morisset, Francesco Zappa Nardelli","doi":"10.1145/3033019.3033021","DOIUrl":"https://doi.org/10.1145/3033019.3033021","url":null,"abstract":"We show how partial redundancy elimination (PRE) can be instantiated to perform provably correct fence elimination for multi-threaded programs running on top of the x86, ARM and IBM Power relaxed memory models. We have implemented our algorithm in the backends of the LLVM compiler infrastructure. The optimisation does not induce an observable overhead at compile-time and can result in up-to 10% speedup on some benchmarks.","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124216986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
rev.ng: a unified binary analysis framework to recover CFGs and function boundaries rev.ng:一个统一的二进制分析框架,用于恢复CFGs和功能边界
Pub Date : 2017-02-05 DOI: 10.1145/3033019.3033028
A. Federico, Mathias Payer, G. Agosta
Static binary analysis is a key tool to assess the security of third-party binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on hand-coded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols. We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. rev.ng, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by rev.ng is comparable to or improves over the alternatives.
静态二进制分析是评估第三方二进制文件和遗留程序安全性的关键工具。大多数形式的二进制分析依赖于两个关键信息的可用性:程序的控制流图和功能边界。然而,当前的工具很难提供准确和精确的结果,特别是在处理手写的汇编函数和重要的控制流转移指令时,例如尾部调用。此外,大多数现有的解决方案都是特别的,依赖于手工编码的启发式方法,并且与特定的体系结构相关联。在本文中,我们强调了架构不可知的静态二进制分析框架在不使用调试信息或符号的情况下提供有关程序CFG和功能边界的准确信息所面临的挑战。我们提出了一组分析来解决谓词指令、无返回函数、尾部调用和上下文相关的CFG。rev.ng是一个基于QEMU和LLVM的二进制分析框架,可以处理QEMU支持的所有17种架构,并生成一个可编译的LLVM IR。我们在LLVM IR之上实现了我们所描述的分析。在广泛的评估中,我们使用GCC和clang在MIPS、ARM和x86-64编译的二进制文件上测试了我们的工具,并将它们与业界最先进的工具IDA Pro和两个知名的学术工具BAP/ByteWeight和angr进行了比较。在所有情况下,rev.ng生成的CFG和功能边界的质量与替代方案相当或有所改善。
{"title":"rev.ng: a unified binary analysis framework to recover CFGs and function boundaries","authors":"A. Federico, Mathias Payer, G. Agosta","doi":"10.1145/3033019.3033028","DOIUrl":"https://doi.org/10.1145/3033019.3033028","url":null,"abstract":"Static binary analysis is a key tool to assess the security of third-party binaries and legacy programs. Most forms of binary analysis rely on the availability of two key pieces of information: the program's control-flow graph and function boundaries. However, current tools struggle to provide accurate and precise results, in particular when dealing with hand-written assembly functions and non-trivial control-flow transfer instructions, such as tail calls. In addition, most of the existing solutions are ad-hoc, rely on hand-coded heuristics, and are tied to a specific architecture. In this paper we highlight the challenges faced by an architecture agnostic static binary analysis framework to provide accurate information about a program's CFG and function boundaries without employing debugging information or symbols. We propose a set of analyses to address predicate instructions, noreturn functions, tail calls, and context-dependent CFG. rev.ng, our binary analysis framework based on QEMU and LLVM, handles all the 17 architectures supported by QEMU and produces a compilable LLVM IR. We implement our described analyses on top of LLVM IR. In an extensive evaluation, we test our tool on binaries compiled for MIPS, ARM, and x86-64 using GCC and clang and compare them to the industry's state of the art tool, IDA Pro, and two well-known academic tools, BAP/ByteWeight and angr. In all cases, the quality of the CFG and function boundaries produced by rev.ng is comparable to or improves over the alternatives.","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129843833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 63
Proceedings of the 26th International Conference on Compiler Construction 第26届编译器构造国际会议论文集
{"title":"Proceedings of the 26th International Conference on Compiler Construction","authors":"","doi":"10.1145/3033019","DOIUrl":"https://doi.org/10.1145/3033019","url":null,"abstract":"","PeriodicalId":146080,"journal":{"name":"Proceedings of the 26th International Conference on Compiler Construction","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131581627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
期刊
Proceedings of the 26th International Conference on Compiler Construction
全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1