Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code

Charith Mendis, Jeffrey Bosboom, Kevin Wu, S. Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, Saman P. Amarasinghe
{"title":"Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code","authors":"Charith Mendis, Jeffrey Bosboom, Kevin Wu, S. Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, Saman P. Amarasinghe","doi":"10.1145/2737924.2737974","DOIUrl":null,"url":null,"abstract":"Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide’s state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. We abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97× performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25× improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop’s filters with our lifted implementations, giving 1.12× speedup without affecting the user experience.","PeriodicalId":104101,"journal":{"name":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2737924.2737974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 32

Abstract

Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide’s state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. We abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97× performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25× improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop’s filters with our lifted implementations, giving 1.12× speedup without affecting the user experience.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Helium:将高性能模板内核从剥离的x86二进制文件提升到卤化DSL代码
高度优化的程序容易出现位rot,在面对新的硬件和编译器技术时,性能很快就会变成次优。在本文中,我们将展示如何从剥离的x86二进制文件中自动提升性能关键型模板内核,并在高级领域特定语言Halide中生成相应的代码。使用Halide针对当前硬件的最先进的优化,我们展示了这些内核的新优化版本可以取代原始版本,从而为新硬件恢复应用程序的活力。原始的经过优化的二进制内核代码几乎不可能进行静态分析。相反,我们依靠动态跟踪来重新生成核。我们执行缓冲区结构重建来识别输入、中间和输出缓冲区形状。我们从包含绝对内存地址的具体依赖树林中抽象到适合高级代码生成的符号树。这是通过规范化树,基于结构聚类它们,推断高维缓冲区访问,最后通过解决一组基于缓冲区访问的线性方程来将它们提升到简单的高级表达式来实现的。Helium可以处理高度优化的、复杂的具有输入依赖条件的模板内核。我们从Adobe Photoshop中提取了7个内核,使性能提高了75%,从IrfanView中提取了4个内核,使性能提高了4.97倍,从miniGMG multigrid基准中提取了一个模板,使性能提高了4.25倍。我们手动更新了Photoshop,用我们提升的实现替换了11个Photoshop的滤镜,在不影响用户体验的情况下提供了1.12倍的加速。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Dynamic partial order reduction for relaxed memory models Static detection of asymptotic performance bugs in collection traversals Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation Efficient execution of recursive programs on commodity vector hardware KJS: a complete formal semantics of JavaScript
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1