Improving dynamic binary optimization through early-exit guided code region formation

Chun-Chen Hsu, Pangfeng Liu, Jan-Jan Wu, P. Yew, Ding-Yong Hong, W. Hsu, Chien-Min Wang
{"title":"Improving dynamic binary optimization through early-exit guided code region formation","authors":"Chun-Chen Hsu, Pangfeng Liu, Jan-Jan Wu, P. Yew, Ding-Yong Hong, W. Hsu, Chien-Min Wang","doi":"10.1145/2451512.2451519","DOIUrl":null,"url":null,"abstract":"Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus, determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer a major problem: the formed traces may contain a large number of early exits that could be branched out during the execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early-exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces.\n In this paper, we propose a light-weight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting ARM and IA32 instruction set architecture (ISA), respectively. Using SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to an NET-variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average), and to 49% (23% on average) for IA32 and ARM, respectively.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Virtual Execution Environments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2451512.2451519","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

Most dynamic binary translators (DBT) and optimizers (DBO) target binary traces, i.e. frequently executed paths, as code regions to be translated and optimized. Code region formation is the most important first step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus, determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques. Many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer a major problem: the formed traces may contain a large number of early exits that could be branched out during the execution. If this happens frequently, the program execution will spend more time in the slow binary interpreter or in the unoptimized code regions than in the optimized traces in code cache. The benefit of the trace optimization is thus lost. Traces/regions with frequently taken early-exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces. In this paper, we propose a light-weight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies and merges delinquent regions into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting ARM and IA32 instruction set architecture (ISA), respectively. Using SPEC CPU2006 benchmark suite with reference inputs, our results show that compared to an NET-variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG can achieve a significant performance improvement of up to 72% (27% on average), and to 49% (23% on average) for IA32 and ARM, respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
通过提前退出引导码区形成改进动态二进制优化
大多数动态二进制翻译器(DBT)和优化器(DBO)将二进制跟踪(即频繁执行的路径)作为要翻译和优化的代码区域。码区形成是所有dbt和dbo中最重要的第一步。动态形成的代码区域的质量决定了可以暴露给dbt和dbo的优化机会的范围和类型,从而决定了最终优化代码的最终质量。HP Dynamo中使用的下一个执行尾(NET)跟踪生成方法是此类技术的早期示例。许多现有的轨迹形成方案都是。NET的变体。它们对于大多数二进制跟踪都工作得很好,但是它们也有一个主要问题:形成的跟踪可能包含大量的早期出口,这些出口可能在执行期间被分支出来。如果经常发生这种情况,程序执行将在缓慢的二进制解释器或未优化的代码区域中花费更多的时间,而不是在代码缓存中的优化跟踪中花费更多的时间。跟踪优化的好处就这样失去了。经常提前退出的轨迹/区域称为拖欠轨迹/区域。我们的实证研究表明,在12个SPEC CPU2006整数基准测试中,至少有8个有拖欠的痕迹。在本文中,我们提出了一种轻量级的区域形成技术,称为早期出口引导区域形成(EEG),以提高形成的道/区域的质量。它迭代地识别错误区域并将其合并为更大的代码区域。我们已经在两个基于llvm的多线程dbt中实现了EEG算法,分别针对ARM和IA32指令集架构(ISA)。使用带有参考输入的SPEC CPU2006基准测试套件,我们的结果表明,与QEMU中目前使用的net变体(最先进的可重新定位DBT)相比,EEG可以实现显著的性能改进,在IA32和ARM上分别提高72%(平均27%)和49%(平均23%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Shrinking the hypervisor one subsystem at a time: a userspace packet switch for virtual machines A fast abstract syntax tree interpreter for R DBILL: an efficient and retargetable dynamic binary instrumentation framework using llvm backend Ginseng: market-driven memory allocation Tesseract: reconciling guest I/O and hypervisor swapping in a VM
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1