Evaluating the impact of dynamic binary translation systems on hardware cache performance

Arkaitz Ruiz-Alvarez, K. Hazelwood
Published in: 2008 IEEE International Symposium on Workload Characterization
Publication date: 2008-09-30
DOI: 10.1109/IISWC.2008.4636098 (https://doi.org/10.1109/IISWC.2008.4636098)
Citations: 14

Abstract

Dynamic binary translation systems (DBTs) enable a wide range of applications such as program instrumentation, optimization, and security. DBTs use a software code cache to store previously translated instructions. The code layout in the code cache differs greatly from that of the original program. This paper provides an exhaustive analysis of the performance of the instruction/trace cache and other structures of the microarchitecture while executing DBTs that focus on program instrumentation, such as DynamoRIO and Pin. We performed our evaluation along two axes. First, we directly accessed the hardware performance counters to determine actual cache miss counts. Second, we used simulation to analyze the spatial locality of the translated application. Our results show that when an application executes under the control of Pin or DynamoRIO, the icache miss counts increase by more than 2X. Surprisingly, the L2 cache and the L1 data cache show much lower performance degradation, or even break even with the native application. We also found that overall performance degradations are due to the instructions added by the DBT itself, and that these extra instructions outweigh any possible spatial locality benefits exhibited in the code cache. Our observations held regardless of the trace length, code cache size, or the presence of a hardware trace cache. These results provide a better understanding of the efficiency of current instrumentation tools and their effects on instruction/trace cache performance and other structures of the microarchitecture.
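The simulation axis mentioned in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator: it is a minimal direct-mapped instruction cache, and the cache size, line size, instruction width, loop lengths, and the helper names `count_icache_misses` and `loop_trace` are all hypothetical choices made for illustration. It shows how instructions added to a hot loop can push its footprint past the cache capacity and raise the miss count.

```python
def count_icache_misses(addresses, cache_size=1024, line_size=64):
    """Count misses of a direct-mapped cache over a fetch-address trace.

    cache_size and line_size are in bytes; both are assumed values.
    """
    num_lines = cache_size // line_size
    tags = [None] * num_lines          # one resident block number per cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block holding this instruction
        index = block % num_lines      # cache line the block maps to
        if tags[index] != block:       # cold miss or conflict miss
            tags[index] = block
            misses += 1
    return misses


def loop_trace(start, n_instr, iterations, instr_bytes=4):
    """Fetch addresses of a straight-line loop body executed repeatedly."""
    return [start + i * instr_bytes
            for _ in range(iterations)
            for i in range(n_instr)]


# Hypothetical "native" loop: 256 instructions (1 KiB), fits the 1 KiB cache.
native_misses = count_icache_misses(loop_trace(0, 256, 10))

# Hypothetical "translated" loop: the same body plus 128 added instructions.
translated_misses = count_icache_misses(loop_trace(0, 384, 10))

print(native_misses, translated_misses)
```

In this toy setting the native loop incurs only cold misses, while the longer loop keeps evicting its own lines on every iteration. Real DBT behavior depends on associativity, trace layout, and the hardware involved, none of which this sketch models.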