Graph-matching-based simulation-region selection for multiple binaries

Charles R. Yount, H. Patil, M. S. Islam, Aditya Srikanth
{"title":"Graph-matching-based simulation-region selection for multiple binaries","authors":"Charles R. Yount, H. Patil, M. S. Islam, Aditya Srikanth","doi":"10.1109/ISPASS.2015.7095784","DOIUrl":null,"url":null,"abstract":"Comparison of simulation-based performance estimates of program binaries built with different compiler settings or targeted at variants of an instruction set architecture is essential for software/hardware co-design and similar engineering activities. Commonly-used sampling techniques for selecting simulation regions do not ensure that samples from the various binaries being compared represent the same source-level work, leading to biased speedup estimates and difficulty in comparative performance debugging. The task of creating equal-work samples is made difficult by differences between the structure and execution paths across multiple binaries such as variations in libraries, in-lining, and loop-iteration counts. Such complexities are addressed in this work by first applying an existing graph-matching technique to call and loop graphs for multiple binaries for the same source program. Then, a new sequence-alignment algorithm is applied to execution traces from the various binaries, using the graph-matching results to define intervals of equal work. A basic-block profile generated for these matched intervals can then be used for phase-detection and simulation-region selection across all binaries simultaneously. The resulting selected simulation regions match both in number and the work done across multiple binaries. The application of this technique is demonstrated on binaries compiled for different Intel 64 Architecture instruction-set extensions. Quality metrics for speedup estimation and an example of applying the data for performance debugging are presented.","PeriodicalId":189378,"journal":{"name":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS.2015.7095784","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Comparison of simulation-based performance estimates of program binaries built with different compiler settings or targeted at variants of an instruction set architecture is essential for software/hardware co-design and similar engineering activities. Commonly-used sampling techniques for selecting simulation regions do not ensure that samples from the various binaries being compared represent the same source-level work, leading to biased speedup estimates and difficulty in comparative performance debugging. The task of creating equal-work samples is made difficult by differences between the structure and execution paths across multiple binaries such as variations in libraries, in-lining, and loop-iteration counts. Such complexities are addressed in this work by first applying an existing graph-matching technique to call and loop graphs for multiple binaries for the same source program. Then, a new sequence-alignment algorithm is applied to execution traces from the various binaries, using the graph-matching results to define intervals of equal work. A basic-block profile generated for these matched intervals can then be used for phase-detection and simulation-region selection across all binaries simultaneously. The resulting selected simulation regions match both in number and the work done across multiple binaries. The application of this technique is demonstrated on binaries compiled for different Intel 64 Architecture instruction-set extensions. Quality metrics for speedup estimation and an example of applying the data for performance debugging are presented.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于图匹配的多二进制模拟区域选择
使用不同的编译器设置或针对指令集体系结构的变体构建的程序二进制文件的基于仿真的性能估计的比较对于软件/硬件协同设计和类似的工程活动是必不可少的。用于选择模拟区域的常用采样技术并不能确保来自被比较的各种二进制文件的样本代表相同的源级工作,从而导致有偏差的加速估计和比较性能调试的困难。由于跨多个二进制文件的结构和执行路径的差异,例如库、内联和循环迭代计数的变化,使得创建相同工作示例的任务变得困难。本文首先应用现有的图匹配技术,为同一源程序的多个二进制文件调用和循环图,从而解决了这种复杂性。然后,将一种新的序列对齐算法应用于各种二进制文件的执行轨迹,使用图匹配结果来定义相等工作的间隔。为这些匹配区间生成的基本块剖面可以同时用于所有二进制的相位检测和模拟区域选择。结果选择的模拟区域在数量和跨多个二进制文件完成的工作上都是匹配的。在针对不同的Intel 64架构指令集扩展编译的二进制文件上演示了该技术的应用。给出了用于加速估计的质量度量,并给出了应用这些数据进行性能调试的实例。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Graph Processing Platforms at Scale: Practices and Experiences Self-monitoring overhead of the Linux perf_ event performance counter interface Analyzing communication models for distributed thread-collaborative processors in terms of energy and time A full-system approach to analyze the impact of next-generation mobile flash storage Graph-matching-based simulation-region selection for multiple binaries
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1