Graph-matching-based simulation-region selection for multiple binaries

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Pub Date: 2015-03-29. DOI: 10.1109/ISPASS.2015.7095784

Charles R. Yount, H. Patil, M. S. Islam, Aditya Srikanth


Citation count: 7

Abstract

Comparing simulation-based performance estimates of program binaries built with different compiler settings, or targeted at variants of an instruction set architecture, is essential for software/hardware co-design and similar engineering activities. Commonly used sampling techniques for selecting simulation regions do not ensure that samples from the binaries being compared represent the same source-level work, which leads to biased speedup estimates and makes comparative performance debugging difficult. Creating equal-work samples is hard because the binaries differ in structure and execution paths, for example through variations in libraries, inlining, and loop-iteration counts. This work addresses these complexities by first applying an existing graph-matching technique to the call and loop graphs of multiple binaries built from the same source program. A new sequence-alignment algorithm is then applied to execution traces from the binaries, using the graph-matching results to define intervals of equal work. A basic-block profile generated for these matched intervals can then be used for phase detection and simulation-region selection across all binaries simultaneously. The selected simulation regions match across the binaries both in number and in the work they represent. The technique is demonstrated on binaries compiled for different Intel 64 Architecture instruction-set extensions. Quality metrics for speedup estimation and an example of applying the data to performance debugging are presented.
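The idea of cutting two execution traces into intervals of equal work can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify their sequence-alignment method, so Python's standard `difflib.SequenceMatcher` stands in for it here. The trace format (a list of marker name and cumulative instruction count pairs), the marker names, and all counts are hypothetical, and the markers are assumed to have already been matched across binaries by a graph-matching step.

```python
from difflib import SequenceMatcher


def align_markers(trace_a, trace_b):
    """Return index pairs (i, j) where both traces reach the same marker.

    Each trace is a list of (marker_id, cumulative_instruction_count).
    SequenceMatcher is a stand-in for the paper's alignment algorithm.
    """
    ids_a = [marker for marker, _ in trace_a]
    ids_b = [marker for marker, _ in trace_b]
    matcher = SequenceMatcher(None, ids_a, ids_b, autojunk=False)
    pairs = []
    for block in matcher.get_matching_blocks():
        # The final block is a sentinel with size 0 and adds nothing.
        for k in range(block.size):
            pairs.append((block.a + k, block.b + k))
    return pairs


def equal_work_intervals(trace_a, trace_b):
    """Cut both traces at aligned markers.

    Returns tuples (start_marker, end_marker, instrs_a, instrs_b), so each
    binary gets the same number of intervals covering the same source work.
    """
    pairs = align_markers(trace_a, trace_b)
    intervals = []
    for (i0, j0), (i1, j1) in zip(pairs, pairs[1:]):
        instrs_a = trace_a[i1][1] - trace_a[i0][1]
        instrs_b = trace_b[j1][1] - trace_b[j0][1]
        intervals.append((trace_a[i0][0], trace_a[i1][0], instrs_a, instrs_b))
    return intervals


# Hypothetical traces: in binary B the call to "foo" was inlined, so its
# marker never appears and the alignment skips over it.
trace_a = [("main", 0), ("loop1", 100), ("foo", 250), ("loop2", 400), ("exit", 900)]
trace_b = [("main", 0), ("loop1", 80), ("loop2", 300), ("exit", 600)]

intervals = equal_work_intervals(trace_a, trace_b)
for start, end, instrs_a, instrs_b in intervals:
    print(f"{start} -> {end}: A={instrs_a} instructions, B={instrs_b} instructions")
```

Because both binaries are cut at the same aligned markers, the interval lists have equal length, and a per-interval comparison between binaries refers to the same portion of the source program.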


