OptiWISE: Combining Sampling and Instrumentation for Granular CPI Analysis

Yuxin Guo, Alex W. Chadwick, Márton Erdős, Utpal Bora, Ilias Vougioukas, Giacomo Gabrielli, Timothy M. Jones
{"title":"OptiWISE: Combining Sampling and Instrumentation for Granular CPI Analysis","authors":"Yuxin Guo, Alex W. Chadwick, Márton Erdős, Utpal Bora, Ilias Vougioukas, Giacomo Gabrielli, Timothy M. Jones","doi":"10.1109/CGO57630.2024.10444771","DOIUrl":null,"url":null,"abstract":"Despite decades of improvement in compiler technology, it remains necessary to profile applications to improve performance. Existing profiling tools typically either sample hardware performance counters or instrument the program with extra instructions to analyze its execution. Both techniques are valuable with different strengths and weaknesses, but do not always correctly identify optimization opportunities. We present OPTIWISE, a profiling tool that runs the program twice, once with low-overhead sampling to accurately measure performance, and once with instrumentation to accurately capture control flow and execution counts. OPTIWISE then combines this information to give a highly detailed per-instruction CPI metric by computing the ratio of samples to execution counts, as well as aggregated information such as costs per loop, source-code line, or function. We evaluate OPTIWISE to show it has an overhead of 8.1× geomean, and 57× worst case on SPEC CPU2017 benchmarks. Using OPTIWISE, we present case studies of optimizing selected SPEC benchmarks on a modern x86 server processor. The per-instruction CPI metrics quickly reveal problems such as costly mispredicted branches and cache misses, which we use to manually optimize for effective performance improvements.","PeriodicalId":517814,"journal":{"name":"2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)","volume":"57 9","pages":"373-385"},"PeriodicalIF":0.0000,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CGO57630.2024.10444771","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Despite decades of improvement in compiler technology, it remains necessary to profile applications to improve performance. Existing profiling tools typically either sample hardware performance counters or instrument the program with extra instructions to analyze its execution. Both techniques are valuable with different strengths and weaknesses, but do not always correctly identify optimization opportunities. We present OPTIWISE, a profiling tool that runs the program twice, once with low-overhead sampling to accurately measure performance, and once with instrumentation to accurately capture control flow and execution counts. OPTIWISE then combines this information to give a highly detailed per-instruction CPI metric by computing the ratio of samples to execution counts, as well as aggregated information such as costs per loop, source-code line, or function. We evaluate OPTIWISE to show it has an overhead of 8.1× geomean, and 57× worst case on SPEC CPU2017 benchmarks. Using OPTIWISE, we present case studies of optimizing selected SPEC benchmarks on a modern x86 server processor. The per-instruction CPI metrics quickly reveal problems such as costly mispredicted branches and cache misses, which we use to manually optimize for effective performance improvements.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
OptiWISE:将采样与仪器相结合,进行粒度 CPI 分析
尽管数十年来编译器技术不断进步,但仍有必要对应用程序进行剖析以提高性能。现有的剖析工具通常要么对硬件性能计数器进行采样,要么使用额外指令对程序进行检测,以分析其执行情况。这两种技术各有优缺点,但并不总能正确识别优化机会。我们介绍的 OPTIWISE 是一种剖析工具,它可以运行程序两次,一次使用低开销采样来精确测量性能,另一次使用仪器来精确捕捉控制流和执行计数。然后,OPTIWISE 将这些信息结合起来,通过计算采样与执行次数的比率,以及诸如每个循环、源代码行或函数的成本等汇总信息,给出高度详细的每条指令 CPI 指标。我们对 OPTIWISE 进行了评估,结果表明它在 SPEC CPU2017 基准上的开销为 8.1× geomean,最坏情况为 57×。利用 OPTIWISE,我们介绍了在现代 x86 服务器处理器上优化所选 SPEC 基准的案例研究。每条指令的 CPI 指标能迅速揭示问题,如代价高昂的错误预测分支和高速缓存缺失,我们利用这些指标进行手动优化,从而有效提高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
PresCount: Effective Register Allocation for Bank Conflict Reduction High-Throughput, Formal-Methods-Assisted Fuzzing for LLVM CGO 2024 Organization SCHEMATIC: Compile-Time Checkpoint Placement and Memory Allocation for Intermittent Systems Representing Data Collections in an SSA Form
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1