SynchroTrace: synchronization-aware architecture-agnostic traces for light-weight multicore simulation

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). Pub Date: 2015-03-29. DOI: 10.1109/ISPASS.2015.7095813

Siddharth Nilakantan, K. Sangaiah, A. More, G. Salvador, B. Taskin, Mark Hempstead


Citations: 22

Abstract

Trace-driven simulation of chip multiprocessor (CMP) systems offers many advantages over execution-driven simulation, such as reducing simulation time and complexity and allowing portability and scalability. However, trace-based simulation approaches have had difficulty capturing and accurately replaying multi-threaded traces because of the inherent non-determinism in the execution of multi-threaded programs. In this work, we present SynchroTrace, a scalable, flexible, and accurate trace-based multi-threaded simulation methodology. The methodology captures synchronization- and dependency-aware, architecture-agnostic, multi-threaded traces and uses a replay mechanism that plays back these traces correctly. By recording synchronization events and dependencies in the traces, independent of the host architecture, the methodology is able to accurately model the non-determinism of multi-threaded programs for different platforms. We validate the SynchroTrace simulation flow by achieving results equivalent to those of a constraint-based design space exploration performed with the Gem5 Full-System simulator. Results from simulating benchmarks from PARSEC 2.1 and Splash-2 show that our trace-based approach with trace filtering achieves a peak speedup of 18.4x over simulation in Gem5 Full-System, with an average speedup of about 7.5x. We are also able to compress traces up to 74% of their original size with almost no impact on accuracy.
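To make the idea of synchronization-aware replay concrete, here is a minimal Python sketch. It is an illustration only and not the SynchroTrace trace format or replay engine: the event tuples (`comp`, `lock`, `unlock`, `barrier`), the `replay` function, and the cycle costs are all hypothetical. It shows how a replayer that sees synchronization events, rather than fixed timestamps, can let lock contention and barrier waits determine each thread's timing.

```python
def replay(traces):
    """Replay per-thread event lists and return each thread's finish time.

    Hypothetical event tuples (an assumption, not the SynchroTrace format):
      ("comp", cycles)    local work costing `cycles`
      ("lock", name)      acquire mutex `name`
      ("unlock", name)    release mutex `name`
      ("barrier", name)   wait until every thread has reached `name`
    """
    n = len(traces)
    pc = [0] * n        # index of the next event for each thread
    clock = [0] * n     # local simulated time for each thread
    owner = {}          # mutex name -> thread currently holding it
    free_at = {}        # mutex name -> time of its most recent release
    arrived = {}        # barrier name -> {thread: arrival time}

    progress = True
    while progress:
        progress = False
        # Advance the runnable thread with the smallest local clock,
        # one event at a time, so earlier requesters reach locks first.
        for t in sorted(range(n), key=lambda i: (clock[i], i)):
            if pc[t] >= len(traces[t]):
                continue                      # thread has finished
            kind, arg = traces[t][pc[t]]
            if kind == "comp":
                clock[t] += arg
            elif kind == "lock":
                if arg in owner:
                    continue                  # blocked: mutex is held
                owner[arg] = t
                clock[t] = max(clock[t], free_at.get(arg, 0))
            elif kind == "unlock":
                del owner[arg]
                free_at[arg] = clock[t]
            elif kind == "barrier":
                waiting = arrived.setdefault(arg, {})
                waiting[t] = clock[t]
                if len(waiting) < n:
                    continue                  # blocked: others not here yet
                release = max(waiting.values())
                for w in waiting:
                    clock[w] = release        # everyone leaves together
                    pc[w] += 1
                del arrived[arg]
                progress = True
                break
            pc[t] += 1
            progress = True
            break                             # re-sort after every step
    return clock


# Thread 1 requests the mutex at time 2 but must wait until thread 0
# releases it at time 10.
traces = [
    [("lock", "m"), ("comp", 10), ("unlock", "m")],
    [("comp", 2), ("lock", "m"), ("comp", 3), ("unlock", "m")],
]
print(replay(traces))  # [10, 13]
```

Because the waiting time is derived during replay instead of being stored in the trace, changing the cost of a `comp` event (for example, to model a different memory hierarchy) changes the contention outcome accordingly, which is the property the abstract attributes to synchronization-aware traces.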


Source journal

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Self-citation rate

0.00%

Number of publications