Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors

Proceedings Eighth International Symposium on High Performance Computer Architecture. Pub Date: 2002-02-02. DOI: 10.1109/HPCA.2002.995697

Marcelo H. Cintra, J. Torrellas


Citations: 90

Abstract

With speculative thread-level parallelization, codes that cannot be fully analyzed by the compiler are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes the offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms to eliminate most squashes due to data dependences in multiprocessors. The framework works by learning and predicting violations, and applying delayed disambiguation, value prediction, and stall and release. The framework is suited for directory-based multiprocessors that track memory accesses at the system level with the coarse granularity of memory lines. Simulations of a 16-processor machine show that the framework is very effective. By adding our framework to a speculative CC-NUMA with 64-byte memory lines, we speed up applications by an average of 4.3 times. Moreover, the resulting system is even 23% faster than a machine that tracks memory accesses at the fine granularity of words, a sophisticated system that is not compatible with mainstream cache coherence protocols.
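The abstract does not spell out how the mechanisms work, so the following is only a toy illustration of one point it raises: tracking accesses per memory line rather than per word can flag more cross-thread violations. The sketch is not the paper's hardware protocol. The function name `count_violations`, the trace format, and the 4-byte word size are all assumptions made for illustration; only the 64-byte line size comes from the abstract.

```python
LINE_BYTES = 64  # memory line size quoted in the abstract
WORD_BYTES = 4   # assumed word size; not stated in the source


def count_violations(accesses, granularity):
    """Count writes that would trigger a squash in this toy model.

    accesses: list of (thread_id, op, address) tuples in the order the
    accesses happen in time. A lower thread_id means the thread comes
    earlier in sequential program order (it is less speculative).
    granularity: size in bytes of the unit at which accesses are tracked.

    A violation is flagged when a thread writes a tracked unit that a
    later (more speculative) thread has already read.
    """
    readers = {}  # tracked unit -> set of thread ids that have read it
    violations = 0
    for thread_id, op, address in accesses:
        unit = address // granularity
        if op == "read":
            readers.setdefault(unit, set()).add(thread_id)
        elif op == "write":
            if any(reader > thread_id for reader in readers.get(unit, ())):
                violations += 1
    return violations


# Hypothetical trace: thread 1 is more speculative than thread 0.
trace = [
    (1, "read", 0x104),   # thread 1 reads the word at 0x104
    (0, "write", 0x100),  # thread 0 writes a different word on the same line
    (1, "read", 0x200),   # thread 1 reads the word at 0x200
    (0, "write", 0x200),  # thread 0 writes that same word: a true dependence
]

line_level = count_violations(trace, LINE_BYTES)
word_level = count_violations(trace, WORD_BYTES)
print(line_level, word_level)
```

In this trace the first write touches a different word than the earlier read, so only line-level tracking flags it. The second write is a genuine dependence and is flagged at both granularities.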


