Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors

Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture, published 1995-01-22. DOI: 10.1109/HPCA.1995.386554

F. Dahlgren, P. Stenström

Citations: 61

Abstract

We study the relative efficiency of previously proposed stride and sequential prefetching, two promising hardware-based prefetching schemes for reducing read-miss penalties in shared-memory multiprocessors. Although stride accesses dominate in four of the six applications we study, we find that sequential prefetching does better than stride prefetching for three applications. This is because (i) most strides are shorter than the block size (we assume 32-byte blocks), which means that sequential prefetching is as effective for stride accesses, and (ii) sequential prefetching also exploits the locality of read misses for non-stride accesses. However, we find that since stride prefetching causes fewer useless prefetches, it consumes less memory-system bandwidth.
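
The two effects described in the abstract can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the simulator, prefetch schemes, or workloads used in the paper: it assumes an infinite cache, no timing, a single trace of (pc, byte address) reads, and 32-byte blocks. The "tagged" sequential scheme (a miss or the first use of a prefetched block triggers a prefetch of the next block) and the simplified per-instruction stride detector (prefetch after the same non-zero stride is seen twice in a row) are hypothetical simplifications, and the names `simulate`, `block_of`, and the three traces are illustrative only. Trace C, whose stride is longer than a block, is included only to show how useless prefetches arise.

```python
"""Toy comparison of sequential vs. stride prefetching (hypothetical model).

Assumptions (not from the paper): infinite cache, no timing, one trace of
(pc, byte_address) reads, 32-byte blocks. A prefetch is "useless" if its
block is never demanded afterwards.
"""

BLOCK = 32  # bytes per cache block, matching the abstract's assumption


def block_of(addr):
    return addr // BLOCK


def simulate(trace, policy, degree=1):
    """Return (demand_misses, prefetches_issued, useless_prefetches)."""
    cached = set()   # blocks present in the (infinite) cache
    pending = set()  # prefetched blocks not yet demanded
    rpt = {}         # stride policy: pc -> (last_addr, last_stride)
    misses = issued = 0

    def prefetch(b):
        nonlocal issued
        if b not in cached:  # only blocks not already cached cost bandwidth
            cached.add(b)
            pending.add(b)
            issued += 1

    for pc, addr in trace:
        b = block_of(addr)
        miss = b not in cached
        first_use = b in pending
        if miss:
            misses += 1
            cached.add(b)
        pending.discard(b)

        if policy == "sequential":
            # Tagged sequential prefetching: a miss, or the first demand use
            # of a prefetched block, prefetches the next `degree` blocks.
            if miss or first_use:
                for k in range(1, degree + 1):
                    prefetch(b + k)
        elif policy == "stride":
            # Simplified per-instruction stride detector: once the same
            # non-zero stride is seen twice in a row, prefetch addr + stride.
            last_addr, last_stride = rpt.get(pc, (None, 0))
            stride = 0 if last_addr is None else addr - last_addr
            if stride != 0 and stride == last_stride:
                prefetch(block_of(addr + stride))
            rpt[pc] = (addr, stride)
        # policy "none": no prefetching at all

    return misses, issued, len(pending)


# Trace A: one load with an 8-byte stride (shorter than a 32-byte block).
short_stride = [(0x10, 8 * i) for i in range(64)]
# Trace B: one load walking consecutive blocks at irregular offsets,
# so no stride repeats but spatial locality remains.
offsets = [0, 20, 4, 28, 8, 16, 12, 24] * 2
irregular = [(0x20, 32 * i + off) for i, off in enumerate(offsets)]
# Trace C: one load with a 256-byte stride (eight blocks apart).
long_stride = [(0x30, 256 * i) for i in range(8)]

if __name__ == "__main__":
    traces = [("A", short_stride), ("B", irregular), ("C", long_stride)]
    for name, trace in traces:
        for policy in ("none", "sequential", "stride"):
            print(name, policy, simulate(trace, policy))
```

On these toy traces, both schemes reduce Trace A from 16 misses to one, mirroring point (i); only sequential prefetching helps Trace B, mirroring point (ii); and on Trace C every sequential prefetch is useless, while the stride detector issues mostly useful ones. Because the model ignores latency and contention, it says nothing about the magnitudes reported in the paper.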
