LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors

Javier Lira, Carlos Molina, Antonio González
{"title":"LRU-PEA:芯片多处理器上非统一缓存架构的智能替换策略","authors":"Javier Lira, Carlos Molina, Antonio González","doi":"10.1109/ICCD.2009.5413142","DOIUrl":null,"url":null,"abstract":"The increasing speed-gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. non uniform cache architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces or banks allowing nearer banks to have better access latencies than further banks. Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly as they were designed for single-processors. This paper focuses on bank replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken. This technique is based on the observation that some types of data are less commonly accessed depending on which bank they reside in. We call this technique LRU-PEA (least recently used with a priority eviction approach). We show that the proposed technique significantly reduces the requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and into an Energy per Instruction (EPI) reduction of 5%.","PeriodicalId":256908,"journal":{"name":"2009 IEEE International Conference on Computer Design","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors\",\"authors\":\"Javier Lira, Carlos Molina, Antonio González\",\"doi\":\"10.1109/ICCD.2009.5413142\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The increasing speed-gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. non uniform cache architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces or banks allowing nearer banks to have better access latencies than further banks. Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly as they were designed for single-processors. This paper focuses on bank replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken. This technique is based on the observation that some types of data are less commonly accessed depending on which bank they reside in. We call this technique LRU-PEA (least recently used with a priority eviction approach). 
We show that the proposed technique significantly reduces the requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and into an Energy per Instruction (EPI) reduction of 5%.\",\"PeriodicalId\":256908,\"journal\":{\"name\":\"2009 IEEE International Conference on Computer Design\",\"volume\":\"55 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-10-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2009.5413142\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2009.5413142","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

The increasing speed gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. Non-uniform cache architectures (NUCA) have been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces, or banks, allowing nearer banks to have lower access latencies than farther banks. Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache can boost performance, particularly if set associativity is adopted. Unfortunately, traditional replacement policies do not behave properly, as they were designed for single processors. This paper focuses on bank replacement. This policy involves three key decisions when there is a miss: where to place a data block within the cache set, which data to evict from the cache set and, finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken. The technique is based on the observation that some types of data are accessed less often depending on which bank they reside in. We call this technique LRU-PEA (Least Recently Used with a Priority Eviction Approach). We show that the proposed technique significantly reduces requests to off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and an Energy per Instruction (EPI) reduction of 5%.
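To make the three decisions above concrete, below is a minimal sketch of a priority-driven victim selection for a single cache set. The `Line` layout, the three data categories, the `kEvictPriority` table and the LRU tie-break are hypothetical placeholders chosen for exposition; the paper's actual LRU-PEA mechanism, bank partitioning and relocation rules are defined in the full text.

```cpp
// Toy model of the three decisions the abstract lists for a miss in one
// NUCA cache set: (1) where to place the incoming block, (2) which block
// to evict, and (3) where the evicted block should go. The Line fields,
// the three data categories and the kEvictPriority table are illustrative
// assumptions, not the paper's actual LRU-PEA parameters.
#include <cstdint>
#include <iostream>
#include <vector>

struct Line {
    uint64_t tag      = 0;
    bool     valid    = false;
    uint32_t lruAge   = 0;   // larger = less recently used
    int      category = 0;   // stand-in for the bank-dependent data type
};

// Assumed eviction priority per category: higher value = evict first.
static const int kEvictPriority[3] = {2, 1, 0};

// Decision (2): choose a victim way. A free way wins outright; otherwise the
// highest-priority category is chosen, with LRU age breaking ties.
int pickVictim(const std::vector<Line>& set) {
    for (size_t i = 0; i < set.size(); ++i)
        if (!set[i].valid) return static_cast<int>(i);
    int victim = 0;
    for (size_t i = 1; i < set.size(); ++i) {
        const Line& a = set[i];
        const Line& b = set[victim];
        if (kEvictPriority[a.category] > kEvictPriority[b.category] ||
            (kEvictPriority[a.category] == kEvictPriority[b.category] &&
             a.lruAge > b.lruAge))
            victim = static_cast<int>(i);
    }
    return victim;
}

// Decisions (1) and (3): install the incoming block in the victim way and
// report the eviction. The real policy would additionally decide whether the
// evicted block is relocated to another NUCA bank or sent off-chip.
void handleMiss(std::vector<Line>& set, uint64_t tag, int category) {
    int v = pickVictim(set);
    if (set[v].valid)
        std::cout << "evicting tag 0x" << std::hex << set[v].tag << std::dec
                  << " (category " << set[v].category << ")\n";
    for (Line& l : set)
        if (l.valid) ++l.lruAge;                 // age the surviving lines
    set[v] = Line{tag, true, 0, category};       // place the new block as MRU
}

int main() {
    std::vector<Line> set(4);                    // one 4-way cache set
    handleMiss(set, 0xA0, 0);
    handleMiss(set, 0xB0, 1);
    handleMiss(set, 0xC0, 2);
    handleMiss(set, 0xD0, 0);
    handleMiss(set, 0xE0, 1);                    // full set: forces an eviction
}
```

The ordering of concerns is the point of the sketch: a candidate's category, standing in for the bank-dependent access behaviour the authors observe, is consulted before recency, so a block from a rarely re-referenced category can be evicted even when a block from a hotter category has been idle longer.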