BuMP: Bulk Memory Access Prediction and Streaming

Stavros Volos, Javier Picorel, B. Falsafi, Boris Grot
{"title":"BuMP: Bulk Memory Access Prediction and Streaming","authors":"Stavros Volos, Javier Picorel, B. Falsafi, Boris Grot","doi":"10.1109/MICRO.2014.44","DOIUrl":null,"url":null,"abstract":"With the end of Den nard scaling, server power has emerged as the limiting factor in the quest for more capable data enters. Without the benefit of supply voltage scaling, it is essential to lower the energy per operation to improve server efficiency. As the industry moves to lean-core server processors, the energy bottleneck is shifting toward main memory as a chief source of server energy consumption in modern data enters. Maximizing the energy efficiency of today's DRAM chips and interfaces requires amortizing the costly DRAM page activations over multiple row buffer accesses. This work introduces Bulk Memory Access Prediction and Streaming, or BuMP. We make the observation that a significant fraction (59-79%) of all memory accesses fall into DRAM pages with high access density, meaning that the majority of their cache blocks will be accessed within a modest time frame of the first access. Accesses to high-density DRAM pages include not only memory reads in response to load instructions, but also reads stemming from store instructions as well as memory writes upon a dirty LLC eviction. The remaining accesses go to low-density pages and virtually unpredictable reference patterns (e.g., Hashed key lookups). BuMP employs a low-cost predictor to identify high-density pages and triggers bulk transfer operations upon the first read or write to the page. In doing so, BuMP enforces high row buffer locality where it is profitable, thereby reducing DRAM energy per access by 23%, and improves server throughput by 11% across a wide range of server applications.","PeriodicalId":6591,"journal":{"name":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","volume":"73 1","pages":"545-557"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"38","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 47th Annual IEEE/ACM International Symposium on Microarchitecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MICRO.2014.44","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 38

Abstract

With the end of Dennard scaling, server power has emerged as the limiting factor in the quest for more capable data centers. Without the benefit of supply voltage scaling, it is essential to lower the energy per operation to improve server efficiency. As the industry moves to lean-core server processors, the energy bottleneck is shifting toward main memory as a chief source of server energy consumption in modern data centers. Maximizing the energy efficiency of today's DRAM chips and interfaces requires amortizing the costly DRAM page activations over multiple row buffer accesses. This work introduces Bulk Memory Access Prediction and Streaming, or BuMP. We make the observation that a significant fraction (59-79%) of all memory accesses fall into DRAM pages with high access density, meaning that the majority of their cache blocks will be accessed within a modest time frame of the first access. Accesses to high-density DRAM pages include not only memory reads in response to load instructions, but also reads stemming from store instructions as well as memory writes upon a dirty LLC eviction. The remaining accesses go to low-density pages with virtually unpredictable reference patterns (e.g., hashed key lookups). BuMP employs a low-cost predictor to identify high-density pages and triggers bulk transfer operations upon the first read or write to the page. In doing so, BuMP enforces high row buffer locality where it is profitable, thereby reducing DRAM energy per access by 23% and improving server throughput by 11% across a wide range of server applications.
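The mechanism the abstract describes lends itself to a small illustration. The sketch below is a hypothetical reconstruction of the kind of low-cost predictor BuMP employs, not the paper's actual design: the class and function names (`DensityPredictor`, `on_first_access`), the PC-indexed table of 2-bit saturating counters, and every size and threshold are illustrative assumptions.

```python
# Illustrative sketch, not the paper's design: a PC-indexed density
# predictor with 2-bit saturating counters. On the first read or write
# to a DRAM page, it decides whether to stream the whole page in bulk
# (amortizing the row activation) or fetch a single cache block.
# All names, sizes, and thresholds below are assumptions.

BLOCKS_PER_PAGE = 32     # assumed: 2 KB DRAM page / 64 B cache blocks
DENSE_THRESHOLD = 24     # assumed: blocks touched for an epoch to count as dense
TABLE_SIZE = 1024        # assumed: number of predictor entries


class DensityPredictor:
    def __init__(self):
        # 2-bit counters initialized to "weakly sparse" (1).
        self.counters = [1] * TABLE_SIZE

    def _index(self, pc):
        return hash(pc) % TABLE_SIZE

    def predict_dense(self, pc):
        """Consulted on the first access to a DRAM page."""
        return self.counters[self._index(pc)] >= 2

    def train(self, pc, blocks_touched):
        """Update when a page's access epoch ends, given how many
        distinct cache blocks were actually touched."""
        i = self._index(pc)
        if blocks_touched >= DENSE_THRESHOLD:
            self.counters[i] = min(3, self.counters[i] + 1)   # saturate up
        else:
            self.counters[i] = max(0, self.counters[i] - 1)   # saturate down


class Memory:
    """Trivial stand-in for the memory controller path."""
    def read_block(self, block_addr):
        return f"data@{block_addr}"


def on_first_access(pred, pc, page_addr, block_addr, mem):
    """Predicted-dense pages are streamed in bulk while the row buffer
    is open; others are served one block at a time."""
    if pred.predict_dense(pc):
        base = page_addr * BLOCKS_PER_PAGE
        return [mem.read_block(base + b) for b in range(BLOCKS_PER_PAGE)]
    return [mem.read_block(block_addr)]


# Toy usage: after one dense epoch is observed for this (hypothetical) PC,
# the next first-touch of a page triggers a full-page bulk transfer.
pred, mem = DensityPredictor(), Memory()
pred.train(pc=0x400F3C, blocks_touched=30)
blocks = on_first_access(pred, 0x400F3C, page_addr=7,
                         block_addr=7 * BLOCKS_PER_PAGE + 3, mem=mem)
print(len(blocks))  # 32 -- the whole page was streamed
```

A real design would also need a small monitoring structure to measure `blocks_touched` per page epoch, and would cover the write traffic the abstract mentions (stores and dirty LLC evictions); those details are omitted from this sketch.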