Handling the problems and opportunities posed by multiple on-chip memory controllers

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT) Pub Date: 2010-09-11 DOI:10.1145/1854273.1854314

M. Awasthi, D. Nellans, K. Sudan, R. Balasubramonian, A. Davis


Citations: 135

Abstract

Modern processors such as Tilera's Tile64, Intel's Nehalem, and AMD's Opteron are migrating memory controllers (MCs) on-chip while maintaining a large, flat memory address space. This trend toward multiple MCs will likely continue, and a core or socket will consequently need to route memory requests to the appropriate MC via an inter- or intra-socket interconnect fabric similar to AMD's HyperTransport™ or Intel's QuickPath Interconnect™. Such systems are therefore subject to non-uniform memory access (NUMA) latencies because of the time spent traveling to remote MCs. Each MC will act as the gateway to a particular piece of the physical memory, so data placement will become increasingly critical in minimizing memory access latencies. To date, no prior work has examined the effects of data placement among multiple MCs in such systems. Future chip-multiprocessors are likely to comprise multiple MCs and an even larger number of cores, which will increase the variation in memory access latency. Allocating workload data to the appropriate MC will be important in reducing the latency of memory requests. The allocation strategy will need to be aware of each MC's queuing delays, on-chip latencies, and row-buffer hit rates. In this paper, we propose dynamic mechanisms that take these factors into account when placing data in appropriate slices of the physical memory. We introduce adaptive first-touch page placement and dynamic page-migration mechanisms to reduce DRAM access delays in multi-MC systems. These policies yield average performance improvements of 17% for adaptive first-touch page placement and 35% for a dynamic page-migration policy.
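The abstract says placement must weigh each MC's queuing delay, on-chip distance, and row-buffer hit rate. As an illustration only, the Python sketch below folds those three factors into a weighted cost and binds each page to the cheapest MC the first time it is touched. The `MCStats` fields, the weights `ALPHA`, `BETA`, and `GAMMA`, the linear form of the cost, and the sign on the hit-rate term are all assumptions made for this sketch. They are not the paper's published formula.

```python
from dataclasses import dataclass


# Hypothetical per-MC statistics as seen from one requesting core.
# Field names and units are assumptions for illustration.
@dataclass
class MCStats:
    queue_occupancy: float      # average pending requests in this MC's queue
    row_buffer_hit_rate: float  # fraction of accesses hitting an open row (0..1)
    hops: int                   # interconnect distance from the requesting core


# Hypothetical weights; a real system would have to tune these empirically.
ALPHA = 1.0   # penalty per queued request (queuing delay)
BETA = 10.0   # reward for row-buffer locality (assumed: higher hit rate lowers cost)
GAMMA = 2.0   # penalty per interconnect hop (on-chip latency)


def placement_cost(mc: MCStats) -> float:
    """Lower is better: penalize long queues and distant MCs, reward row hits."""
    return (ALPHA * mc.queue_occupancy
            - BETA * mc.row_buffer_hit_rate
            + GAMMA * mc.hops)


def choose_mc(stats: list[MCStats]) -> int:
    """Return the index of the MC with the lowest placement cost."""
    return min(range(len(stats)), key=lambda i: placement_cost(stats[i]))


class AdaptiveFirstTouch:
    """Binds each page to an MC on first touch; later touches reuse the binding."""

    def __init__(self) -> None:
        self.page_to_mc: dict[int, int] = {}

    def access(self, page: int, stats: list[MCStats]) -> int:
        if page not in self.page_to_mc:
            # First touch: place the page on the currently cheapest MC.
            self.page_to_mc[page] = choose_mc(stats)
        return self.page_to_mc[page]


if __name__ == "__main__":
    stats = [
        MCStats(queue_occupancy=8.0, row_buffer_hit_rate=0.5, hops=0),  # local but congested
        MCStats(queue_occupancy=1.0, row_buffer_hit_rate=0.6, hops=3),  # idle but far away
        MCStats(queue_occupancy=2.0, row_buffer_hit_rate=0.4, hops=1),  # moderate on all counts
    ]
    policy = AdaptiveFirstTouch()
    print(policy.access(page=7, stats=stats))  # 2 (costs are 3.0, 1.0, 0.0)
```

In the usage example the congested local MC loses to a nearby, lightly loaded one. Pure first-touch placement would have picked MC 0 for a local core. By design the sketch never revisits a binding. Moving pages after they have been placed is the job of the dynamic page-migration policy the abstract describes, which is not modeled here.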


