Near-Memory Address Translation

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Pub Date: 2016-12-01
DOI: 10.1109/PACT.2017.56

Javier Picorel, Djordje Jevdjic, B. Falsafi


Citations: 26

Abstract

Memory and logic integration on the same chip is becoming increasingly cost-effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations, as contemporary memories exceed the reach of TLBs, making expensive page walks common.

In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that limiting the associativity of the virtual-to-physical mapping incurs no penalty and, when combined with careful data placement in the MPU's memory, can break the translate-then-fetch serialization, allowing translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages, respectively.
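To make the idea of a limited-associativity mapping concrete, the following is a minimal Python sketch. It is a toy model under stated assumptions, not the DIPTA design from the paper: the class name `SetAssociativeMemory`, the modulo set-index function, and the per-frame tag layout are all hypothetical choices made for illustration. It only shows that when a virtual page may reside in one small set of frames, the set is known from the virtual address alone, so the candidate locations can be probed without a prior page walk.

```python
from typing import List, Optional

PAGE_SIZE = 4096  # bytes; 4KB pages, one of the page sizes named in the abstract


class SetAssociativeMemory:
    """Toy model (hypothetical): a virtual page may live only in one set of frames."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # Inverted table: one tag per physical frame, holding the virtual
        # page number (VPN) mapped to that frame, or None if the frame is free.
        self.tags: List[List[Optional[int]]] = [
            [None] * ways for _ in range(num_sets)
        ]

    def set_index(self, vpn: int) -> int:
        # Assumed placement rule: low-order VPN bits pick the set.
        return vpn % self.num_sets

    def map_page(self, vpn: int) -> int:
        """Place a virtual page in its set and return the frame number."""
        s = self.set_index(vpn)
        for way, tag in enumerate(self.tags[s]):
            if tag == vpn:  # already mapped
                return s * self.ways + way
        for way, tag in enumerate(self.tags[s]):
            if tag is None:  # first free frame in the set
                self.tags[s][way] = vpn
                return s * self.ways + way
        raise MemoryError(f"set {s} is full; a real system would evict a page")

    def translate(self, vaddr: int) -> Optional[int]:
        """Return the physical address for vaddr, or None if unmapped."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        s = self.set_index(vpn)
        # Only `ways` tags are compared. In hardware, the data at the same
        # offset in each candidate frame could be fetched alongside this check.
        for way, tag in enumerate(self.tags[s]):
            if tag == vpn:
                return (s * self.ways + way) * PAGE_SIZE + offset
        return None


mem = SetAssociativeMemory(num_sets=4, ways=2)
frame = mem.map_page(5)                  # VPN 5 falls in set 1
paddr = mem.translate(5 * PAGE_SIZE + 12)
```

In this sketch, VPN 5 lands in set 1, way 0, which is frame 2, so `paddr` is `2 * 4096 + 12`. The sequential tag loop stands in for what would be a parallel comparison in hardware; eviction, address-space identifiers, and the distribution of tags across memory partitions are deliberately left out.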


