In-Memory Data Parallel Processor

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems Pub Date : 2018-03-19 DOI:10.1145/3173162.3173171

Daichi Fujiki, S. Mahlke, R. Das

{"title":"In-Memory Data Parallel Processor","authors":"Daichi Fujiki, S. Mahlke, R. Das","doi":"10.1145/3173162.3173171","DOIUrl":null,"url":null,"abstract":"Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5x speedup over a multi-core CPU server for a set of applications from Parsec and 763x speedup over a server-class GPU for a set of Rodinia benchmarks.","PeriodicalId":302876,"journal":{"name":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"114","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3173162.3173171","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 114

Abstract

Recent developments in Non-Volatile Memories (NVMs) have opened up a new horizon for in-memory computing. Despite the significant performance gain offered by computational NVMs, previous works have relied on manual mapping of specialized kernels to the memory arrays, making it infeasible to execute more general workloads. We combat this problem by proposing a programmable in-memory processor architecture and data-parallel programming framework. The efficiency of the proposed in-memory processor comes from two sources: massive parallelism and reduction in data movement. A compact instruction set provides generalized computation capabilities for the memory array. The proposed programming framework seeks to leverage the underlying parallelism in the hardware by merging the concepts of data-flow and vector processing. To facilitate in-memory programming, we develop a compilation framework that takes a TensorFlow input and generates code for our in-memory processor. Our results demonstrate 7.5x speedup over a multi-core CPU server for a set of applications from Parsec and 763x speedup over a server-class GPU for a set of Rodinia benchmarks.

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

内存数据并行处理器

非易失性存储器(nvm)的最新发展为内存计算开辟了一个新的领域。尽管计算型nvm提供了显著的性能提升，但以前的工作依赖于手动将专门的内核映射到内存阵列，这使得执行更通用的工作负载变得不可行。我们通过提出一个可编程内存处理器架构和数据并行编程框架来解决这个问题。所建议的内存处理器的效率来自两个来源:大规模并行性和减少数据移动。紧凑的指令集为存储器阵列提供了通用的计算能力。所提出的编程框架试图通过合并数据流和矢量处理的概念来利用硬件中的底层并行性。为了方便内存编程，我们开发了一个编译框架，它接受TensorFlow输入，并为我们的内存处理器生成代码。我们的结果表明，对于一组来自Parsec的应用程序，在多核CPU服务器上的加速速度为7.5倍，对于一组Rodinia基准测试，在服务器级GPU上的加速速度为763x。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文去求助

来源期刊

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

自引率

0.00%

发文量

期刊最新文献

CALOREE: Learning Control for Predictable Latency and Low Energy Session details: Session 7B: Memory 2 Session details: Session 4A: Memory 1 BranchScope: A New Side-Channel Attack on Directional Branch Predictor Devirtualizing Memory in Heterogeneous Systems