PIM-GPT: a hybrid process-in-memory accelerator for autoregressive transformers

Yuting Wu, Ziyu Wang, Wei D. Lu
{"title":"PIM GPT a hybrid process in memory accelerator for autoregressive transformers","authors":"Yuting Wu, Ziyu Wang, Wei D. Lu","doi":"10.1038/s44335-024-00004-2","DOIUrl":null,"url":null,"abstract":"Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by low compute-to-memory-ratio and high memory access. In this work, we propose a Process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs for executing multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application specific integrated chip (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41 − 137 × , 631 − 1074 × speedup and 123 − 383 × , 320 − 602 × energy efficiency over GPU and CPU baseline on 8 GPT models with up to 1.4 billion parameters.","PeriodicalId":501715,"journal":{"name":"npj Unconventional Computing","volume":" ","pages":"1-13"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s44335-024-00004-2.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"npj Unconventional Computing","FirstCategoryId":"1085","ListUrlMain":"https://www.nature.com/articles/s44335-024-00004-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Decoder-only Transformer models such as Generative Pre-trained Transformers (GPT) have demonstrated exceptional performance in text generation by autoregressively predicting the next token. However, the efficiency of running GPT on current hardware systems is bounded by its low compute-to-memory ratio and high memory-access cost. In this work, we propose a process-in-memory (PIM) GPT accelerator, PIM-GPT, which achieves end-to-end acceleration of GPT inference with high performance and high energy efficiency. PIM-GPT leverages DRAM-based PIM designs to execute multiply-accumulate (MAC) operations directly in the DRAM chips, eliminating the need to move matrix data off-chip. Non-linear functions and data communication are supported by an application-specific integrated circuit (ASIC). At the software level, mapping schemes are designed to maximize data locality and computation parallelism. Overall, PIM-GPT achieves 41-137× and 631-1074× speedup, and 123-383× and 320-602× energy efficiency, over GPU and CPU baselines, respectively, on 8 GPT models with up to 1.4 billion parameters.
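The mapping idea described above can be illustrated with a minimal NumPy sketch. This is not the paper's actual configuration or interface: the bank count, hidden size, and function names (`split_rows`, `pim_gemv`) are illustrative assumptions. The point is that each DRAM bank keeps a block of weight rows resident and performs the MACs for those rows locally, so the per-token matrix-vector products that dominate autoregressive decoding only move small activation vectors across the chip boundary.

```python
import numpy as np

# Toy model of the PIM mapping idea: the weight matrix stays resident in DRAM
# banks and each bank performs MAC operations on its own rows, so only the
# small activation vector and the partial outputs cross the chip boundary.
# Bank count and sizes are illustrative, not the paper's configuration.

NUM_BANKS = 16   # hypothetical number of DRAM banks with MAC capability
HIDDEN = 1024    # hypothetical hidden dimension of a GPT layer


def split_rows(weight: np.ndarray, num_banks: int) -> list[np.ndarray]:
    """Map contiguous row blocks of the weight matrix to banks (data locality)."""
    return np.array_split(weight, num_banks, axis=0)


def pim_gemv(bank_weights: list[np.ndarray], x: np.ndarray) -> np.ndarray:
    """Each bank multiplies its resident rows by the broadcast vector
    (conceptually in parallel); the host/ASIC only concatenates the
    per-bank partial outputs."""
    partials = [w_bank @ x for w_bank in bank_weights]  # in-bank MACs
    return np.concatenate(partials)                     # off-chip traffic: vectors only


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4 * HIDDEN, HIDDEN)).astype(np.float32)  # e.g. an MLP projection
    x = rng.standard_normal(HIDDEN).astype(np.float32)                # current token's activation

    banks = split_rows(W, NUM_BANKS)   # done once; weights never move afterwards
    y = pim_gemv(banks, x)

    assert np.allclose(y, W @ x, atol=1e-3)
    # Traffic comparison: keeping W resident vs. moving only x and y per token.
    print(f"weights resident in DRAM: {W.nbytes / 1e6:.1f} MB never transferred")
    print(f"vector traffic per token: {(x.nbytes + y.nbytes) / 1e3:.1f} KB")
```

The sketch only mimics the dataflow; in the actual design the MACs run inside the DRAM dies and the ASIC handles the non-linear functions and inter-chip communication, which is what removes the memory-bandwidth bottleneck for the GEMV-dominated decode phase.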

