H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

IF 2 | Zone 4 (Computer Science) | JCR Q3 (Computer Science, Hardware & Architecture) | ACM Transactions on Design Automation of Electronic Systems | Pub Date: 2024-02-28 | DOI: 10.1145/3649219

Yandong Luo, Shimeng Yu


Citations: 0



H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Prior hardware accelerator designs primarily focused on single-chip solutions for 10 MB-class computer vision models. GB-class transformer models for natural language processing (NLP) challenge existing accelerator designs because of their massive parameter counts and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes, each optimized for accelerating a different MatMul workload. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). In the system-level evaluation, an energy efficiency of 10 TOPS/W is achieved for the BERT and GPT-2 models, which is about 2.6× to 3.1× higher than the baseline with a 7 nm TPU and stacked FeFET memory.
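To put the headline figures in context, the short sketch below works out what the abstract's numbers imply for the baseline. It assumes that "2.6× to 3.1× higher" means a plain ratio of the proposed design's efficiency to the baseline's, and that the 10 TOPS/W figure applies at both ends of that range; neither assumption is confirmed by the abstract. The function and variable names are illustrative only.

```python
def implied_baseline(efficiency_tops_per_w: float, gain: float) -> float:
    """Baseline efficiency implied by a reported efficiency and a gain ratio.

    Assumes gain = reported efficiency / baseline efficiency.
    """
    if gain <= 0:
        raise ValueError("gain must be positive")
    return efficiency_tops_per_w / gain


REPORTED = 10.0                 # TOPS/W, from the abstract
GAIN_LOW, GAIN_HIGH = 2.6, 3.1  # improvement range, from the abstract

# A larger gain implies a weaker baseline, so the bounds swap.
baseline_low = implied_baseline(REPORTED, GAIN_HIGH)
baseline_high = implied_baseline(REPORTED, GAIN_LOW)

# 1 TOPS/W is 1e12 operations per joule, so energy per operation in
# picojoules is simply the reciprocal of the TOPS/W figure.
energy_per_op_pj = 1.0 / REPORTED

print(f"Implied baseline: {baseline_low:.2f} to {baseline_high:.2f} TOPS/W")
print(f"Energy per operation at {REPORTED:.0f} TOPS/W: {energy_per_op_pj:.2f} pJ")
```

Under these assumptions the baseline would sit at roughly 3.2 to 3.8 TOPS/W, and 10 TOPS/W corresponds to about 0.1 pJ per operation.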


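Source journal: ACM Transactions on Design Automation of Electronic Systems (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 3.20

Self-citation rate: 7.10%

Articles published: 105

Review time: 3 months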

Journal description: TODAES is a premier ACM journal in design and automation of electronic systems. It publishes innovative work documenting significant research and development advances on the specification, design, analysis, simulation, testing, and evaluation of electronic systems, emphasizing a computer science/engineering orientation. Both theoretical analysis and practical solutions are welcome.