AxLaM: energy-efficient accelerator design for language models for edge computing

Tom Glint, Bhumika Mittal, Santripta Sharma, Abdul Qadir Ronak, Abhinav Goud, Neerja Kasture, Zaqi Momin, Aravind Krishna, Joycee Mekie

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 383, no. 2288, 20230395 (2025). DOI: 10.1098/rsta.2023.0395
Abstract
Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents AxLaM, an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. AxLaM is a data-flow-aware hardware accelerator design for language models inspired by Simba; it uses approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency over Simba, a hardware-realized scalable accelerator. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and a 1.2× latency improvement, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on the hardware. This article is part of the theme issue 'Emerging technologies for future secure computing platforms'.
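As a rough illustration only: the abstract does not detail the multiplier design, so the Python sketch below shows the general idea behind approximate fixed-point multiplication, truncating low-order operand bits before multiplying, rather than the paper's actual POSIT-based scheme. All function names, bit widths and the truncation strategy here are hypothetical assumptions.

```python
# Minimal sketch of approximate fixed-point multiplication by operand
# truncation. Illustrative of the general technique only; it does NOT
# reproduce AxLaM's POSIT-based multiplier design.

def to_fixed(x: float, frac_bits: int = 8) -> int:
    """Quantize a real value to a signed fixed-point integer (Q-format)."""
    return int(round(x * (1 << frac_bits)))

def approx_mul(a: float, b: float, frac_bits: int = 8, drop: int = 4) -> float:
    """Approximate product: discard `drop` low-order bits of each operand
    before multiplying, shrinking the multiplier at the cost of a small,
    bounded error."""
    fa = to_fixed(a, frac_bits) >> drop
    fb = to_fixed(b, frac_bits) >> drop
    # Each operand kept (frac_bits - drop) fractional bits, so the raw
    # product carries 2 * (frac_bits - drop) fractional bits.
    return (fa * fb) / float(1 << (2 * (frac_bits - drop)))

if __name__ == "__main__":
    exact = 0.71 * 0.33
    approx = approx_mul(0.71, 0.33)
    print(f"exact={exact:.5f} approx={approx:.5f} err={abs(exact - approx):.5f}")
```

Dropping low-order operand bits reduces the partial-product array a hardware multiplier must implement, which is the usual source of the energy and area savings that approximate-multiplier accelerator designs exploit.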
About the journal:
Continuing its long history of influential scientific publishing, Philosophical Transactions A publishes high-quality theme issues on topics of current importance and general interest within the physical, mathematical and engineering sciences, guest-edited by leading authorities and comprising new research, reviews and opinions from prominent researchers.