AxLaM: energy-efficient accelerator design for language models for edge computing.

Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences · IF 4.3 · Q1 (Multidisciplinary Sciences) · CAS Zone 3
Volume 383, Issue 2288, Article 20230395 · Pub Date: 2025-01-01 · Epub Date: 2025-01-16 · DOI: 10.1098/rsta.2023.0395
Tom Glint, Bhumika Mittal, Santripta Sharma, Abdul Qadir Ronak, Abhinav Goud, Neerja Kasture, Zaqi Momin, Aravind Krishna, Joycee Mekie

Abstract

Modern language models such as bidirectional encoder representations from transformers (BERT) have revolutionized natural language processing (NLP) tasks but are computationally intensive, limiting their deployment on edge devices. This paper presents an energy-efficient accelerator design tailored for encoder-based language models, enabling their integration into mobile and edge computing environments. AxLaM, a data-flow-aware hardware accelerator for language models inspired by Simba, makes use of approximate fixed-point POSIT-based multipliers and high-bandwidth memory (HBM) to achieve significant improvements in computational efficiency, power consumption, area and latency compared to the hardware-realized scalable accelerator Simba. Compared to Simba, AxLaM achieves a ninefold energy reduction, a 58% area reduction and a 1.2-fold latency improvement, making it suitable for deployment in edge devices. The energy efficiency of AxLaM is 1.8 TOPS/W, 65% higher than that of FACT, which requires pre-processing of the language model before implementing it on hardware. This article is part of the theme issue 'Emerging technologies for future secure computing platforms'.
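The abstract names approximate fixed-point multipliers as AxLaM's main energy lever. The sketch below is purely illustrative and is not the paper's POSIT-based circuit: it shows the general truncation idea behind approximate fixed-point multiplication, where low-order partial-product bits are discarded to save switching energy at the cost of a small, bounded numeric error. The Q4.4 format and the `drop_bits` parameter are assumptions made for this illustration only.

```python
# Illustrative sketch only -- NOT the paper's POSIT-based multiplier.
# Shows the generic truncation approach to approximate fixed-point
# multiplication: drop low-order product bits to save energy.

def to_fixed(x, frac_bits=4):
    """Quantize a float to signed fixed-point (Q*.frac_bits)."""
    return int(round(x * (1 << frac_bits)))

def to_float(x, frac_bits=4):
    """Convert a fixed-point integer back to a float."""
    return x / (1 << frac_bits)

def approx_mul(a, b, frac_bits=4, drop_bits=6):
    """Fixed-point multiply that zeroes the lowest `drop_bits` of the
    exact product before rescaling -- a crude stand-in for hardware
    that omits low-order partial-product logic to save energy."""
    full = a * b                                  # exact product at 2*frac_bits scale
    truncated = (full >> drop_bits) << drop_bits  # discard low-order bits
    return truncated >> frac_bits                 # rescale to Q*.frac_bits

a, b = to_fixed(1.5625), to_fixed(2.3125)  # 25 and 37 in Q4.4
exact = (a * b) >> 4                       # 57 -> 3.5625
approx = approx_mul(a, b)                  # 56 -> 3.5 (small error)
print(to_float(exact), to_float(approx))
```

As a side note on the reported figures: 1.8 TOPS/W being 65% higher than FACT implies FACT operates at roughly 1.8 / 1.65 ≈ 1.09 TOPS/W.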

Source journal metrics: CiteScore 9.30 · Self-citation rate 2.00% · Articles per year 367 · Review time 3 months
Journal description: Continuing its long history of influential scientific publishing, Philosophical Transactions A publishes high-quality theme issues on topics of current importance and general interest within the physical, mathematical and engineering sciences, guest-edited by leading authorities and comprising new research, reviews and opinions from prominent researchers.