Exploring energy efficiency of LSTM accelerators: A parameterized architecture design for embedded FPGAs

Journal of Systems Architecture, Volume 152, Article 103181
Pub Date: 2024-05-18
DOI: 10.1016/j.sysarc.2024.103181
IF: 3.7, Zone 2 (Computer Science), JCR Q1 (Computer Science, Hardware & Architecture)
Chao Qian, Tianheng Ling, Gregor Schiele
Citations: 0

Abstract

Long Short-Term Memory Networks (LSTMs) are pivotal in on-device time series analysis for embedded systems, particularly for managing sensor data streams. Yet, their deployment on resource-constrained embedded devices presents notable challenges. In response, we introduce a novel parameterized architecture for LSTM accelerators designed explicitly for embedded Field-Programmable Gate Arrays (FPGAs). Our approach involves strategic design choices, such as employing computationally efficient activation functions and optimizing clock frequency with a pipelined Arithmetic Logic Unit (ALU). These decisions drive our architecture towards enhanced energy efficiency while maintaining adaptability across diverse application scenarios. A key feature of our architecture is its configurable parameters, which allow for tailored optimization through the optional use of Digital Signal Processor Slices for ALUs and the selective implementation of activation functions. Our empirical evaluations conducted on the Spartan-7 XC7S15 FPGA demonstrate the robustness of our methodology, achieving a 2.33× improvement in energy efficiency over previous solutions. Furthermore, our study examines the correlation between memory resource types and energy efficiency across various LSTM model sizes. Impressively, even with a 9× increase in the hidden size of the LSTM cell, our accelerator maintains an energy efficiency of 10.03 GOP/s/W, with only a minor decrease of 14.65%. However, it is critical to note that our current design is not yet optimized for larger FPGA models such as the Spartan-7 XC7S25 and XC7S50. For these models, timing constraints, rather than resource limitations, pose challenges to scaling, highlighting a potential area for future optimization.
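
The scaling figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: it assumes that the 14.65% decrease is measured relative to the energy efficiency of the smaller-hidden-size configuration, which the abstract does not state explicitly, and the function names are hypothetical rather than taken from the paper.

```python
def energy_efficiency(throughput_gops: float, power_w: float) -> float:
    """Energy efficiency in GOP/s/W: throughput (GOP/s) divided by power (W)."""
    if power_w <= 0:
        raise ValueError("power must be positive")
    return throughput_gops / power_w


def baseline_from_decrease(value_after: float, decrease_pct: float) -> float:
    """Recover the pre-decrease value from the post-decrease value and a relative drop."""
    return value_after / (1.0 - decrease_pct / 100.0)


# Figures reported in the abstract.
EFFICIENCY_LARGE = 10.03  # GOP/s/W with the 9x larger hidden size
DECREASE_PCT = 14.65      # reported relative decrease in energy efficiency

# Assumption: the decrease is relative to the smaller model's efficiency.
implied_baseline = baseline_from_decrease(EFFICIENCY_LARGE, DECREASE_PCT)
print(f"implied baseline: {implied_baseline:.2f} GOP/s/W")  # implied baseline: 11.75 GOP/s/W
```

Under that assumption, the smaller configuration would run at roughly 11.75 GOP/s/W, so a ninefold increase in hidden size costs about 1.7 GOP/s/W in energy efficiency.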

Source journal
Journal of Systems Architecture (Engineering & Technology, Computer Science: Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Articles published: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, and parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components with an emphasis on software are also relevant here.
Latest articles from this journal
SAMFL: Secure Aggregation Mechanism for Federated Learning with Byzantine-robustness by functional encryption
ZNS-Cleaner: Enhancing lifespan by reducing empty erase in ZNS SSDs
Using MAST for modeling and response-time analysis of real-time applications with GPUs
Shift-and-Safe: Addressing permanent faults in aggressively undervolted CNN accelerators
Function Placement Approaches in Serverless Computing: A Survey