MLBlocks: FPGA Blocks for Machine Learning Applications

Seyedramin Rasoulinezhad, D. Boland, P. Leong
{"title":"MLBlocks: FPGA Blocks for Machine Learning Applications","authors":"Seyedramin Rasoulinezhad, D. Boland, P. Leong","doi":"10.1145/3431920.3439479","DOIUrl":null,"url":null,"abstract":"The underlying goal of FPGA architecture research is to devise flexible substrates which implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing and image processing applications through high precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by: 1) higher computational density and 2) low precision arithmetic requirements. With the goal of exploring this new design space in a methodical manner, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then proposed together with a family of new compute units, called MLBlocks. These blocks are flexible mesh-based systolic array units parameterized with different data movements, data reuse, and multi-precision support. They utilize a columnar arrangement which is compatible with existing FPGA architectures. 
Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2, while maintaining similar area and timing requirements to current DSPs.","PeriodicalId":386071,"journal":{"name":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431920.3439479","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The underlying goal of FPGA architecture research is to devise flexible substrates that implement a wide variety of circuits efficiently. Contemporary FPGA architectures have been optimized to support networking, signal processing, and image processing applications through high-precision digital signal processing (DSP) blocks. The recent emergence of machine learning has created a new set of demands characterized by: 1) higher computational density and 2) low-precision arithmetic requirements. To explore this new design space methodically, we first propose a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations, which covers many basic linear algebra primitives and standard deep neural network (DNN) layers. We then propose a quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks, together with a family of new compute units called MLBlocks. These blocks are flexible, mesh-based systolic-array units parameterized with different data movements, data reuse, and multi-precision support. They use a columnar arrangement compatible with existing FPGA architectures. Finally, using synthetic benchmarks, we demonstrate that MLBlocks offer significantly improved performance over the commercial Xilinx DSP48E2 while maintaining area and timing requirements similar to current DSPs.
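The nested-loop MAC formulation that the abstract mentions can be illustrated with a minimal sketch (this example is ours, not taken from the paper): matrix multiplication expressed as nested loops around a multiply-accumulate operation. Convolutions and other DNN layers fit the same pattern with additional loop levels.

```python
def matmul_mac(A, B):
    """Compute C[i][j] = sum_k A[i][k] * B[k][j] as nested MAC loops.

    Illustrative only: shows the loop structure the paper's problem
    formulation covers, not the MLBlocks hardware itself.
    """
    n, k_dim = len(A), len(A[0])
    m = len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):              # outer loops select an output element
        for j in range(m):
            acc = 0
            for k in range(k_dim):  # innermost loop is the MAC chain
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C
```

A hardware block targeting this pattern chooses which loops to unroll spatially (across a mesh of MAC units) and which data to reuse locally, which is the design space the MLBlocks parameterization explores.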