Using Machine Learning to Estimate Utilization and Throughput for OpenCL-Based Matrix-Vector Multiplication (MVM)

Jannatun Naher, C. Gloster, C. Doss, Shrikant S. Jadhav
2020 10th Annual Computing and Communication Workshop and Conference (CCWC), 2020
DOI: 10.1109/CCWC47524.2020.9031173 (https://doi.org/10.1109/CCWC47524.2020.9031173)
Citations: 3

Abstract

OpenCL is a framework for writing programs that execute across heterogeneous platforms, including FPGAs. It lets users write standardized C-like code for both the host and the hardware accelerators, which reduces the programming challenge for FPGAs. Hardware descriptions can be written in OpenCL using different memory access and data partitioning strategies. Matrix-Vector Multiplication (MVM) is the critical computational bottleneck in many solvers for Systems of Linear Equations (SLEs). The MVM OpenCL kernel can be optimized by varying several design parameters in the OpenCL description, improving hardware performance. To explore the design space effectively, logic synthesis is performed after each change of design parameters to determine its impact on design area and performance. However, each synthesis run can take multiple hours, so manual design space exploration over a large number of designs is prohibitive. Predicting FPGA utilization and throughput addresses this challenge and can significantly reduce design time. This paper presents a machine learning-based approach to estimating FPGA utilization and throughput for a given set of design parameter values. It also presents an optimized MVM implementation obtained after compiling, synthesizing, and executing over 100 designs. Using the Random Forest machine learning algorithm, the average estimation error over 175 designs is 0.0098%, 0.0012%, 0.0039%, 0.0414%, and 123.21% for Look-up Tables (LUTs), Digital Signal Processors (DSPs), memory bits, RAM blocks, and throughput (GFLOPs), respectively.
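To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' OpenCL kernel or their estimation pipeline. It shows a reference MVM, the usual way MVM throughput is counted (one multiply and one add per matrix entry, so 2·rows·cols floating-point operations), and one plausible reading of "average error" as a mean absolute percentage error. The abstract does not define its error metric, so that reading is an assumption, and the function names `mvm`, `gflops`, and `mean_abs_pct_error` are hypothetical.

```python
import time


def mvm(A, x):
    """Reference matrix-vector product y = A x for a row-major list-of-lists A."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have len(x) entries")
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def gflops(rows, cols, seconds):
    """Throughput in GFLOP/s, counting one multiply and one add per matrix entry."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (2 * rows * cols) / seconds / 1e9


def mean_abs_pct_error(measured, estimated):
    """Average of |estimated - measured| / |measured|, in percent.

    Assumption: the abstract's "average error" may be defined differently.
    """
    if len(measured) != len(estimated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("measured values must be nonzero")
    total = sum(abs(e - m) / abs(m) for m, e in zip(measured, estimated))
    return 100.0 * total / len(measured)


if __name__ == "__main__":
    # Illustrative sizes only; these are not design points from the paper.
    rows, cols = 256, 256
    A = [[1.0] * cols for _ in range(rows)]
    x = [2.0] * cols

    start = time.perf_counter()
    y = mvm(A, x)
    elapsed = time.perf_counter() - start

    print("y[0] =", y[0])
    print("software throughput (GFLOP/s):", gflops(rows, cols, elapsed))

    # Hypothetical measured vs. estimated LUT counts for three designs.
    print("LUT error (%):", mean_abs_pct_error([1000, 2000, 4000], [1010, 1990, 4020]))
```

In a design space exploration flow, a metric like this would be evaluated separately for each predicted quantity (LUTs, DSPs, memory bits, RAM blocks, throughput), comparing the model's estimates with the values reported after synthesis and execution.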