RDP3: Rapid Domain Platform Performance Prediction for Design Space Exploration

Jinghan Zhang, Mehrshad Zandigohar, G. Schirner
Published in: 2021 IEEE 39th International Conference on Computer Design (ICCD), 2021-10-01
DOI: 10.1109/ICCD53106.2021.00086 (https://doi.org/10.1109/ICCD53106.2021.00086)

Abstract

Heterogeneous Accelerator-rich (ACC-rich) platforms, which combine general-purpose cores with specialized HW Accelerators (ACCs), promise high-performance and low-power deployment of streaming applications, e.g., for video analytics, software-defined radio, and radar. To recover Non-Recurring Engineering (NRE) cost, a unified domain platform can be shared across a set of applications, especially when the applications have functional and structural similarities and can therefore benefit from common ACCs. However, identifying the most beneficial set of common ACCs is challenging, and current Design Space Exploration (DSE) methods for domain platform allocation suffer from a long exploration time. In particular, evaluating the performance of a platform for a domain of applications is much more time-consuming than in a traditional DSE, because binding exploration and evaluation are required for each application in the domain. Thus, a rapid domain performance evaluation is needed to speed up the exploration of the platform allocation.

This paper introduces Rapid Domain Platform Performance Prediction (RDP3) methods to speed up the exploration in domain DSE. Key contributions are: (1) analyzing the current domain DSE flow and its exploration time bottleneck; (2) introducing four RDP3 methods to speed up the evaluation of different platform allocations: Heuristic Processing (HP) estimation, Linear Regression (LR), Decision Tree Regression (DTR), and Multi-Layer Perceptron (MLP) predictions; (3) comparing the performance of these predictions and integrating the prediction into the current domain DSE. To evaluate the efficacy of RDP3, we explore 10K platforms capable of processing OpenVX domain applications. We demonstrate that RDP3-MLP, the most promising method, achieves a speedup of 17.5K times with a mean squared error of only 0.001 compared to the current platform evaluation using the analytical model. Integrating RDP3-MLP into the existing domain DSE method GIDE [1] saves 80.8% of the exploration time while still producing the same output platform design.
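
The general idea of replacing an expensive platform evaluation with a trained predictor can be illustrated with a minimal sketch. It is not the authors' implementation: the platform encoding (a tuple of three accelerator counts), the `slow_evaluate` function and its weights, and the sample sizes are all hypothetical, and the predictor is a plain linear regression written in pure Python. The toy target is exactly linear, so the fit is near-perfect here; real platform performance is unlikely to be this well behaved.

```python
import random

def slow_evaluate(platform):
    """Hypothetical stand-in for an expensive analytical evaluation.
    `platform` is a tuple of accelerator counts; the score is made up."""
    weights = (0.5, 0.3, 0.2)
    return sum(w * x for w, x in zip(weights, platform)) + 0.1

def fit_linear(xs, ys):
    """Ordinary least squares via the normal equations (pure Python)."""
    rows = [list(x) + [1.0] for x in xs]  # append a constant bias feature
    n = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
           + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]  # feature weights, then bias

def predict(coef, platform):
    """Cheap surrogate: weighted sum of the counts plus the bias term."""
    return sum(c * x for c, x in zip(coef, platform)) + coef[-1]

def mse(pred, true):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
train = [tuple(rng.randint(0, 4) for _ in range(3)) for _ in range(50)]
test = [tuple(rng.randint(0, 4) for _ in range(3)) for _ in range(20)]

# Pay for the slow evaluation only on the training sample.
coef = fit_linear(train, [slow_evaluate(p) for p in train])

# Accuracy of the surrogate on unseen platforms.
error = mse([predict(coef, p) for p in test],
            [slow_evaluate(p) for p in test])

# Rank candidates with the surrogate instead of the slow evaluator.
best = max(test, key=lambda p: predict(coef, p))
print(error, best)
```

In this sketch the exploration step ranks candidates using only `predict`, and the slow evaluator is called just to build the training set and to check the error afterwards. Whether a surrogate selects the same platform as the full evaluation depends on how well it fits the real performance data.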