A Case for Generalizable DNN Cost Models for Mobile Devices

Vinod Ganesan, Surya Selvam, Sanchari Sen, Pratyush Kumar, A. Raghunathan
Published in: 2020 IEEE International Symposium on Workload Characterization (IISWC), October 2020
DOI: 10.1109/IISWC50251.2020.00025
Citations: 3

Abstract

Accurate workload characterization of Deep Neural Networks (DNNs) is challenged by both network and hardware diversity. Networks are being designed with newer motifs such as depthwise separable convolutions, bottleneck layers, etc., which have widely varying performance characteristics. Further, the adoption of Neural Architecture Search (NAS) is creating a Cambrian explosion of networks, greatly expanding the space of networks that must be modeled. On the hardware front, myriad accelerators are being built for DNNs, while compiler improvements are enabling more efficient execution of DNNs on a wide range of CPUs and GPUs. Clearly, characterizing each DNN on each hardware system is infeasible. We thus need cost models to estimate performance that generalize across both devices and networks. In this work, we address this challenge by building a cost model of DNNs on mobile devices. The modeling and evaluation are based on latency measurements of 118 networks on 105 mobile System-on-Chips (SoCs). As a key contribution, we propose that a hardware platform can be represented by its measured latencies on a judiciously chosen, small set of networks, which we call the signature set. We also design a machine learning model that takes as inputs (i) the target hardware representation (measured latencies of the signature set on the hardware) and (ii) a representation of the structure of the DNN to be evaluated, and predicts the latency of the DNN on the target hardware. We propose and evaluate different algorithms to select the signature set. Our results show that by carefully choosing the signature set, the network representation, and the machine learning algorithm, we can train accurate cost models that generalize well. We demonstrate the value of such a cost model in a collaborative workload characterization setup, wherein every mobile device contributes a small set of latency measurements to a centralized repository. 
With even a small number of measurements per new device, we show that the proposed cost model matches the accuracy of device-specific models trained on an order-of-magnitude larger number of measurements. The entire codebase is released at https://github.com/iitm-sysdl/Generalizable-DNN-cost-models.
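The cost model described above takes two inputs: the hardware representation (measured latencies of the signature-set networks on the target device) and a representation of the DNN's structure, and outputs a predicted latency. The abstract does not specify which regressor is used, so the following is only a toy sketch assuming a linear model fit by closed-form ridge regression on synthetic data; the dimensions (`N_SIG`, `N_FEAT`) and the feature encodings are hypothetical stand-ins, not the paper's actual choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: the paper measures 118 networks on 105 SoCs;
# here we use small synthetic stand-ins.
N_SIG = 5   # size of the signature set (hardware representation)
N_FEAT = 8  # length of the network-structure descriptor (e.g. op counts)

def build_features(sig_latencies, net_descriptor):
    """Concatenate the two inputs of the cost model:
    (i) the target hardware representation -- measured latencies of the
        signature-set networks on that device, and
    (ii) a representation of the structure of the DNN to be evaluated."""
    return np.concatenate([sig_latencies, net_descriptor])

# Synthetic training data: latency is a linear function of the features
# plus noise (a toy stand-in for the learned cost model in the paper).
true_w = rng.normal(size=N_SIG + N_FEAT)
X = rng.uniform(0.1, 2.0, size=(500, N_SIG + N_FEAT))
y = X @ true_w + rng.normal(scale=0.01, size=500)

# Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Predict the latency of a new network on a new device: the device only
# needs to contribute N_SIG measurements (the collaborative setup above).
sig = rng.uniform(0.1, 2.0, size=N_SIG)    # signature latencies on device
desc = rng.uniform(0.1, 2.0, size=N_FEAT)  # descriptor of the new DNN
pred = float(build_features(sig, desc) @ w)
```

The key point the sketch illustrates is the data economy of the approach: a new device supplies only the signature-set measurements, while the regressor's weights are trained once from the pooled repository of measurements across devices.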