Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference

Yan Li, Junming Ma, Donggang Cao, Hong Mei
{"title":"Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference","authors":"Yan Li, Junming Ma, Donggang Cao, Hong Mei","doi":"10.1109/ICDCS54860.2022.00092","DOIUrl":null,"url":null,"abstract":"As the security issue of cloud-offloaded Deep Learning (DL) inference is drawing increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become a common practice. Latency prediction of TEE-hosted DL model inference is essential for many scenarios, such as DNN model architecture searching with a latency constraint or layer scheduling in model-parallelism inference. However, existing solutions fail to address the memory over-commitment issue in resource-constrained environments inside TEEs.This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Sectum predicts inference latency following a two-stage design based on some critical observations. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum could predict the latency of a model. To evaluate Sectum, we design a large dataset that contains the latency information of over 6k CNN models. Our experiments demonstrate that Sectum could achieve over 85% ±10% accuracy of latency prediction. To our knowledge, Sectum is the first method to predict TEE-hosted DL inference latency accurately.","PeriodicalId":225883,"journal":{"name":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS54860.2022.00092","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

As the security of cloud-offloaded Deep Learning (DL) inference draws increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become common practice. Latency prediction for TEE-hosted DL model inference is essential in many scenarios, such as DNN architecture search under a latency constraint or layer scheduling in model-parallel inference. However, existing solutions fail to address the memory over-commitment issue that arises in the resource-constrained environment inside TEEs. This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Based on several key observations, Sectum predicts inference latency with a two-stage design. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in the TEE. Then, combining operator-level latency modeling with linear regression, Sectum predicts the model's latency. To evaluate Sectum, we build a large dataset containing latency information for over 6k CNN models. Our experiments demonstrate that Sectum predicts latency to within ±10% of the measured value for over 85% of models. To our knowledge, Sectum is the first method to accurately predict TEE-hosted DL inference latency.
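The abstract's second stage (operator-level latency modeling combined with linear regression) can be illustrated with a minimal sketch. This is not the authors' implementation: the operator latency table, the example models, and the measured latencies below are hypothetical placeholders, and the sketch assumes the simplest form of the idea, a sum-of-operator-latencies base estimate calibrated by a fitted linear model that absorbs fixed enclave overheads.

```python
# Minimal sketch of operator-level latency modeling + linear regression.
# All numbers and operator names are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical per-operator latencies (ms), e.g. profiled inside the enclave.
OP_LATENCY_MS = {"conv3x3": 1.8, "relu": 0.1, "maxpool": 0.3, "fc": 0.9}

def operator_sum(model_ops):
    """Base estimate: sum of per-operator latency estimates."""
    return sum(OP_LATENCY_MS[op] for op in model_ops)

# Calibration set: operator lists plus hypothetical measured end-to-end
# latencies. The fitted slope/intercept absorb systematic gaps between the
# operator sum and reality (e.g. enclave transitions, paging overhead).
models = [
    ["conv3x3", "relu", "maxpool", "fc"],
    ["conv3x3", "conv3x3", "relu", "fc"],
    ["conv3x3", "relu", "fc"],
]
measured_ms = np.array([3.9, 5.4, 3.4])

X = np.array([[operator_sum(m)] for m in models])
reg = LinearRegression().fit(X, measured_ms)

# Predict the latency of an unseen model from its operator sum.
new_model = ["conv3x3", "relu", "maxpool", "maxpool", "fc"]
pred = reg.predict([[operator_sum(new_model)]])[0]
print(f"predicted latency: {pred:.2f} ms")
```

In this reading, the regression is what distinguishes the TEE-hosted setting from plain operator-sum predictors: once the GNN stage has determined whether the model over-commits enclave memory, a separately calibrated linear model for each regime can correct the base estimate.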