Sectum: Accurate Latency Prediction for TEE-hosted Deep Learning Inference

2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS). Pub Date: 2022-07-01. DOI: 10.1109/ICDCS54860.2022.00092

Yan Li, Junming Ma, Donggang Cao, Hong Mei


Citations: 1

Abstract

As the security of cloud-offloaded Deep Learning (DL) inference draws increasing attention, running DL inference in Trusted Execution Environments (TEEs) has become common practice. Latency prediction for TEE-hosted DL inference is essential in many scenarios, such as DNN architecture search under a latency constraint or layer scheduling in model-parallel inference. However, existing solutions fail to address memory over-commitment in the resource-constrained environments inside TEEs. This paper presents Sectum, an accurate latency predictor for DL inference inside TEE enclaves. We first perform a synthetic empirical study to analyze the relationship between inference latency and memory occupation. Based on several critical observations, Sectum predicts inference latency with a two-stage design. First, Sectum uses a Graph Neural Network (GNN)-based model to detect whether a given model would trigger memory over-commitment in TEEs. Then, combining operator-level latency modeling with linear regression, Sectum predicts the latency of the model. To evaluate Sectum, we build a large dataset containing the latency information of over 6k CNN models. Our experiments demonstrate that Sectum achieves over 85% ±10% accuracy in latency prediction. To our knowledge, Sectum is the first method to accurately predict TEE-hosted DL inference latency.
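The two-stage idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the GNN-based detector is replaced here by a hypothetical callable (`overcommits`), the regression uses a single assumed feature (the sum of per-operator latencies), and fitting one line per memory regime is an assumption made for illustration. The metric function assumes that "±10% accuracy" means the fraction of predictions that fall within ±10% of the measured latency, which the abstract does not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


@dataclass
class TwoStagePredictor:
    # Stage 1 stand-in (hypothetical): decides whether a model over-commits
    # enclave memory. The paper uses a GNN-based model for this step.
    overcommits: Callable[[float], bool]
    # Stage 2 (assumed form): one fitted (slope, intercept) per memory regime,
    # keyed by the stage-1 decision.
    lines: Dict[bool, Tuple[float, float]]

    def predict(self, op_latencies_ms: Sequence[float], peak_mem_mb: float) -> float:
        """Map summed operator latencies to an end-to-end latency estimate."""
        slope, intercept = self.lines[self.overcommits(peak_mem_mb)]
        return slope * sum(op_latencies_ms) + intercept


def within_bound_accuracy(
    predicted: Sequence[float], measured: Sequence[float], bound: float = 0.10
) -> float:
    """Fraction of predictions whose relative error is at most `bound`."""
    hits = sum(1 for p, m in zip(predicted, measured) if abs(p - m) <= bound * m)
    return hits / len(measured)


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    fast = fit_line([10.0, 20.0, 30.0], [12.0, 22.0, 32.0])
    slow = fit_line([10.0, 20.0, 30.0], [40.0, 75.0, 110.0])
    predictor = TwoStagePredictor(
        overcommits=lambda mem_mb: mem_mb > 90.0,  # hypothetical threshold
        lines={False: fast, True: slow},
    )
    print(predictor.predict([5.0, 10.0], peak_mem_mb=64.0))
    print(predictor.predict([5.0, 10.0], peak_mem_mb=256.0))
```

Separating the regimes reflects the abstract's point that memory over-commitment changes how latency behaves, so a single regression over all models would likely fit neither regime well.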

