AMVP: Adaptive CNN-based Multitask Video Processing on Mobile Stream Processing Platforms
M. Chao, R. Stoleru, Liuyi Jin, Shuochao Yao, Maxwell Maurice, R. Blalock
2020 IEEE/ACM Symposium on Edge Computing (SEC), November 2020
DOI: 10.1109/SEC50012.2020.00015
Citations: 8 (Semantic Scholar)
Abstract
The popularity of video cameras has given rise to a new type of application, multitask video processing, which uses multiple CNNs to extract different information of interest from a raw video stream. Unfortunately, the heavy resource requirements of CNNs make it challenging to run multiple CNNs concurrently on a single resource-constrained mobile device. Existing solutions address this challenge in one of three ways: offloading CNN models to a cloud or edge server, compressing CNN models to fit on the mobile device, or sharing common parts of multiple CNN models. Most of these solutions, however, apply offloading, compression, or sharing in isolation, and therefore do not adapt well to complex edge computing scenarios. To overcome this limitation, we propose AMVP, an adaptive execution framework for CNN-based multitask video processing that integrates CNN layer sharing, feature compression, and model offloading. First, AMVP reduces the total computational workload of multiple CNN inferences by sharing common frozen CNN layers. Second, AMVP supports distributed CNN inference by splitting large CNNs into smaller components that run on different devices. Third, AMVP uses a quantization-based feature compression mechanism to reduce the volume of feature data transmitted between two separate CNN components. We conduct extensive experiments on AMVP, and the results show that it adapts to different performance goals and execution environments. Compared with two baseline approaches that only share or only offload CNN layers, AMVP achieves up to 61% lower latency and 10% higher throughput with comparable accuracy.
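A back-of-the-envelope cost model can illustrate why sharing frozen layers cuts the total workload. This is a minimal sketch, not AMVP's implementation. It assumes that each task's CNN is a list of hypothetical per-layer costs (for example, GFLOPs per frame). It also assumes that the first `shared_depth` layers are frozen and identical across tasks, so they run only once per frame. The function names are hypothetical.

```python
# Minimal sketch (hypothetical cost model, not AMVP's implementation):
# per-frame compute with and without sharing a frozen CNN prefix across tasks.
from typing import List


def total_cost_separate(task_layers: List[List[float]]) -> float:
    """Without sharing, every task runs its whole network on each frame."""
    return sum(sum(layers) for layers in task_layers)


def total_cost_shared(task_layers: List[List[float]], shared_depth: int) -> float:
    """With sharing, the common frozen prefix runs once; each task runs only its head."""
    prefix = task_layers[0][:shared_depth]
    for layers in task_layers:
        # In this toy model, identical cost lists stand in for identical frozen layers.
        if layers[:shared_depth] != prefix:
            raise ValueError("all tasks must share an identical frozen prefix")
    heads = sum(sum(layers[shared_depth:]) for layers in task_layers)
    return sum(prefix) + heads


if __name__ == "__main__":
    backbone = [2.0, 3.0, 4.0]  # hypothetical costs of the shared frozen layers
    tasks = [
        backbone + [1.0],       # hypothetical task-specific head, task A
        backbone + [0.5, 0.5],  # task B
        backbone + [2.0],       # task C
    ]
    print(total_cost_separate(tasks))                # 31.0
    print(total_cost_shared(tasks, shared_depth=3))  # 13.0
```

In this toy model, three tasks share a nine-unit backbone, which lowers the per-frame cost from 31 to 13 units. The saving grows with the number of tasks and the depth of the shared prefix. The trade-off is that those layers must stay frozen for every task.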
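Splitting a CNN across devices raises the question of where to cut it. The abstract does not describe AMVP's partitioning algorithm, so this sketch uses a generic additive latency model purely as an illustration. Total latency is on-device compute for the layers before the cut, plus the time to upload the intermediate feature map, plus edge-server compute for the remaining layers. All per-layer latencies, feature sizes, and the bandwidth are hypothetical, as are the function names.

```python
# Minimal sketch of split-point selection under a hypothetical additive latency model.
# Layers [0, k) run on the mobile device; layers [k, n) run on the edge server.
from typing import Sequence, Tuple


def split_latency(
    k: int,
    device_ms: Sequence[float],
    edge_ms: Sequence[float],
    out_bytes: Sequence[int],
    input_bytes: int,
    bytes_per_ms: float,
) -> float:
    n = len(device_ms)
    if k == n:
        transfer = 0  # fully local: no feature map is uploaded
    elif k == 0:
        transfer = input_bytes  # fully offloaded: the raw frame is uploaded
    else:
        transfer = out_bytes[k - 1]  # output of the last on-device layer
    return sum(device_ms[:k]) + transfer / bytes_per_ms + sum(edge_ms[k:])


def best_split(
    device_ms: Sequence[float],
    edge_ms: Sequence[float],
    out_bytes: Sequence[int],
    input_bytes: int,
    bytes_per_ms: float,
) -> Tuple[int, float]:
    """Try every cut point and return (k, estimated end-to-end latency in ms)."""
    candidates = [
        (split_latency(k, device_ms, edge_ms, out_bytes, input_bytes, bytes_per_ms), k)
        for k in range(len(device_ms) + 1)
    ]
    latency, k = min(candidates)  # ties go to the earlier cut
    return k, latency


if __name__ == "__main__":
    device_ms = [10.0, 20.0, 30.0, 40.0]           # hypothetical per-layer latency on the phone
    edge_ms = [1.0, 2.0, 3.0, 4.0]                 # hypothetical per-layer latency on the edge
    out_bytes = [400_000, 100_000, 20_000, 1_000]  # hypothetical feature-map sizes
    frame_bytes = 600_000                          # hypothetical raw frame size
    bw = 10_000.0                                  # hypothetical uplink: 10,000 bytes/ms (10 MB/s)

    print(best_split(device_ms, edge_ms, out_bytes, frame_bytes, bw))   # (2, 47.0)
    compressed = [b // 4 for b in out_bytes]  # e.g. float32 features sent as 8-bit codes
    print(best_split(device_ms, edge_ms, compressed, frame_bytes, bw))  # (1, 29.0)
```

In this example, shrinking the transmitted feature maps by 4x moves the best cut earlier, from layer 2 to layer 1. The 4x figure matches what 8-bit quantization of float32 features would give. This is one way to see why combining offloading with feature compression can pay off.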
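The abstract does not specify AMVP's quantizer, so this sketch uses a generic stand-in: uniform (min-max) 8-bit quantization of a float32 feature map. The minimum value and step size travel in an 8-byte header. The sketch shows how the payload between two CNN components shrinks roughly 4x. The cost is a bounded rounding error: about half a quantization step, plus float32 rounding of the header values. The `encode`/`decode` names and the message layout are hypothetical.

```python
# Minimal sketch of quantization-based feature compression (generic 8-bit min-max
# quantization; AMVP's exact scheme is an assumption here, not taken from the paper).
import struct
from typing import List, Sequence

HEADER = struct.Struct("<ff")  # little-endian float32 pair: (minimum, step size)


def encode(features: Sequence[float]) -> bytes:
    """Sender side: map each float to one of 256 levels and prepend (lo, scale)."""
    lo, hi = min(features), max(features)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # step size between adjacent levels
    codes = bytes(round((x - lo) / scale) for x in features)  # one byte per value
    return HEADER.pack(lo, scale) + codes


def decode(message: bytes) -> List[float]:
    """Receiver side: rebuild approximate floats from the header and byte codes."""
    lo, scale = HEADER.unpack_from(message)
    return [lo + c * scale for c in message[HEADER.size:]]


if __name__ == "__main__":
    # Hypothetical intermediate activations in [-5.0, 4.9].
    feats = [((i * 37) % 100) / 10.0 - 5.0 for i in range(1024)]
    raw = struct.pack(f"<{len(feats)}f", *feats)  # uncompressed float32 payload
    msg = encode(feats)
    approx = decode(msg)
    print(len(raw), len(msg))  # 4096 1032
    print(max(abs(a - b) for a, b in zip(feats, approx)))  # roughly half a step at most
```

Min-max scaling is the simplest choice because it needs only two header values per tensor. Per-channel scales or fewer bits would trade extra header bytes or more error for a different compression ratio.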