Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications
Young D. Kwon, Jagmohan Chauhan, Abhishek Kumar, Pan Hui, C. Mascolo
Continual learning approaches help deep neural network models adapt and learn incrementally by mitigating catastrophic forgetting. However, whether these existing approaches, traditionally applied to image-based tasks, work with the same efficacy on the sequential time-series data generated by mobile or embedded sensing systems remains an open question. To address this void, we conduct the first comprehensive empirical study that quantifies the performance of three predominant continual learning schemes (i.e., regularization, replay, and replay with exemplars) on six datasets from three mobile and embedded sensing applications, in a range of scenarios with different learning complexities. More specifically, we implement an end-to-end continual learning framework on edge devices. We then investigate the generalizability of different continual learning methods and their trade-offs among performance, storage, computational cost, and memory footprint. Our findings suggest that replay-with-exemplars schemes such as iCaRL offer the best performance trade-offs, even in complex scenarios, at the expense of some storage space (a few MBs) for training exemplars (1% to 5%). We also demonstrate for the first time that it is feasible and practical to run continual learning on-device with a limited memory budget. In particular, the latency on two types of mobile and embedded devices suggests that both incremental learning time (a few seconds to 4 minutes) and training time (1 to 75 minutes) across datasets are acceptable, as training can happen on the device while it is charging, thereby ensuring complete data privacy. Finally, we present some guidelines for practitioners who want to apply a continual learning paradigm to mobile sensing tasks.
{"title":"Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications","authors":"Young D. Kwon, Jagmohan Chauhan, Abhishek Kumar, Pan Hui, C. Mascolo","doi":"10.1145/3453142.3491285","DOIUrl":"https://doi.org/10.1145/3453142.3491285","url":null,"abstract":"Continual learning approaches help deep neural network models adapt and learn incrementally by trying to solve catastrophic forgetting. However, whether these existing approaches, applied traditionally to image-based tasks, work with the same efficacy to the sequential time series data generated by mobile or embedded sensing systems remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the performance of three predominant continual learning schemes (i.e., regularization, replay, and replay with examples) on six datasets from three mobile and embedded sensing applications in a range of scenarios having different learning complexities. More specifically, we implement an end-to-end continual learning framework on edge devices. Then we investigate the generalizability, trade-offs between performance, storage, computational costs, and memory footprint of different continual learning methods. Our findings suggest that replay with exemplars-based schemes such as iCaRL has the best performance trade-offs, even in complex scenarios, at the expense of some storage space (few MBs) for training examples (1% to 5%). We also demonstrate for the first time that it is feasible and practical to run continual learning on-device with a limited memory budget. In particular, the latency on two types of mobile and embedded devices suggests that both incremental learning time (few seconds - 4 minutes) and training time (1 - 75 minutes) across datasets are acceptable, as training could happen on the device when the embedded device is charging thereby ensuring complete data privacy. Finally, we present some guidelines for practitioners who want to apply a continual learning paradigm for mobile sensing tasks.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"35 2 1","pages":"319-332"},"PeriodicalIF":0.0,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77689158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing
Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang
Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm. This paper presents the FedEdge simulator, a high-fidelity Linux-based simulator that enables fast prototyping and sim-to-real code and knowledge transfer for multi-hop FL systems. The FedEdge simulator is built on top of the hardware-oriented FedEdge experimental framework with a new extension: a realistic physical-layer emulator. This emulator exploits trace-based channel modeling and dynamic link scheduling to minimize the reality gap between the simulator and the physical testbed. Our initial experiments demonstrate the high fidelity of the FedEdge simulator and its superior performance in sim-to-real knowledge transfer for reinforcement learning-optimized multi-hop FL.
{"title":"Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing","authors":"Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang","doi":"10.1145/3453142.3491419","DOIUrl":"https://doi.org/10.1145/3453142.3491419","url":null,"abstract":"Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm. This paper presents FedEdge simulator, a high-fidelity Linux-based simulator, which enables fast prototyping, sim-to-real code, and knowledge transfer for multi-hop FL systems. FedEdge simulator is built on top of the hardware-oriented FedEdge experimental framework with a new extension of the realistic physical layer emulator. This emulator exploits trace-based channel modeling and dynamic link scheduling to minimize the reality gap between the simulator and the physical testbed. Our initial experiments demonstrate the high fidelity of the FedEdge simulator and its superior performance on sim-to-real knowledge transfer in reinforcement learning -optimized multi-hop FL.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"56 1","pages":"355-360"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77636669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FENXI: Deep-learning Traffic Analytics at the edge
Massimo Gallo, A. Finamore, G. Simon, D. Rossi
Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators offers the opportunity to enhance the processing capabilities of network devices at the edge. Yet, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data plane without interfering with network operations. In this paper, we present FENXI, a system to run complex analytics by leveraging a Tensor Processing Unit (TPU). The design of FENXI decouples forwarding and traffic analytics, which operate at different granularities, i.e., the packet and flow levels. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and we design data structures to extract flow-level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line-rate traffic processing while requiring only limited resources, and it dynamically adapts to variable network conditions.
{"title":"FENXI: Deep-learning Traffic Analytics at the edge","authors":"Massimo Gallo, A. Finamore, G. Simon, D. Rossi","doi":"10.1145/3453142.3491273","DOIUrl":"https://doi.org/10.1145/3453142.3491273","url":null,"abstract":"Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by the scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators, offers the opportunity to enhance processing capabilities of network devices at the edge. Yet, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data-plane, without interfering with network operations. In this paper, we present FENXI, a system to run complex analytics by leveraging Tensor Processing Unit (TPU). The design of FENXI decouples forwarding and traffic analytics which operates at different granularities i.e., packet and flow levels. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and design data structures to extract flow level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line rate traffic processing requiring only limited resources, while also dynamically adapting to variable network conditions.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"32 1","pages":"202-213"},"PeriodicalIF":0.0,"publicationDate":"2021-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88768314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Performance Clarity of Edge Video Analytics
Zhujun Xiao, Zhengxu Xia, Haitao Zheng, Ben Y. Zhao, Junchen Jiang
Edge video analytics is becoming the solution to many safety and management tasks. Its wide deployment, however, must first address the tension between inference accuracy and resource (compute/network) cost. This has led to the development of video analytics pipelines (VAPs), which reduce resource cost by combining deep neural network compression and speedup techniques with video processing heuristics. Our measurement study, however, shows that today's methods for evaluating VAPs are incomplete, often producing premature conclusions or ambiguous results. This is because each VAP's performance varies widely across videos and over time, and is sensitive to different subsets of video content characteristics. We argue that accurate VAP evaluation must first characterize the complex interaction between VAPs and video characteristics, which we refer to as VAP performance clarity. Following this concept, we design and implement Yoda, the first VAP benchmark to achieve performance clarity. Using primitive-based profiling and a carefully curated benchmark video set, Yoda builds a performance clarity profile for each VAP to precisely define its accuracy vs. cost trade-off and its relationship with video characteristics. We show that Yoda substantially improves VAP evaluations by (1) providing a comprehensive, transparent assessment of VAP performance and its dependencies on video characteristics; (2) explicitly identifying fine-grained VAP behaviors that were previously hidden by large performance variance; and (3) revealing strengths and weaknesses among different VAPs and new design opportunities.
{"title":"Towards Performance Clarity of Edge Video Analytics","authors":"Zhujun Xiao, Zhengxu Xia, Haitao Zheng, Ben Y. Zhao, Junchen Jiang","doi":"10.1145/3453142.3491272","DOIUrl":"https://doi.org/10.1145/3453142.3491272","url":null,"abstract":"Edge video analytics is becoming the solution to many safety and management tasks. Its wide deployment, however, must first address the tension between inference accuracy and resource (compute/network) cost. This has led to the development of video analytics pipelines (VAPs), which reduce resource cost by combining deep neural network compression and speedup techniques with video processing heuristics. Our measurement study, however, shows that today's methods for evaluating VAPs are incomplete, often producing premature conclusions or ambiguous results. This is because each VAP's performance varies largely across videos and time, and is sensitive to different subsets of video content characteristics. We argue that accurate VAP evaluation must first characterize the complex interaction between VAPs and video characteristics, which we refer to as VAP performance clarity. Following this concept, we design and implement Yoda, the first VAP benchmark to achieve performance clarity. Using primitive-based profiling and a carefully curated bench-mark video set, Yoda builds a performance clarity profile for each VAP to precisely define its accuracy vs. cost trade-off and its relationship with video characteristics. We show that Yoda substantially improves VAP evaluations by (1) providing a comprehensive, transparent assessment of VAP performance and its dependencies on video characteristics; (2) explicitly identifying fine-grained VAP behaviors that were previously hidden by large performance variance; and (3) revealing strengths/weaknesses among different VAPs and new design opportunities.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"23 1","pages":"148-164"},"PeriodicalIF":0.0,"publicationDate":"2021-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75899202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge
Zhe Yang, K. Nahrstedt, Hongpeng Guo, Qian Zhou
The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, has led to a proliferation of computer vision-driven mobile and IoT applications deployed on the edge. This paper focuses on applications that make soft real-time requests to perform inference on their data: they desire prompt responses within designated deadlines, but occasional deadline misses are acceptable. Supporting soft real-time applications on a multi-tenant edge server is not easy, since requests sharing the limited GPU computing resources of an edge server interfere with each other. To tackle this problem, we comprehensively evaluate how latency and throughput respond to different GPU execution plans. Based on this analysis, we propose a GPU scheduler, DeepRT, which provides latency guarantees to requests while maintaining high overall system throughput. The key component of DeepRT, DisBatcher, batches data from different requests as much as possible while provably providing a latency guarantee for requests admitted by an Admission Control Module. DeepRT also includes an Adaptation Module that tackles overruns. Our evaluation results show that DeepRT outperforms state-of-the-art works in terms of the number of deadline misses and throughput.
{"title":"DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge","authors":"Zhe Yang, K. Nahrstedt, Hongpeng Guo, Qian Zhou","doi":"10.1145/3453142.3491278","DOIUrl":"https://doi.org/10.1145/3453142.3491278","url":null,"abstract":"The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, proliferate various computer vision driven mobile and IoT applications deployed on the edge. This paper focuses on applications which make soft real time requests to perform inference on their data - they desire prompt responses within designated deadlines, but occasional deadline misses are acceptable. Supporting soft real time applications on a multi-tenant edge server is not easy, since the requests sharing the limited GPU computing resources of an edge server interfere with each other. In order to tackle this problem, we comprehensively evaluate how latency and throughput respond to different GPU execution plans. Based on this analysis, we propose a GPU scheduler, DeepRT, which provides latency guarantee to the requests while maintaining high overall system throughput. The key component of DeepRT, DisBatcher, batches data from different requests as much as possible while it is proven to provide latency guarantee for requests admitted by an Admission Control Module. DeepRT also includes an Adaptation Module which tackles overruns. Our evaluation results show that DeepRT outperforms state-of-the-art works in terms of the number of deadline misses and throughput.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"58 1","pages":"271-284"},"PeriodicalIF":0.0,"publicationDate":"2021-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76495734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Benefit of the Doubt: Uncertainty Aware Sensing for Edge Computing Platforms
Lorena Qendro, Jagmohan Chauhan, Alberto Gil C. P. Ramos, C. Mascolo
Neural networks (NNs) have drastically improved the performance of mobile and embedded applications but lack measures of “reliability” estimation that would enable reasoning over their predictions. Despite their vital importance, especially in areas of human well-being and health, state-of-the-art uncertainty estimation techniques are computationally expensive when applied to resource-constrained devices. We propose an efficient framework for predictive uncertainty estimation in NNs deployed on edge computing platforms, with no need for fine-tuning or re-training strategies. To meet the energy and latency requirements of these systems, the framework is built from the ground up to provide predictive uncertainty based on only one forward pass and a negligible amount of additional matrix multiplications. Our aim is to enable already trained deep learning models to generate uncertainty estimates on resource-limited devices at inference time, focusing on classification tasks. This framework is founded on theoretical developments casting dropout training as approximate inference in Bayesian NNs. Our novel layerwise distribution approximation to the convolution layer cascades through the network, providing uncertainty estimates in a single run, which ensures minimal overhead, especially compared with uncertainty techniques that require multiple forward passes and a corresponding linear rise in energy and latency, making them unsuitable in practice. We demonstrate that our approach yields better performance and flexibility than previous work based on multilayer perceptrons for obtaining uncertainty estimates. Our evaluation with mobile application datasets on the Nvidia Jetson TX2 and Nano shows that our approach not only obtains robust and accurate uncertainty estimates but also outperforms state-of-the-art methods in terms of systems performance, reducing energy consumption (by up to 28-fold) and keeping the memory overhead at a minimum while still improving accuracy (by up to 16%).
{"title":"The Benefit of the Doubt: Uncertainty Aware Sensing for Edge Computing Platforms","authors":"Lorena Qendro, Jagmohan Chauhan, Alberto Gil C. P. Ramos, C. Mascolo","doi":"10.1145/3453142.3492330","DOIUrl":"https://doi.org/10.1145/3453142.3492330","url":null,"abstract":"Neural networks (NNs) have drastically improved the performance of mobile and embedded applications but lack measures of “reliability” estimation that would enable reasoning over their predictions. Despite the vital importance, especially in areas of human well-being and health, state-of-the-art uncertainty estimation techniques are computationally expensive when applied to resource-constrained devices. We propose an efficient framework for predictive uncertainty estimation in NNs deployed on edge computing platforms with no need for fine-tuning or re-training strategies. To meet the energy and latency requirements of these systems the framework is built from the ground up to provide predictive uncertainty based only on one forward pass and a negligible amount of additional matrix multiplications. Our aim is to enable already trained deep learning models to generate uncertainty estimates on resource-limited devices at inference time focusing on classification tasks. This framework is founded on theoretical developments casting dropout training as approximate inference in Bayesian NNs. Our novel layerwise distribution approximation to the convolution layer cascades through the network, providing uncertainty estimates in one single run which ensures minimal overhead, especially compared with uncertainty techniques that require multiple forwards passes and an equal linear rise in energy and latency requirements making them unsuitable in practice. We demonstrate that it yields better performance and flexibility over previous work based on multilayer perceptrons to obtain uncertainty estimates. Our evaluation with mobile applications datasets on Nvidia Jetson TX2 and Nano shows that our approach not only obtains robust and accurate uncertainty estimations but also outperforms state-of-the-art methods in terms of systems performance, reducing energy consumption (up to 28–folds), keeping the memory overhead at a minimum while still improving accuracy (up to 16%).","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"1 1","pages":"214-227"},"PeriodicalIF":0.0,"publicationDate":"2021-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88686659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AQuA: Analytical Quality Assessment for Optimizing Video Analytics Systems
Sibendu Paul, Utsav Drolia, Y. C. Hu, S. Chakradhar
Millions of cameras at the edge are being deployed to power a variety of deep learning applications. However, the frames captured by these cameras are not always pristine: they can be distorted due to lighting issues, sensor noise, compression, etc. Such distortions not only deteriorate visual quality but also impact the accuracy of deep learning applications that process such video streams. In this work, we introduce AQuA to protect application accuracy against such distorted frames by scoring the level of distortion in the frames. It takes into account the analytical quality of frames rather than their visual quality, by learning a novel metric, the classifier opinion score, and uses a lightweight, CNN-based, object-independent feature extractor. AQuA accurately scores the distortion levels of frames and generalizes to multiple different deep learning applications. When used to filter poor-quality frames at the edge, it reduces high-confidence errors for analytics applications by 17%. Through filtering, and due to its low overhead (14 ms), AQuA can also reduce computation time and average bandwidth usage by 25%.
{"title":"AQuA: Analytical Quality Assessment for Optimizing Video Analytics Systems","authors":"Sibendu Paul, Utsav Drolia, Y. C. Hu, S. Chakradhar","doi":"10.1145/3453142.3491279","DOIUrl":"https://doi.org/10.1145/3453142.3491279","url":null,"abstract":"Millions of cameras at edge are being deployed to power a variety of different deep learning applications. However, the frames captured by these cameras are not always pristine - they can be distorted due to lighting issues, sensor noise, compression etc. Such distortions not only deteriorate visual quality, they impact the accuracy of deep learning applications that process such video streams. In this work, we introduce AQuA, to protect application accuracy against such distorted frames by scoring the level of distortion in the frames. It takes into account the analytical quality of frames, not the visual quality, by learning a novel metric, classifier opinion score, and uses a lightweight, CNN-based, object-independent feature extractor. AQuA accurately scores distortion levels of frames and generalizes to multiple different deep learning applications. When used for filtering poor quality frames at edge, it reduces high-confidence errors for analytics applications by 17%. Through filtering, and due to its low overhead (14ms), AQuA can also reduce computation time and average bandwidth usage by 25%.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"28 1","pages":"135-147"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89560383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MailLeak: Obfuscation-Robust Character Extraction Using Transfer Learning
Wei Wang, Emily Sallenback, Zeyu Ning, Hugues Nelson Iradukunda, Wenxing Lu, Qingquan Zhang, Ting Zhu
Obfuscated images on envelopes are believed to be secure and have been widely used to protect the information contained in mail. In this paper, we present a new algorithm that can perform character recognition on obfuscated images. Specifically, by using a transfer learning method, we demonstrate that an attacker can effectively recognize the characters of a letter without opening the envelope. We believe that the presented method reveals a potential threat to current postal services. To defend against the proposed attack, we introduce a context-related shader to prevent such threats from occurring.
{"title":"MailLeak: Obfuscation-Robust Character Extraction Using Transfer Learning","authors":"Wei Wang, Emily Sallenback, Zeyu Ning, Hugues Nelson Iradukunda, Wenxing Lu, Qingquan Zhang, Ting Zhu","doi":"10.1145/3453142.3491421","DOIUrl":"https://doi.org/10.1145/3453142.3491421","url":null,"abstract":"The obfuscated images on envelopes are believed to be secure and have been widely used to protect the information contained in a mail. In this paper, we present a new algorithm that can conduct character recognition from obfuscated images. Specifically, by using a transfer learning method, we prove that an attacker can effectively recognize the letter without unfolding the envelope. We believe that the presented method reveals the potential threat to current postal services. To defend against the proposed attack, we introduce a context-related shader to prevent such threats from occurring.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"152 6 1","pages":"459-464"},"PeriodicalIF":0.0,"publicationDate":"2020-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83169151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-Time Edge Classification: Optimal Offloading under Token Bucket Constraints
Ayan Chakrabarti, Roch Guérin, Chenyang Lu, Jiangnan Liu
We consider an edge-computing setting where machine learning-based algorithms are used for real-time classification of inputs acquired by devices, e.g., cameras. Computational resources on the devices are constrained, and therefore only capable of running machine learning models of limited accuracy. A subset of inputs can be offloaded to the edge for processing by a more accurate but resource-intensive machine learning model. Both models process inputs with low latency, but offloading incurs network delays. To manage these delays and meet application deadlines, a token bucket constrains transmissions from the device. We introduce a Markov Decision Process-based framework to make offload decisions under such constraints. Decisions are based on the local model's confidence and the token bucket state, with the goal of minimizing a specified error measure for the application. We extend the approach to configurations involving multiple devices connected to the same access switch to realize the benefits of a shared token bucket. We evaluate and analyze the policies derived using our framework on the standard ImageNet image classification benchmark.
{"title":"Real-Time Edge Classification: Optimal Offloading under Token Bucket Constraints","authors":"Ayan Chakrabarti, Roch Guérin, Chenyang Lu, Jiangnan Liu","doi":"10.1145/3453142.3492329","DOIUrl":"https://doi.org/10.1145/3453142.3492329","url":null,"abstract":"We consider an edge-computing setting where machine learning-based algorithms are used for real-time classification of inputs acquired by devices, e.g., cameras. Computational resources on the devices are constrained, and therefore only capable of running machine learning models of limited accuracy. A subset of inputs can be offloaded to the edge for processing by a more accurate but resource-intensive machine learning model. Both models process inputs with low-latency, but offloading incurs network delays. To manage these delays and meet application deadlines, a token bucket constrains transmissions from the device. We introduce a Markov Decision Process-based framework to make offload decisions under such constraints. Decisions are based on the local model's confidence and the token bucket state, with the goal of minimizing a specified error measure for the application. We extend the approach to configurations involving multiple devices connected to the same access switch to realize the benefits of a shared token bucket. We evaluate and analyze the policies derived using our framework on the standard ImageNet image classification benchmark.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"16 1","pages":"41-54"},"PeriodicalIF":0.0,"publicationDate":"2020-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79307851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors
Jie Liu, Jiawen Liu, Zhen Xie, Dong Li
How to accurately and efficiently label data on a mobile device is critical for the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging, because data is incrementally generated and there is a possibility of unknown labels among newly arriving data. Furthermore, the rich hardware heterogeneity on mobile devices creates challenges in efficiently executing the auto-labeling workload. In this paper, we introduce Flame, an auto-labeling system that can label dynamically generated data with unknown labels. Flame includes an execution engine that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with six datasets on two mobile devices, we demonstrate that its labeling accuracy is 11.8%, 16.1%, 18.5%, and 25.2% higher than a state-of-the-art labeling method, transfer learning, semi-supervised learning, and boosting methods, respectively. Flame is also energy efficient: it consumes only 328.65 mJ and 414.84 mJ when labeling 500 data instances on a Samsung S9 and a Google Pixel 2, respectively. Furthermore, running Flame on mobile devices adds only about 0.75 ms of frame latency, which is imperceptible to users.
{"title":"Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors","authors":"Jie Liu, Jiawen Liu, Zhen Xie, Dong Li","doi":"10.1145/3453142.3493611","DOIUrl":"https://doi.org/10.1145/3453142.3493611","url":null,"abstract":"How to accurately and efficiently label data on a mobile device is critical for the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging, because data is incrementally generated and there is a possibility of having unknown labels among new coming data. Furthermore, the rich hardware heterogeneity on mobile devices creates challenges on efficiently executing the auto-labeling workload. In this paper, we introduce Flame, an auto-labeling system that can label dynamically generated data with unknown labels. Flame includes an execution engine that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with six datasets on two mobile devices, we demonstrate that the labeling accuracy of Flame is 11.8%, 16.1%, 18.5%, and 25.2% higher than a state-of-the-art labeling method, transfer learning, semi-supervised learning, and boosting methods respectively. Flame is also energy efficient, it consumes only 328.65mJ and 414.84mJ when labeling 500 data instances on Samsung S9 and Google Pixel2 respectively. Furthermore, running Flame on mobile devices only brings about 0.75 ms additional frame latency which is imperceivable by the users.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"16 1","pages":"80-93"},"PeriodicalIF":0.0,"publicationDate":"2020-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82669409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}