Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications
Young D. Kwon, Jagmohan Chauhan, Abhishek Kumar, Pan Hui, C. Mascolo
Continual learning approaches help deep neural network models adapt and learn incrementally by mitigating catastrophic forgetting. However, whether these existing approaches, traditionally applied to image-based tasks, work with the same efficacy on the sequential time-series data generated by mobile or embedded sensing systems remains an open question. To address this void, we conduct the first comprehensive empirical study that quantifies the performance of three predominant continual learning schemes (i.e., regularization, replay, and replay with exemplars) on six datasets from three mobile and embedded sensing applications, in a range of scenarios with different learning complexities. More specifically, we implement an end-to-end continual learning framework on edge devices. We then investigate the generalizability of different continual learning methods and their trade-offs among performance, storage, computational cost, and memory footprint. Our findings suggest that replay-with-exemplars schemes such as iCaRL offer the best performance trade-offs, even in complex scenarios, at the expense of some storage space (a few MBs) for training exemplars (1% to 5%). We also demonstrate for the first time that it is feasible and practical to run continual learning on-device with a limited memory budget. In particular, the latency on two types of mobile and embedded devices suggests that both incremental learning time (a few seconds to 4 minutes) and training time (1 to 75 minutes) across datasets are acceptable, as training can happen on the device while it is charging, thereby ensuring complete data privacy. Finally, we present some guidelines for practitioners who want to apply a continual learning paradigm to mobile sensing tasks.
{"title":"Exploring System Performance of Continual Learning for Mobile and Embedded Sensing Applications","authors":"Young D. Kwon, Jagmohan Chauhan, Abhishek Kumar, Pan Hui, C. Mascolo","doi":"10.1145/3453142.3491285","DOIUrl":"https://doi.org/10.1145/3453142.3491285","url":null,"abstract":"Continual learning approaches help deep neural network models adapt and learn incrementally by trying to solve catastrophic forgetting. However, whether these existing approaches, applied traditionally to image-based tasks, work with the same efficacy to the sequential time series data generated by mobile or embedded sensing systems remains an unanswered question. To address this void, we conduct the first comprehensive empirical study that quantifies the performance of three predominant continual learning schemes (i.e., regularization, replay, and replay with examples) on six datasets from three mobile and embedded sensing applications in a range of scenarios having different learning complexities. More specifically, we implement an end-to-end continual learning framework on edge devices. Then we investigate the generalizability, trade-offs between performance, storage, computational costs, and memory footprint of different continual learning methods. Our findings suggest that replay with exemplars-based schemes such as iCaRL has the best performance trade-offs, even in complex scenarios, at the expense of some storage space (few MBs) for training examples (1% to 5%). We also demonstrate for the first time that it is feasible and practical to run continual learning on-device with a limited memory budget. In particular, the latency on two types of mobile and embedded devices suggests that both incremental learning time (few seconds - 4 minutes) and training time (1 - 75 minutes) across datasets are acceptable, as training could happen on the device when the embedded device is charging thereby ensuring complete data privacy. Finally, we present some guidelines for practitioners who want to apply a continual learning paradigm for mobile sensing tasks.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"35 2 1","pages":"319-332"},"PeriodicalIF":0.0,"publicationDate":"2021-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77689158","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing
Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang
Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm. This paper presents the FedEdge simulator, a high-fidelity Linux-based simulator that enables fast prototyping and sim-to-real code and knowledge transfer for multi-hop FL systems. The FedEdge simulator is built on top of the hardware-oriented FedEdge experimental framework with a new extension: a realistic physical-layer emulator. This emulator exploits trace-based channel modeling and dynamic link scheduling to minimize the reality gap between the simulator and the physical testbed. Our initial experiments demonstrate the high fidelity of the FedEdge simulator and its superior performance in sim-to-real knowledge transfer for reinforcement learning-optimized multi-hop FL.
{"title":"Sim-to-Real Transfer in Multi-agent Reinforcement Networking for Federated Edge Computing","authors":"Pinyarash Pinyoanuntapong, Tagore Pothuneedi, Ravikumar Balakrishnan, Minwoo Lee, Chen Chen, Pu Wang","doi":"10.1145/3453142.3491419","DOIUrl":"https://doi.org/10.1145/3453142.3491419","url":null,"abstract":"Federated Learning (FL) over wireless multi-hop edge computing networks, i.e., multi-hop FL, is a cost-effective distributed on-device deep learning paradigm. This paper presents FedEdge simulator, a high-fidelity Linux-based simulator, which enables fast prototyping, sim-to-real code, and knowledge transfer for multi-hop FL systems. FedEdge simulator is built on top of the hardware-oriented FedEdge experimental framework with a new extension of the realistic physical layer emulator. This emulator exploits trace-based channel modeling and dynamic link scheduling to minimize the reality gap between the simulator and the physical testbed. Our initial experiments demonstrate the high fidelity of the FedEdge simulator and its superior performance on sim-to-real knowledge transfer in reinforcement learning -optimized multi-hop FL.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"56 1","pages":"355-360"},"PeriodicalIF":0.0,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77636669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FENXI: Deep-learning Traffic Analytics at the edge
Massimo Gallo, A. Finamore, G. Simon, D. Rossi
Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators offers the opportunity to enhance the processing capabilities of network devices at the edge. Yet, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data plane without interfering with network operations. In this paper, we present FENXI, a system to run complex analytics by leveraging a Tensor Processing Unit (TPU). The design of FENXI decouples forwarding and traffic analytics, which operate at different granularities, i.e., the packet and flow levels. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and we design data structures to extract flow-level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line-rate traffic processing while requiring only limited resources, and it dynamically adapts to variable network conditions.
{"title":"FENXI: Deep-learning Traffic Analytics at the edge","authors":"Massimo Gallo, A. Finamore, G. Simon, D. Rossi","doi":"10.1145/3453142.3491273","DOIUrl":"https://doi.org/10.1145/3453142.3491273","url":null,"abstract":"Live traffic analysis at the first aggregation point in the ISP network enables the implementation of complex traffic engineering policies but is limited by the scarce processing capabilities, especially for Deep Learning (DL) based analytics. The introduction of specialized hardware accelerators, offers the opportunity to enhance processing capabilities of network devices at the edge. Yet, no packet processing pipeline is capable of offering DL-based analysis capabilities in the data-plane, without interfering with network operations. In this paper, we present FENXI, a system to run complex analytics by leveraging Tensor Processing Unit (TPU). The design of FENXI decouples forwarding and traffic analytics which operates at different granularities i.e., packet and flow levels. We conceive two independent modules that asynchronously communicate to exchange network data and analytics results, and design data structures to extract flow level statistics without impacting per-packet processing. We prototyped and evaluated FENXI on general-purpose servers considering both adversarial and realistic network conditions. Our analysis shows that FENXI can sustain 100 Gbps line rate traffic processing requiring only limited resources, while also dynamically adapting to variable network conditions.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"32 1","pages":"202-213"},"PeriodicalIF":0.0,"publicationDate":"2021-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88768314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Performance Clarity of Edge Video Analytics
Zhujun Xiao, Zhengxu Xia, Haitao Zheng, Ben Y. Zhao, Junchen Jiang
Edge video analytics is becoming the solution to many safety and management tasks. Its wide deployment, however, must first address the tension between inference accuracy and resource (compute/network) cost. This has led to the development of video analytics pipelines (VAPs), which reduce resource cost by combining deep neural network compression and speedup techniques with video processing heuristics. Our measurement study, however, shows that today's methods for evaluating VAPs are incomplete, often producing premature conclusions or ambiguous results. This is because each VAP's performance varies widely across videos and over time, and is sensitive to different subsets of video content characteristics. We argue that accurate VAP evaluation must first characterize the complex interaction between VAPs and video characteristics, which we refer to as VAP performance clarity. Following this concept, we design and implement Yoda, the first VAP benchmark to achieve performance clarity. Using primitive-based profiling and a carefully curated benchmark video set, Yoda builds a performance clarity profile for each VAP to precisely define its accuracy vs. cost trade-off and its relationship with video characteristics. We show that Yoda substantially improves VAP evaluations by (1) providing a comprehensive, transparent assessment of VAP performance and its dependencies on video characteristics; (2) explicitly identifying fine-grained VAP behaviors that were previously hidden by large performance variance; and (3) revealing strengths and weaknesses among different VAPs and new design opportunities.
{"title":"Towards Performance Clarity of Edge Video Analytics","authors":"Zhujun Xiao, Zhengxu Xia, Haitao Zheng, Ben Y. Zhao, Junchen Jiang","doi":"10.1145/3453142.3491272","DOIUrl":"https://doi.org/10.1145/3453142.3491272","url":null,"abstract":"Edge video analytics is becoming the solution to many safety and management tasks. Its wide deployment, however, must first address the tension between inference accuracy and resource (compute/network) cost. This has led to the development of video analytics pipelines (VAPs), which reduce resource cost by combining deep neural network compression and speedup techniques with video processing heuristics. Our measurement study, however, shows that today's methods for evaluating VAPs are incomplete, often producing premature conclusions or ambiguous results. This is because each VAP's performance varies largely across videos and time, and is sensitive to different subsets of video content characteristics. We argue that accurate VAP evaluation must first characterize the complex interaction between VAPs and video characteristics, which we refer to as VAP performance clarity. Following this concept, we design and implement Yoda, the first VAP benchmark to achieve performance clarity. Using primitive-based profiling and a carefully curated bench-mark video set, Yoda builds a performance clarity profile for each VAP to precisely define its accuracy vs. cost trade-off and its relationship with video characteristics. We show that Yoda substantially improves VAP evaluations by (1) providing a comprehensive, transparent assessment of VAP performance and its dependencies on video characteristics; (2) explicitly identifying fine-grained VAP behaviors that were previously hidden by large performance variance; and (3) revealing strengths/weaknesses among different VAPs and new design opportunities.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"23 1","pages":"148-164"},"PeriodicalIF":0.0,"publicationDate":"2021-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75899202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge
Zhe Yang, K. Nahrstedt, Hongpeng Guo, Qian Zhou
The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, has led to a proliferation of computer vision-driven mobile and IoT applications deployed on the edge. This paper focuses on applications that make soft real-time requests to perform inference on their data: they desire prompt responses within designated deadlines, but occasional deadline misses are acceptable. Supporting soft real-time applications on a multi-tenant edge server is not easy, since requests sharing the limited GPU computing resources of an edge server interfere with each other. To tackle this problem, we comprehensively evaluate how latency and throughput respond to different GPU execution plans. Based on this analysis, we propose a GPU scheduler, DeepRT, which provides latency guarantees to requests while maintaining high overall system throughput. The key component of DeepRT, DisBatcher, batches data from different requests as much as possible while provably providing a latency guarantee for requests admitted by an Admission Control Module. DeepRT also includes an Adaptation Module that tackles overruns. Our evaluation results show that DeepRT outperforms state-of-the-art works in terms of the number of deadline misses and throughput.
{"title":"DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge","authors":"Zhe Yang, K. Nahrstedt, Hongpeng Guo, Qian Zhou","doi":"10.1145/3453142.3491278","DOIUrl":"https://doi.org/10.1145/3453142.3491278","url":null,"abstract":"The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, proliferate various computer vision driven mobile and IoT applications deployed on the edge. This paper focuses on applications which make soft real time requests to perform inference on their data - they desire prompt responses within designated deadlines, but occasional deadline misses are acceptable. Supporting soft real time applications on a multi-tenant edge server is not easy, since the requests sharing the limited GPU computing resources of an edge server interfere with each other. In order to tackle this problem, we comprehensively evaluate how latency and throughput respond to different GPU execution plans. Based on this analysis, we propose a GPU scheduler, DeepRT, which provides latency guarantee to the requests while maintaining high overall system throughput. The key component of DeepRT, DisBatcher, batches data from different requests as much as possible while it is proven to provide latency guarantee for requests admitted by an Admission Control Module. DeepRT also includes an Adaptation Module which tackles overruns. Our evaluation results show that DeepRT outperforms state-of-the-art works in terms of the number of deadline misses and throughput.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"58 1","pages":"271-284"},"PeriodicalIF":0.0,"publicationDate":"2021-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76495734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Benefit of the Doubt: Uncertainty Aware Sensing for Edge Computing Platforms
Lorena Qendro, Jagmohan Chauhan, Alberto Gil C. P. Ramos, C. Mascolo
Neural networks (NNs) have drastically improved the performance of mobile and embedded applications but lack measures of “reliability” estimation that would enable reasoning over their predictions. Despite their vital importance, especially in areas of human well-being and health, state-of-the-art uncertainty estimation techniques are computationally expensive when applied to resource-constrained devices. We propose an efficient framework for predictive uncertainty estimation in NNs deployed on edge computing platforms, with no need for fine-tuning or re-training strategies. To meet the energy and latency requirements of these systems, the framework is built from the ground up to provide predictive uncertainty based on only one forward pass and a negligible amount of additional matrix multiplications. Our aim is to enable already trained deep learning models to generate uncertainty estimates on resource-limited devices at inference time, focusing on classification tasks. This framework is founded on theoretical developments casting dropout training as approximate inference in Bayesian NNs. Our novel layerwise distribution approximation to the convolution layer cascades through the network, providing uncertainty estimates in a single run, which ensures minimal overhead, especially compared with uncertainty techniques that require multiple forward passes and a corresponding linear rise in energy and latency, making them unsuitable in practice. We demonstrate that our approach yields better performance and flexibility than previous work based on multilayer perceptrons for obtaining uncertainty estimates. Our evaluation with mobile application datasets on the Nvidia Jetson TX2 and Nano shows that our approach not only obtains robust and accurate uncertainty estimates but also outperforms state-of-the-art methods in terms of systems performance, reducing energy consumption (by up to 28-fold) and keeping the memory overhead at a minimum while still improving accuracy (by up to 16%).
{"title":"The Benefit of the Doubt: Uncertainty Aware Sensing for Edge Computing Platforms","authors":"Lorena Qendro, Jagmohan Chauhan, Alberto Gil C. P. Ramos, C. Mascolo","doi":"10.1145/3453142.3492330","DOIUrl":"https://doi.org/10.1145/3453142.3492330","url":null,"abstract":"Neural networks (NNs) have drastically improved the performance of mobile and embedded applications but lack measures of “reliability” estimation that would enable reasoning over their predictions. Despite the vital importance, especially in areas of human well-being and health, state-of-the-art uncertainty estimation techniques are computationally expensive when applied to resource-constrained devices. We propose an efficient framework for predictive uncertainty estimation in NNs deployed on edge computing platforms with no need for fine-tuning or re-training strategies. To meet the energy and latency requirements of these systems the framework is built from the ground up to provide predictive uncertainty based only on one forward pass and a negligible amount of additional matrix multiplications. Our aim is to enable already trained deep learning models to generate uncertainty estimates on resource-limited devices at inference time focusing on classification tasks. This framework is founded on theoretical developments casting dropout training as approximate inference in Bayesian NNs. Our novel layerwise distribution approximation to the convolution layer cascades through the network, providing uncertainty estimates in one single run which ensures minimal overhead, especially compared with uncertainty techniques that require multiple forwards passes and an equal linear rise in energy and latency requirements making them unsuitable in practice. We demonstrate that it yields better performance and flexibility over previous work based on multilayer perceptrons to obtain uncertainty estimates. Our evaluation with mobile applications datasets on Nvidia Jetson TX2 and Nano shows that our approach not only obtains robust and accurate uncertainty estimations but also outperforms state-of-the-art methods in terms of systems performance, reducing energy consumption (up to 28–folds), keeping the memory overhead at a minimum while still improving accuracy (up to 16%).","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"1 1","pages":"214-227"},"PeriodicalIF":0.0,"publicationDate":"2021-02-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88686659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AQuA: Analytical Quality Assessment for Optimizing Video Analytics Systems
Sibendu Paul, Utsav Drolia, Y. C. Hu, S. Chakradhar
Millions of cameras at the edge are being deployed to power a variety of deep learning applications. However, the frames captured by these cameras are not always pristine: they can be distorted due to lighting issues, sensor noise, compression, etc. Such distortions not only deteriorate visual quality but also impact the accuracy of deep learning applications that process such video streams. In this work, we introduce AQuA to protect application accuracy against such distorted frames by scoring the level of distortion in the frames. It takes into account the analytical quality of frames rather than their visual quality, by learning a novel metric, the classifier opinion score, and uses a lightweight, CNN-based, object-independent feature extractor. AQuA accurately scores the distortion levels of frames and generalizes to multiple different deep learning applications. When used to filter poor-quality frames at the edge, it reduces high-confidence errors for analytics applications by 17%. Through filtering, and due to its low overhead (14 ms), AQuA can also reduce computation time and average bandwidth usage by 25%.
{"title":"AQuA: Analytical Quality Assessment for Optimizing Video Analytics Systems","authors":"Sibendu Paul, Utsav Drolia, Y. C. Hu, S. Chakradhar","doi":"10.1145/3453142.3491279","DOIUrl":"https://doi.org/10.1145/3453142.3491279","url":null,"abstract":"Millions of cameras at edge are being deployed to power a variety of different deep learning applications. However, the frames captured by these cameras are not always pristine - they can be distorted due to lighting issues, sensor noise, compression etc. Such distortions not only deteriorate visual quality, they impact the accuracy of deep learning applications that process such video streams. In this work, we introduce AQuA, to protect application accuracy against such distorted frames by scoring the level of distortion in the frames. It takes into account the analytical quality of frames, not the visual quality, by learning a novel metric, classifier opinion score, and uses a lightweight, CNN-based, object-independent feature extractor. AQuA accurately scores distortion levels of frames and generalizes to multiple different deep learning applications. When used for filtering poor quality frames at edge, it reduces high-confidence errors for analytics applications by 17%. Through filtering, and due to its low overhead (14ms), AQuA can also reduce computation time and average bandwidth usage by 25%.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"28 1","pages":"135-147"},"PeriodicalIF":0.0,"publicationDate":"2021-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89560383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MailLeak: Obfuscation-Robust Character Extraction Using Transfer Learning
Wei Wang, Emily Sallenback, Zeyu Ning, Hugues Nelson Iradukunda, Wenxing Lu, Qingquan Zhang, Ting Zhu
Obfuscated images on envelopes are believed to be secure and have been widely used to protect the information contained in mail. In this paper, we present a new algorithm that can perform character recognition on obfuscated images. Specifically, by using a transfer learning method, we demonstrate that an attacker can effectively recognize the characters of a letter without opening the envelope. We believe that the presented method reveals a potential threat to current postal services. To defend against the proposed attack, we introduce a context-related shader to prevent such threats from occurring.
{"title":"MailLeak: Obfuscation-Robust Character Extraction Using Transfer Learning","authors":"Wei Wang, Emily Sallenback, Zeyu Ning, Hugues Nelson Iradukunda, Wenxing Lu, Qingquan Zhang, Ting Zhu","doi":"10.1145/3453142.3491421","DOIUrl":"https://doi.org/10.1145/3453142.3491421","url":null,"abstract":"The obfuscated images on envelopes are believed to be secure and have been widely used to protect the information contained in a mail. In this paper, we present a new algorithm that can conduct character recognition from obfuscated images. Specifically, by using a transfer learning method, we prove that an attacker can effectively recognize the letter without unfolding the envelope. We believe that the presented method reveals the potential threat to current postal services. To defend against the proposed attack, we introduce a context-related shader to prevent such threats from occurring.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"152 6 1","pages":"459-464"},"PeriodicalIF":0.0,"publicationDate":"2020-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83169151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-Time Edge Classification: Optimal Offloading under Token Bucket Constraints
Ayan Chakrabarti, Roch Guérin, Chenyang Lu, Jiangnan Liu
We consider an edge-computing setting where machine learning-based algorithms are used for real-time classification of inputs acquired by devices, e.g., cameras. Computational resources on the devices are constrained, and therefore only capable of running machine learning models of limited accuracy. A subset of inputs can be offloaded to the edge for processing by a more accurate but resource-intensive machine learning model. Both models process inputs with low latency, but offloading incurs network delays. To manage these delays and meet application deadlines, a token bucket constrains transmissions from the device. We introduce a Markov Decision Process-based framework to make offload decisions under such constraints. Decisions are based on the local model's confidence and the token bucket state, with the goal of minimizing a specified error measure for the application. We extend the approach to configurations involving multiple devices connected to the same access switch to realize the benefits of a shared token bucket. We evaluate and analyze the policies derived using our framework on the standard ImageNet image classification benchmark.
{"title":"Real-Time Edge Classification: Optimal Offloading under Token Bucket Constraints","authors":"Ayan Chakrabarti, Roch Guérin, Chenyang Lu, Jiangnan Liu","doi":"10.1145/3453142.3492329","DOIUrl":"https://doi.org/10.1145/3453142.3492329","url":null,"abstract":"We consider an edge-computing setting where machine learning-based algorithms are used for real-time classification of inputs acquired by devices, e.g., cameras. Computational resources on the devices are constrained, and therefore only capable of running machine learning models of limited accuracy. A subset of inputs can be offloaded to the edge for processing by a more accurate but resource-intensive machine learning model. Both models process inputs with low-latency, but offloading incurs network delays. To manage these delays and meet application deadlines, a token bucket constrains transmissions from the device. We introduce a Markov Decision Process-based framework to make offload decisions under such constraints. Decisions are based on the local model's confidence and the token bucket state, with the goal of minimizing a specified error measure for the application. We extend the approach to configurations involving multiple devices connected to the same access switch to realize the benefits of a shared token bucket. We evaluate and analyze the policies derived using our framework on the standard ImageNet image classification benchmark.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"16 1","pages":"41-54"},"PeriodicalIF":0.0,"publicationDate":"2020-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79307851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors
Jie Liu, Jiawen Liu, Zhen Xie, Dong Li
How to accurately and efficiently label data on a mobile device is critical for the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging, because data is incrementally generated and there is a possibility of unknown labels among newly arriving data. Furthermore, the rich hardware heterogeneity on mobile devices creates challenges in efficiently executing the auto-labeling workload. In this paper, we introduce Flame, an auto-labeling system that can label dynamically generated data with unknown labels. Flame includes an execution engine that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with six datasets on two mobile devices, we demonstrate that its labeling accuracy is 11.8%, 16.1%, 18.5%, and 25.2% higher than a state-of-the-art labeling method, transfer learning, semi-supervised learning, and boosting methods, respectively. Flame is also energy efficient: it consumes only 328.65 mJ and 414.84 mJ when labeling 500 data instances on a Samsung S9 and a Google Pixel 2, respectively. Furthermore, running Flame on mobile devices adds only about 0.75 ms of frame latency, which is imperceptible to users.
{"title":"Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors","authors":"Jie Liu, Jiawen Liu, Zhen Xie, Dong Li","doi":"10.1145/3453142.3493611","DOIUrl":"https://doi.org/10.1145/3453142.3493611","url":null,"abstract":"How to accurately and efficiently label data on a mobile device is critical for the success of training machine learning models on mobile devices. Auto-labeling data on mobile devices is challenging, because data is incrementally generated and there is a possibility of having unknown labels among new coming data. Furthermore, the rich hardware heterogeneity on mobile devices creates challenges on efficiently executing the auto-labeling workload. In this paper, we introduce Flame, an auto-labeling system that can label dynamically generated data with unknown labels. Flame includes an execution engine that efficiently schedules and executes auto-labeling workloads on heterogeneous mobile processors. Evaluating Flame with six datasets on two mobile devices, we demonstrate that the labeling accuracy of Flame is 11.8%, 16.1%, 18.5%, and 25.2% higher than a state-of-the-art labeling method, transfer learning, semi-supervised learning, and boosting methods respectively. Flame is also energy efficient, it consumes only 328.65mJ and 414.84mJ when labeling 500 data instances on Samsung S9 and Google Pixel2 respectively. Furthermore, running Flame on mobile devices only brings about 0.75 ms additional frame latency which is imperceivable by the users.","PeriodicalId":6779,"journal":{"name":"2021 IEEE/ACM Symposium on Edge Computing (SEC)","volume":"16 1","pages":"80-93"},"PeriodicalIF":0.0,"publicationDate":"2020-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82669409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}