Michele Boldo;Mirco De Marchi;Enrico Martini;Stefano Aldegheri;Nicola Bombieri
Title: Domain-Adaptive Online Active Learning for Real-Time Intelligent Video Analytics on Edge Devices
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 4105-4116
Publication date: 2024-11-06
Publication type: Journal Article
DOI: 10.1109/TCAD.2024.3453188
URL: https://ieeexplore.ieee.org/document/10745828/
Impact factor: 2.7 (JCR Q2, Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Deep learning (DL) for intelligent video analytics is increasingly pervasive in application domains ranging from healthcare to Industry 5.0. A significant trend involves deploying DL models on edge devices with limited resources. Techniques such as pruning, quantization, and early exit have demonstrated the feasibility of real-time inference at the edge by compressing and optimizing deep neural networks (DNNs). However, adapting pretrained models to new and dynamic scenarios remains a significant challenge. While solutions like domain adaptation, active learning (AL), and teacher-student knowledge distillation (KD) help address this challenge, they often rely on cloud or well-equipped computing platforms for fine-tuning. In this study, we propose a framework for domain-adaptive online AL of DNN models tailored for intelligent video analytics on resource-constrained devices. Our framework employs a KD approach in which both the teacher and the student models are deployed on the edge device. To determine when to retrain the student DNN model without ground truth or cloud-based teacher inference, the framework uses singular value decomposition (SVD) of the input data. It identifies key data frames and retrains the student efficiently by running the teacher at the edge, with the aim of preventing model overfitting. We evaluate the framework through two case studies: 1) human pose estimation and 2) car object detection, both implemented on an NVIDIA Jetson NX device.
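The abstract does not say how the singular value decomposition of the input data is turned into a retraining decision. The following is only a minimal sketch of one possible trigger, assuming that a shift in the normalized top-k singular values of a frame, relative to a reference frame, signals a domain change. The function names `singular_spectrum` and `should_retrain`, the L1 distance, and the values of `k` and `threshold` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def singular_spectrum(frame: np.ndarray, k: int = 8) -> np.ndarray:
    """Top-k singular values of a 2-D (grayscale) frame, normalized to sum to 1.

    Assumes the frame has at least k rows and k columns.
    """
    # compute_uv=False returns only the singular values, in descending order.
    s = np.linalg.svd(frame.astype(np.float64), compute_uv=False)[:k]
    total = s.sum()
    return s / total if total > 0 else s


def should_retrain(reference: np.ndarray, frame: np.ndarray,
                   threshold: float = 0.1, k: int = 8) -> bool:
    """Hypothetical trigger: flag a frame whose spectrum drifts from the reference.

    `reference` is the output of singular_spectrum() for a frame from the
    domain the student was last trained on. The drift is the L1 distance
    between the two normalized spectra; the threshold is an assumed value.
    """
    drift = np.abs(singular_spectrum(frame, k) - reference).sum()
    return bool(drift > threshold)


if __name__ == "__main__":
    # Synthetic stand-ins for video frames: a smooth gradient as the known
    # domain and a uniform-noise image as a very different scene.
    ramp = np.linspace(0.0, 1.0, 32)
    known = np.outer(ramp, ramp)
    reference = singular_spectrum(known)
    unseen = np.random.default_rng(0).random((32, 32))
    print("known frame triggers retraining:", should_retrain(reference, known))
    print("unseen frame triggers retraining:", should_retrain(reference, unseen))
```

In a pipeline of the kind the abstract describes, frames flagged this way would be the candidates passed to the on-device teacher to produce training targets for the student. How the authors actually select key frames and schedule retraining is not stated here.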
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.