Boosting Self-supervised Video-based Human Action Recognition Through Knowledge Distillation
Fernando Camarena, M. González-Mendoza, Leonardo Chang, N. Hernández-Gress
LatinX in AI at Neural Information Processing Systems Conference 2022 · Published 2022-11-28 · DOI: 10.52591/lxai202211286
Abstract
Deep learning architectures lead the state of the art in several computer vision, natural language processing, and reinforcement learning tasks due to their ability to extract multi-level representations without human engineering. A model's performance depends on the amount of labeled data used in training; hence, approaches such as self-supervised learning (SSL) extract the supervisory signal from unlabeled data. Although SSL reduces the dependency on human annotations, two main drawbacks remain. First, training a large-scale model from scratch requires high computational resources. Second, knowledge from an SSL model is commonly transferred to a target model by fine-tuning, which forces both models to share the same parameters and architecture and makes the transfer task-dependent. This paper explores how SSL can benefit from knowledge distillation in constructing an efficient, non-task-dependent training framework. The experimental design compares the training of an SSL algorithm trained from scratch against one boosted by knowledge distillation in a teacher-student paradigm, using the video-based human action recognition dataset UCF101. Results show that knowledge distillation accelerates the convergence of the network and removes the reliance on a specific model architecture.
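To make the teacher-student paradigm mentioned above concrete, the sketch below shows a standard knowledge-distillation loss in the style of Hinton et al. (2015), where a student matches the teacher's temperature-softened output distribution. This is a minimal illustration, not the paper's implementation: the linear heads, the 512-dimensional clip features, the temperature, and the 101-way output (matching UCF101's class count) are all illustrative assumptions.

```python
# Minimal knowledge-distillation sketch (teacher-student paradigm).
# NOT the paper's implementation; models, feature size, and temperature
# are hypothetical placeholders chosen for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature**2

# Toy usage: random features standing in for encoded UCF101 video clips.
teacher = torch.nn.Linear(512, 101)  # hypothetical frozen SSL-pretrained teacher head
student = torch.nn.Linear(512, 101)  # hypothetical student head being trained
features = torch.randn(8, 512)

with torch.no_grad():                # teacher provides targets but is not updated
    t_logits = teacher(features)
s_logits = student(features)
loss = distillation_loss(s_logits, t_logits)
loss.backward()                      # gradients flow only into the student
```

Because the student only needs the teacher's output distribution, not its weights, the two networks can have different architectures; this is the property the abstract refers to when it says distillation removes the reliance on a specific model architecture.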