Empowering lightweight video transformer via the kernel learning

IF 0.7 · Zone 4 (Engineering & Technology) · JCR Q4, ENGINEERING, ELECTRICAL & ELECTRONIC · Electronics Letters · Pub Date: 2024-05-10 · DOI: 10.1049/ell2.13215

Xiaoxi Liu, Ju Liu, Lingchen Gu

Journal: Electronics Letters, vol. 60, no. 9 (journal article). Full text: https://onlinelibrary.wiley.com/doi/10.1049/ell2.13215 (PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1049/ell2.13215)

Citations: 0

Abstract

Video transformers achieve superior performance in video recognition. Despite recent advances, however, they still require substantial computation and memory. To improve computational efficiency, this letter proposes a kernel-based video transformer with three components: (1) a new formulation of the video transformer via kernel learning, which helps explain the role of its individual components; (2) a lightweight kernel-based spatial–temporal multi-head self-attention block that learns compact joint spatial–temporal video features; and (3) an adaptive-score position embedding method that makes the video transformer more flexible. Experimental results on several action recognition datasets demonstrate the effectiveness of the proposed method. Pretrained only on ImageNet-1K, the method achieves a favourable balance between computation and accuracy while requiring 7× fewer parameters and 13× fewer floating-point operations than other comparable methods.
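The abstract does not say which kernel the authors use or how their kernel-based spatial–temporal multi-head self-attention block is built. As a hedged illustration only, the sketch below shows the generic idea of reading self-attention as a kernel smoother over joint space-time tokens. Softmax attention corresponds to the kernel exp(q·k/√d). A positive feature map φ(x) = elu(x) + 1 (a choice borrowed from linear-attention work and assumed here, not taken from the paper) gives the kernel φ(q)·φ(k), whose sums over keys can be reordered so the L×L attention matrix is never formed. All function names are hypothetical. The example is single-head and, for brevity, uses the token embeddings directly as queries, keys and values with no learned projections.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def elu_plus_one(x: float) -> float:
    # phi(x) = elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so it is always > 0
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m: Matrix) -> Matrix:
    # Apply phi elementwise to every token vector (assumed feature map, not the paper's)
    return [[elu_plus_one(x) for x in row] for row in m]


def flatten_video_tokens(video: List[Matrix]) -> Matrix:
    # video[t][n] is the embedding of patch n in frame t; flattening yields one
    # joint spatial-temporal sequence of length L = T * N
    return [patch for frame in video for patch in frame]


def kernel_smoother_attention(q: Matrix, k: Matrix, v: Matrix,
                              kernel: Callable[[Vector, Vector], float]) -> Matrix:
    # out_i = sum_j kernel(q_i, k_j) * v_j / sum_j kernel(q_i, k_j)
    # Evaluates all L x L kernel values explicitly: O(L^2 * d) time
    out = []
    for qi in q:
        weights = [kernel(qi, kj) for kj in k]
        z = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out


def softmax_kernel(d: int) -> Callable[[Vector, Vector], float]:
    # Scaled dot-product (softmax) attention is the kernel exp(q . k / sqrt(d))
    scale = 1.0 / math.sqrt(d)
    return lambda a, b: math.exp(dot(a, b) * scale)


def phi_kernel(a: Vector, b: Vector) -> float:
    # Kernel induced by the feature map: phi(a) . phi(b)
    return dot(feature_map([a])[0], feature_map([b])[0])


def linear_kernel_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # Same phi-kernel, but the sums over keys are computed once and reused:
    #   S = sum_j phi(k_j) v_j^T  (d x d_v),   z = sum_j phi(k_j)  (d)
    #   out_i = phi(q_i)^T S / (phi(q_i) . z)   ->  O(L * d * d_v) time
    pq, pk = feature_map(q), feature_map(k)
    d, dv = len(pk[0]), len(v[0])
    s = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for kj, vj in zip(pk, v):
        for a in range(d):
            z[a] += kj[a]
            for c in range(dv):
                s[a][c] += kj[a] * vj[c]
    out = []
    for qi in pq:
        denom = dot(qi, z)
        out.append([sum(qi[a] * s[a][c] for a in range(d)) / denom for c in range(dv)])
    return out


if __name__ == "__main__":
    T, N, d = 2, 3, 4  # toy clip: 2 frames, 3 patches per frame, 4-dim embeddings
    video = [[[math.sin(0.3 * (t * N * d + n * d + c)) for c in range(d)]
              for n in range(N)] for t in range(T)]
    x = flatten_video_tokens(video)
    # Identity projections for brevity: queries = keys = values = tokens
    quadratic = kernel_smoother_attention(x, x, x, phi_kernel)
    linear = linear_kernel_attention(x, x, x)
    max_diff = max(abs(a - b) for ra, rb in zip(quadratic, linear) for a, b in zip(ra, rb))
    print(len(x), max_diff < 1e-9)
```

Running it prints `6 True`: the T·N = 6 flattened tokens, and confirmation that the quadratic and reordered computations of the same φ-kernel agree to floating-point precision. The reordering matters for video because L = T·N grows quickly with clip length and resolution; whether the paper's block relies on this particular trick is not stated in the abstract.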


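Source journal: Electronics Letters (Engineering & Technology / Engineering: Electrical & Electronic)

CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 268
Review time: 3.6 months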

Journal description: Electronics Letters is an internationally renowned, peer-reviewed rapid-communication journal that publishes short original research papers every two weeks. Its broad, interdisciplinary scope covers the latest developments in all fields related to electronic engineering, including communication, biomedical, optical and device technologies. Electronics Letters also provides further insight into some of the latest developments through special features and interviews.

Scope: As a journal at the forefront of its field, Electronics Letters publishes papers covering all themes of electronic and electrical engineering. Its major themes are: Antennas and Propagation; Biomedical and Bioinspired Technologies, Signal Processing and Applications; Control Engineering; Electromagnetism: Theory, Materials and Devices; Electronic Circuits and Systems; Image, Video and Vision Processing and Applications; Information, Computing and Communications; Instrumentation and Measurement; Microwave Technology; Optical Communications; Photonics and Opto-Electronics; Power Electronics, Energy and Sustainability; Radar, Sonar and Navigation; Semiconductor Technology; Signal Processing; MIMO.