Deep Metric Learning for Human Action Recognition with SlowFast Networks

2021 International Conference on Visual Communications and Image Processing (VCIP). Pub. date: 2021-12-05. DOI: 10.1109/VCIP53242.2021.9675393

Shan-zhi Shi, Cheolkon Jung

Citations: 1

Abstract

In this paper, we propose a deep metric learning approach to human action recognition based on SlowFast networks. We adopt SlowFast networks to extract both the slowly changing spatial semantic information of a single target entity in the spatial domain and the fast-changing motion information in the temporal domain. Because deep metric learning can learn the differences between human action classes, we use it to learn a mapping from the raw video to compact features in an embedding space. The proposed network consists of three main parts: 1) two branches that operate independently at low and high frame rates to extract spatial and temporal features, respectively; 2) fusion of the features from the two branches; and 3) joint training with a deep metric learning loss and a classification loss. Experimental results on the KTH human action dataset show that the proposed method runs faster and has a smaller model than C3D and R3D while maintaining high accuracy.
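The three components listed above can be illustrated with a minimal, framework-free Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the sampling parameters `tau` and `alpha` use values commonly chosen for SlowFast models, fusion is assumed to be plain concatenation of the pooled branch features, and the metric-learning term is assumed to be a triplet loss added to cross-entropy with a hypothetical weight `lam`. Feature vectors are plain lists standing in for the outputs of the 3D convolutional branches.

```python
import math
from typing import List, Sequence, Tuple


def sample_pathway_indices(num_frames: int, tau: int = 16, alpha: int = 8) -> Tuple[List[int], List[int]]:
    """Frame indices for the two branches (tau and alpha are assumed values).

    Slow branch: every tau-th frame (low frame rate, spatial semantics).
    Fast branch: every (tau // alpha)-th frame (alpha times the frame rate, motion).
    """
    fast_stride = max(1, tau // alpha)
    slow = list(range(0, num_frames, tau))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fuse(slow_feat: Sequence[float], fast_feat: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate the pooled feature vectors of both branches."""
    return list(slow_feat) + list(fast_feat)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Assumed metric loss: hinge on squared distances that pulls same-class
    embeddings together and pushes other-class embeddings at least `margin` farther away."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using the log-sum-exp trick for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(anchor, positive, negative, logits, label, lam: float = 1.0, margin: float = 0.2) -> float:
    """Joint objective: classification loss + lam * metric loss (lam is hypothetical)."""
    return cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative, margin)


if __name__ == "__main__":
    slow_idx, fast_idx = sample_pathway_indices(64)
    print(len(slow_idx), len(fast_idx))  # 4 32
```

With these assumed defaults, a 64-frame clip yields 4 slow-branch frames and 32 fast-branch frames, reflecting the 8x frame-rate ratio between the branches. Training on the sum of both losses lets the classification head and the embedding space be optimized together, which is the joint-training idea the abstract describes.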



