A View Invariant Human Action Recognition System for Noisy Inputs

2022 19th Conference on Robots and Vision (CRV)
Pub Date: 2022-05-01
DOI: 10.1109/CRV55824.2022.00017

Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad

Abstract

We propose a skeleton-based Human Action Recognition (HAR) system that is robust to both noisy inputs and changes in camera perspective. The system takes RGB videos as input and consists of three modules: (M1) a 2D Key-Points Estimation module, (M2) a Robustness module, and (M3) an Action Classification module. M2 is our main contribution. It uses a pre-trained 3D pose estimator and pose refinement networks to handle noisy information, including missing points, and applies rotations to the 3D poses to add robustness to camera view-point variation. To evaluate our approach, we ran comparison experiments between models trained with M2 and without it. These experiments were conducted on the UESTC view-varying dataset, the i3DPost multi-view human action dataset, and a Boxing Actions dataset that we created. Adding M2 improved accuracy by 24%, 3%, and 11% on the three datasets, respectively. On the UESTC dataset, our method achieves a new state of the art for the cross-view evaluation protocols.
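
The view-point step of M2 can be illustrated with a minimal sketch. The abstract only says that the 3D poses are rotated, so everything specific here is an assumption made for illustration: the choice of the vertical (y) axis, rotation about the pose centroid, the four angles, and the hypothetical names `rotate_pose_y` and `augment_views`. It is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z), with y assumed to be the vertical axis
Pose = List[Joint]


def rotate_pose_y(pose: Pose, angle_deg: float) -> Pose:
    """Rotate every joint about the vertical axis passing through the pose centroid.

    Hypothetical helper: the axis and pivot are assumptions, not taken from the paper.
    """
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    n = len(pose)
    cx = sum(j[0] for j in pose) / n
    cz = sum(j[2] for j in pose) / n
    rotated: Pose = []
    for x, y, z in pose:
        dx, dz = x - cx, z - cz
        # Standard rotation about y: x' = c*x + s*z, z' = -s*x + c*z; y is unchanged.
        rotated.append((cx + c * dx + s * dz, y, cz - s * dx + c * dz))
    return rotated


def augment_views(pose: Pose, angles_deg: Sequence[float] = (0, 90, 180, 270)) -> List[Pose]:
    """Return one rotated copy of the pose per angle (the angles are illustrative)."""
    return [rotate_pose_y(pose, a) for a in angles_deg]


if __name__ == "__main__":
    toy_pose: Pose = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 1.0, 3.0)]
    for view in augment_views(toy_pose):
        print([tuple(round(v, 3) for v in joint) for joint in view])
```

Because a rotation is a rigid transform, distances between joints are unchanged in every generated view. A classifier trained on such copies sees the same action from several simulated camera angles.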
