Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad
{"title":"A View Invariant Human Action Recognition System for Noisy Inputs","authors":"Joo Wang Kim, J. Hernandez, Richard Cobos, Ricardo Palacios, Andres G. Abad","doi":"10.1109/CRV55824.2022.00017","DOIUrl":null,"url":null,"abstract":"We propose a skeleton-based Human Action Recognition (HAR) system, robust to both noisy inputs and perspective variation. This system receives RGB videos as input and consists of three modules: (M1) 2D Key-Points Estimation module, (M2) Robustness module, and (M3) Action Classification module; of which M2 is our main contribution. This module uses pre-trained 3D pose estimator and pose refinement networks to handle noisy information including missing points, and uses rotations of the 3D poses to add robustness to camera view-point variation. To evaluate our approach, we carried out comparison experiments between models trained with M2 and without it. These experiments were conducted on the UESTC view-varying dataset, on the i3DPost multi-view human action dataset and on a Boxing Actions dataset, created by us. Our system achieved positive results, improving the accuracy by 24%, 3% and 11% on each dataset, respectively. On the UESTC dataset, our method achieves the new state of the art for the cross-view evaluation protocols.","PeriodicalId":131142,"journal":{"name":"2022 19th Conference on Robots and Vision (CRV)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 19th Conference on Robots and Vision (CRV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CRV55824.2022.00017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
We propose a skeleton-based Human Action Recognition (HAR) system, robust to both noisy inputs and perspective variation. This system receives RGB videos as input and consists of three modules: (M1) 2D Key-Points Estimation module, (M2) Robustness module, and (M3) Action Classification module; of which M2 is our main contribution. This module uses pre-trained 3D pose estimator and pose refinement networks to handle noisy information including missing points, and uses rotations of the 3D poses to add robustness to camera view-point variation. To evaluate our approach, we carried out comparison experiments between models trained with M2 and without it. These experiments were conducted on the UESTC view-varying dataset, on the i3DPost multi-view human action dataset and on a Boxing Actions dataset, created by us. Our system achieved positive results, improving the accuracy by 24%, 3% and 11% on each dataset, respectively. On the UESTC dataset, our method achieves the new state of the art for the cross-view evaluation protocols.