Skeleton Reconstruction Using Generative Adversarial Networks for Human Activity Recognition Under Occlusion.

Sensors, Vol. 25, No. 5 | IF 3.5 | JCR Q2 (Chemistry, Analytical) | CAS Tier 3, Multidisciplinary | Published: 2025-03-04 | DOI: 10.3390/s25051567
Ioannis Vernikos, Evaggelos Spyrou
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11902716/pdf/
Citations: 0

Abstract

Recognizing human activities from motion data is a complex computer-vision task that involves identifying human behaviors from sequences of 3D motion data. These activities encompass successive body part movements, interactions with objects, or group dynamics. Camera-based recognition methods are cost-effective and perform well under controlled conditions but face challenges in real-world scenarios due to factors such as viewpoint changes, illumination variations, and occlusion. The latter is the most significant challenge in real-world recognition; partial occlusion impacts recognition accuracy to varying degrees depending on the activity and the occluded body parts, while complete occlusion can render activity recognition impossible. In this paper, we propose a novel approach for human activity recognition in the presence of partial occlusion, which may be applied in cases wherein up to two body parts are occluded. The proposed approach works under the assumptions that (a) human motion is modeled using a set of 3D skeletal joints, and (b) the same body parts remain occluded throughout the whole activity. In contrast to previous research, in this work we address this problem using a Generative Adversarial Network (GAN). Specifically, we train a Convolutional Recurrent Neural Network (CRNN) to serve as the generator of the GAN; its aim is to complete the parts of the skeleton that are missing due to occlusion. The input to this CRNN consists of raw 3D skeleton joint positions after the removal of the joints corresponding to occluded parts, and its output is a reconstructed skeleton. For the discriminator of the GAN, we use a simple long short-term memory (LSTM) network. We evaluate the proposed approach using publicly available datasets in a series of occlusion scenarios.
We demonstrate that in all scenarios, the occlusion of certain body parts causes a significant decline in performance, although in some cases, the reconstruction process leads to almost perfect recognition. Nonetheless, in almost every circumstance, the approach proposed herein outperforms previous works, with improvements ranging from 2.2% to 37.5%, depending on the dataset used and the occlusion case.
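The occlusion-masking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the 25-joint layout and the grouping of joints into body parts are assumptions loosely based on the common NTU RGB+D skeleton convention, and the function name is hypothetical.

```python
# Illustrative sketch of the masking step: given a sequence of 3D skeletons,
# drop the joints belonging to the occluded body parts (at most two, per the
# paper's assumption that the same parts stay occluded for the whole activity)
# before feeding the sequence to the CRNN generator.

# Hypothetical part-to-joint-index grouping (NTU RGB+D-style 25-joint layout).
BODY_PARTS = {
    "left_arm":  [8, 9, 10, 11, 23, 24],
    "right_arm": [4, 5, 6, 7, 21, 22],
    "left_leg":  [16, 17, 18, 19],
    "right_leg": [12, 13, 14, 15],
    "torso":     [0, 1, 2, 3, 20],
}

def mask_occluded(sequence, occluded_parts):
    """Remove occluded joints from every frame of a skeleton sequence.

    sequence: list of frames, each frame a list of 25 (x, y, z) tuples.
    occluded_parts: names of the occluded body parts (at most two).
    Returns the sequence with the occluded joints removed from each frame.
    """
    if len(set(occluded_parts)) > 2:
        raise ValueError("the approach assumes at most two occluded parts")
    hidden = {j for part in occluded_parts for j in BODY_PARTS[part]}
    return [
        [joint for i, joint in enumerate(frame) if i not in hidden]
        for frame in sequence
    ]
```

For example, occluding the left arm (6 joints) and the right leg (4 joints) leaves 15 of the 25 joints in each frame; the GAN's generator is then trained to reconstruct the full skeleton from this reduced input.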


Source journal: Sensors (Engineering Technology — Electrochemistry)
CiteScore: 7.30 | Self-citation rate: 12.80% | Articles per year: 8430 | Review time: 1.7 months
Journal description: Sensors (ISSN 1424-8220) provides an advanced forum for the science and technology of sensors and biosensors. It publishes reviews (including comprehensive reviews of complete sensor products), regular research papers, and short notes. Its aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers, and full experimental details must be provided so that the results can be reproduced.