Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation
Luefeng Chen; Wei Cao; Biao Zheng; Min Wu; Witold Pedrycz; Kaoru Hirota
IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6500-6508. Journal article, published 2024-07-22.
DOI: 10.1109/TAI.2024.3432028
https://ieeexplore.ieee.org/document/10606307/
Citations: 0
Abstract
3-D human pose estimation is an important and challenging problem for two reasons. First, when the view is limited, it is difficult for a 3-D pose estimate to separate different 2-D poses. Second, the lack of depth information makes the lifting ambiguity hard to reduce. To address the difficulty of estimating 3-D human pose from 2-D image sequences, we propose a two-stage representation refinement based on convex combination. The two stages are a dense-spatial-temporal convolutional network, which determines the features between video frames, and a local-to-refine network, which captures pose details at different scales. In this way, the method makes better use of the relations between frames in a pose video sequence to produce more accurate results. Finally, we combine these networks with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on the Human3.6m and MPII datasets. The results confirm that our method performs better than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. In addition, a robustness experiment is carried out in which the input is interrupted; the results verify that our method shows better robustness.
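The abstract does not say how the convex-combination block is constructed. The sketch below only illustrates the general mathematical idea of a convex combination: several candidate 3-D positions for one joint are blended with non-negative weights that sum to 1, so the result always lies inside the convex hull of the candidates. The function names, the use of softmax to produce the weights, and the candidate values are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(scores):
    """Map real-valued scores to non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def convex_combination(candidates, scores):
    """Blend candidate 3-D positions of one joint with convex weights.

    candidates: list of (x, y, z) tuples (hypothetical estimates).
    scores: one real-valued score per candidate (hypothetical confidences).
    """
    if not candidates or len(candidates) != len(scores):
        raise ValueError("need one score per candidate and at least one candidate")
    weights = softmax(scores)
    # Weighted sum per axis; weights are >= 0 and sum to 1, so this is convex.
    return tuple(
        sum(w * c[axis] for w, c in zip(weights, candidates))
        for axis in range(3)
    )


if __name__ == "__main__":
    # Two made-up candidate positions for a single joint, equally scored.
    joint = convex_combination([(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)], [0.0, 0.0])
    print(joint)
```

With equal scores the weights are equal, so the blended position is the midpoint of the candidates; raising one score pulls the result toward that candidate without ever leaving the range the candidates span.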