On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos

Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang
{"title":"On Boosting Single-Frame 3D Human Pose Estimation via Monocular Videos","authors":"Zhi Li, Xuan Wang, Fei Wang, Peilin Jiang","doi":"10.1109/ICCV.2019.00228","DOIUrl":null,"url":null,"abstract":"The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.","PeriodicalId":6728,"journal":{"name":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","volume":"45 1","pages":"2192-2201"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/CVF International Conference on Computer Vision (ICCV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCV.2019.00228","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 34

Abstract

The premise of training an accurate 3D human pose estimation network is the possession of huge amount of richly annotated training data. Nonetheless, manually obtaining rich and accurate annotations is, even not impossible, tedious and slow. In this paper, we propose to exploit monocular videos to complement the training dataset for the single-image 3D human pose estimation tasks. At the beginning, a baseline model is trained with a small set of annotations. By fixing some reliable estimations produced by the resulting model, our method automatically collects the annotations across the entire video as solving the 3D trajectory completion problem. Then, the baseline model is further trained with the collected annotations to learn the new poses. We evaluate our method on the broadly-adopted Human3.6M and MPI-INF-3DHP datasets. As illustrated in experiments, given only a small set of annotations, our method successfully makes the model to learn new poses from unlabelled monocular videos, promoting the accuracies of the baseline model by about 10%. By contrast with previous approaches, our method does not rely on either multi-view imagery or any explicit 2D keypoint annotations.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
利用单目视频增强单帧3D人体姿态估计
训练出一个准确的三维人体姿态估计网络的前提是拥有大量注释丰富的训练数据。尽管如此,手动获取丰富而准确的注释,即使不是不可能的,也是冗长而缓慢的。在本文中,我们提出利用单目视频来补充训练数据集,用于单图像3D人体姿态估计任务。一开始,基线模型是用一小组注释进行训练的。通过修复一些由结果模型产生的可靠估计,我们的方法自动收集整个视频中的注释,以解决3D轨迹完成问题。然后,使用收集到的注释对基线模型进行进一步训练,以学习新的姿态。我们在广泛采用的Human3.6M和MPI-INF-3DHP数据集上评估了我们的方法。实验表明,仅在少量注释的情况下,我们的方法就成功地使模型从未标记的单目视频中学习新的姿势,将基线模型的准确率提高了10%左右。与之前的方法相比,我们的方法既不依赖于多视图图像,也不依赖于任何显式的2D关键点注释。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Very Long Natural Scenery Image Prediction by Outpainting VTNFP: An Image-Based Virtual Try-On Network With Body and Clothing Feature Preservation Towards Latent Attribute Discovery From Triplet Similarities Gaze360: Physically Unconstrained Gaze Estimation in the Wild Attention Bridging Network for Knowledge Transfer
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1