PoseFlow:用于理解视频中人类行为的深度运动表示

Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han
{"title":"PoseFlow:用于理解视频中人类行为的深度运动表示","authors":"Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han","doi":"10.1109/CVPR.2018.00707","DOIUrl":null,"url":null,"abstract":"Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the interested human bodies and the undesired background. This \"noisy\" motion representation makes it very challenging for pose estimation and action recognition in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential on improving two challenging tasks in human video analysis: pose estimation and action recognition.","PeriodicalId":6564,"journal":{"name":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","volume":"13 1","pages":"6762-6770"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":"{\"title\":\"PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos\",\"authors\":\"Dingwen Zhang, Guangyu Guo, Dong Huang, Junwei Han\",\"doi\":\"10.1109/CVPR.2018.00707\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the interested human bodies and the undesired background. This \\\"noisy\\\" motion representation makes it very challenging for pose estimation and action recognition in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential on improving two challenging tasks in human video analysis: pose estimation and action recognition.\",\"PeriodicalId\":6564,\"journal\":{\"name\":\"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition\",\"volume\":\"13 1\",\"pages\":\"6762-6770\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"31\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR.2018.00707\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR.2018.00707","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

摘要

在视频中,人体的运动是理解和刻画人类行为的关键线索。大多数现有的方法使用光流来探索运动线索。然而,光流通常包含感兴趣的人体和不希望的背景上的运动。这种“嘈杂”的运动表示使得真实场景中的姿态估计和动作识别非常具有挑战性。为了解决这个问题,本文提出了一种新的深度运动表示,称为PoseFlow,它在抑制背景和运动模糊的同时显示视频中的人体运动,并且对遮挡具有鲁棒性。为了以较小的计算成本学习PoseFlow,我们提出了一种功能结构化的时空深度网络——PoseFlow Net (PFN),共同解决PoseFlow的骨架定位和匹配问题。综合实验表明,PFN在生成PoseFlow方面优于最先进的深流估计模型。此外,PoseFlow展示了其在改进人类视频分析中两个具有挑战性的任务方面的潜力:姿势估计和动作识别。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
PoseFlow: A Deep Motion Representation for Understanding Human Behaviors in Videos
Motion of the human body is the critical cue for understanding and characterizing human behavior in videos. Most existing approaches explore the motion cue using optical flows. However, optical flow usually contains motion on both the interested human bodies and the undesired background. This "noisy" motion representation makes it very challenging for pose estimation and action recognition in real scenarios. To address this issue, this paper presents a novel deep motion representation, called PoseFlow, which reveals human motion in videos while suppressing background and motion blur, and being robust to occlusion. For learning PoseFlow with mild computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), to jointly solve the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms the state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential on improving two challenging tasks in human video analysis: pose estimation and action recognition.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Multistage Adversarial Losses for Pose-Based Human Image Synthesis Document Enhancement Using Visibility Detection Demo2Vec: Reasoning Object Affordances from Online Videos Planar Shape Detection at Structural Scales Where and Why are They Looking? Jointly Inferring Human Attention and Intentions in Complex Tasks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1