运动重定向的流导向可转换瓶颈网络

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Pub Date : 2021-06-01 DOI:10.1109/CVPR46437.2021.01065

Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, S. Tulyakov

{"title":"运动重定向的流导向可转换瓶颈网络","authors":"Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, S. Tulyakov","doi":"10.1109/CVPR46437.2021.01065","DOIUrl":null,"url":null,"abstract":"Human motion retargeting aims to transfer the motion of one person in a \"driving\" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.","PeriodicalId":339646,"journal":{"name":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Flow Guided Transformable Bottleneck Networks for Motion Retargeting\",\"authors\":\"Jian Ren, Menglei Chai, Oliver J. Woodford, Kyle Olszewski, S. Tulyakov\",\"doi\":\"10.1109/CVPR46437.2021.01065\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Human motion retargeting aims to transfer the motion of one person in a \\\"driving\\\" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.\",\"PeriodicalId\":339646,\"journal\":{\"name\":\"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"volume\":\"168 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CVPR46437.2021.01065\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR46437.2021.01065","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 20

摘要

人体动作重定向的目的是将“驾驶”视频或一组图像中的一个人的动作转移到另一个人身上。现有的努力利用来自每个目标人的长训练视频来训练特定主题的动作转移模型。然而，这种方法的可扩展性是有限的，因为每个模型只能为给定的目标主体生成视频，并且这种训练视频的获取和处理是劳动密集型的。少射运动转移技术，它只需要一个或几个图像从一个目标，最近引起了相当大的关注。解决此任务的方法通常使用2D或明确的3D表示来传递运动，这样做会牺牲精确的几何建模或端到端学习表示的灵活性。受变形瓶颈网络的启发，我们提出了一种基于图像内容的隐式体积表示的方法，然后可以使用体积流场对其进行空间操作。我们解决了一个具有挑战性的问题，即如何在不同的身体姿势之间聚合信息，学习流场，允许将执行复杂动作的高度非刚性人类受试者输入图像的适当区域的内容组合到一个隐含的体积表示中。这使我们能够仅从移动的人的视频中学习我们的3D表示。凭借3D对象理解和端到端学习渲染，这种绝对新颖的表示提供了最先进的图像生成质量，如我们的定量和定性评估所示。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Flow Guided Transformable Bottleneck Networks for Motion Retargeting

Human motion retargeting aims to transfer the motion of one person in a "driving" video or set of images to another person. Existing efforts leverage a long training video from each target person to train a subject-specific motion transfer model. However, the scalability of such methods is limited, as each model can only generate videos for the given target subject, and such training videos are labor-intensive to acquire and process. Few-shot motion transfer techniques, which only require one or a few images from a target, have recently drawn considerable attention. Methods addressing this task generally use either 2D or explicit 3D representations to transfer motion, and in doing so, sacrifice either accurate geometric modeling or the flexibility of an end-to-end learned representation. Inspired by the Transformable Bottleneck Network, which renders novel views and manipulations of rigid objects, we propose an approach based on an implicit volumetric representation of the image content, which can then be spatially manipulated using volumetric flow fields. We address the challenging question of how to aggregate information across different body poses, learning flow fields that allow for combining content from the appropriate regions of input images of highly non-rigid human subjects performing complex motions into a single implicit volumetric representation. This allows us to learn our 3D representation solely from videos of moving people. Armed with both 3D object understanding and end-to-end learned rendering, this categorically novel representation delivers state-of-the-art image generation quality, as shown by our quantitative and qualitative evaluations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助