Animation of hand-drawn sketches is a captivating art form: it gives the animator expressive freedom but requires significant expertise. In this work, we introduce a novel sketch animation framework designed to address inherent challenges such as motion extraction, motion transfer, and occlusion. The framework takes as input an exemplar video featuring a moving object and uses a robust motion transfer technique to animate the input sketch. Comparative evaluations demonstrate the superior performance of our method over existing sketch animation techniques. Notably, our approach is more accessible to users than conventional sketch-based animation systems, positioning it as a promising contribution to the field of sketch animation. Project page: https://graphics-research-group.github.io/SketchAnim/
{"title":"SketchAnim: Real-time sketch animation transfer from videos","authors":"Gaurav Rai, Shreyas Gupta, Ojaswa Sharma","doi":"10.1111/cgf.15176","DOIUrl":"https://doi.org/10.1111/cgf.15176","url":null,"abstract":"<p>Animation of hand-drawn sketches is an adorable art. It allows the animator to generate animations with expressive freedom and requires significant expertise. In this work, we introduce a novel sketch animation framework designed to address inherent challenges, such as motion extraction, motion transfer, and occlusion. The framework takes an exemplar video input featuring a moving object and utilizes a robust motion transfer technique to animate the input sketch. We show comparative evaluations that demonstrate the superior performance of our method over existing sketch animation techniques. Notably, our approach exhibits a higher level of user accessibility in contrast to conventional sketch-based animation systems, positioning it as a promising contributor to the field of sketch animation. https://graphics-research-group.github.io/SketchAnim/</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning-based methods for 3D content generation have shown great potential to create 3D characters from text prompts, videos, and images. However, current methods primarily focus on generating static 3D meshes, overlooking the crucial aspect of creating animatable 3D meshes. Directly using 3D meshes generated by existing methods to create underlying skeletons for animation presents many challenges, because the generated mesh might exhibit geometry artifacts or assume arbitrary poses that complicate the subsequent rigging process. This work proposes a new framework for generating an animatable 3D mesh from a single 2D image depicting the character. We do so by constraining the generated 3D mesh to assume an A-pose, which mitigates geometry artifacts and facilitates the use of existing automatic rigging methods. Our approach aims to leverage the generative power of existing models across modalities without the need for new data or large-scale training. We evaluate the effectiveness of our framework with qualitative results, ablation studies, and quantitative comparisons with existing 3D mesh generation models.
{"title":"Creating a 3D Mesh in A-pose from a Single Image for Character Rigging","authors":"Seunghwan Lee, C. Karen Liu","doi":"10.1111/cgf.15177","DOIUrl":"https://doi.org/10.1111/cgf.15177","url":null,"abstract":"<p>Learning-based methods for 3D content generation have shown great potential to create 3D characters from text prompts, videos, and images. However, current methods primarily focus on generating static 3D meshes, overlooking the crucial aspect of creating an animatable 3D meshes. Directly using 3D meshes generated by existing methods to create underlying skeletons for animation presents many challenges because the generated mesh might exhibit geometry artifacts or assume arbitrary poses that complicate the subsequent rigging process. This work proposes a new framework for generating a 3D animatable mesh from a single 2D image depicting the character. We do so by enforcing the generated 3D mesh to assume an A-pose, which can mitigate the geometry artifacts and facilitate the use of existing automatic rigging methods. Our approach aims to leverage the generative power of existing models across modalities without the need for new data or large-scale training. We evaluate the effectiveness of our framework with qualitative results, as well as ablation studies and quantitative comparisons with existing 3D mesh generation models.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In multiplayer, first-person shooter games like Counter-Strike: Global Offensive (CS:GO), coordinated movement is a critical component of high-level strategic play. However, the complexity of team coordination and the variety of conditions present in popular game maps make it impractical to author hand-crafted movement policies for every scenario. We show that it is possible to take a data-driven approach to creating human-like movement controllers for CS:GO. We curate a team movement dataset comprising 123 hours of professional gameplay traces, and use this dataset to train a transformer-based movement model that generates human-like team movement for all players in a “Retakes” round of the game. Importantly, the movement prediction model is efficient. Performing inference for all players takes less than 0.5 ms per game step (amortized cost) on a single CPU core, making it plausible for use in commercial games today. Human evaluators assess that our model behaves more like humans than both commercially available bots and procedural movement controllers scripted by experts (16% to 59% higher TrueSkill rating of “human-like”). Using experiments involving in-game bot vs. bot self-play, we demonstrate that our model performs simple forms of teamwork, makes fewer common movement mistakes, and yields movement distributions, player lifetimes, and kill locations similar to those observed in professional CS:GO match play.
{"title":"Learning to Move Like Professional Counter-Strike Players","authors":"D. Durst, F. Xie, V. Sarukkai, B. Shacklett, I. Frosio, C. Tessler, J. Kim, C. Taylor, G. Bernstein, S. Choudhury, P. Hanrahan, K. Fatahalian","doi":"10.1111/cgf.15173","DOIUrl":"https://doi.org/10.1111/cgf.15173","url":null,"abstract":"<p>In multiplayer, first-person shooter games like Counter-Strike: Global Offensive (CS:GO), coordinated movement is a critical component of high-level strategic play. However, the complexity of team coordination and the variety of conditions present in popular game maps make it impractical to author hand-crafted movement policies for every scenario. We show that it is possible to take a data-driven approach to creating human-like movement controllers for CS:GO. We curate a team movement dataset comprising 123 hours of professional game play traces, and use this dataset to train a transformer-based movement model that generates human-like team movement for all players in a “Retakes” round of the game. Importantly, the movement prediction model is efficient. Performing inference for all players takes less than 0.5 ms per game step (amortized cost) on a single CPU core, making it plausible for use in commercial games today. Human evaluators assess that our model behaves more like humans than both commercially-available bots and procedural movement controllers scripted by experts (16% to 59% higher by TrueSkill rating of “human-like”). Using experiments involving in-game bot vs. bot self-play, we demonstrate that our model performs simple forms of teamwork, makes fewer common movement mistakes, and yields movement distributions, player lifetimes, and kill locations similar to those observed in professional CS:GO match play.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Animating gaze behavior is crucial for creating believable virtual characters, providing insights into their perception and interaction with the environment. In this paper, we present an efficient yet natural-looking gaze animation model applicable to real-time walking characters exploring natural environments. We address the challenge of dynamic gaze adaptation by combining findings from neuroscience with a data-driven saliency model. Specifically, our model determines gaze focus by considering the character's locomotion, environmental stimuli, and terrain conditions. Our model is compatible with both automatic navigation along pre-defined character trajectories and user-guided interactive locomotion, and can be configured according to the desired degree of visual exploration of the environment. Our perceptual evaluation shows that our solution significantly improves on state-of-the-art saliency-based gaze animation with respect to the character's apparent awareness of the environment, the naturalness of the motion, and the elements to which it pays attention.
{"title":"Reactive Gaze during Locomotion in Natural Environments","authors":"J. K. Melgare, D. Rohmer, S. R. Musse, M-P. Cani","doi":"10.1111/cgf.15168","DOIUrl":"https://doi.org/10.1111/cgf.15168","url":null,"abstract":"<p>Animating gaze behavior is crucial for creating believable virtual characters, providing insights into their perception and interaction with the environment. In this paper, we present an efficient yet natural-looking gaze animation model applicable to real-time walking characters exploring natural environments. We address the challenge of dynamic gaze adaptation by combining findings from neuroscience with a data-driven saliency model. Specifically, our model determines gaze focus by considering the character's locomotion, environment stimuli, and terrain conditions. Our model is compatible with both automatic navigation through pre-defined character trajectories and user-guided interactive locomotion, and can be configured according to the desired degree of visual exploration of the environment. Our perceptual evaluation shows that our solution significantly improves the state-of-the-art saliency-based gaze animation with respect to the character's apparent awareness of the environment, the naturalness of the motion, and the elements to which it pays attention.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent progress in physics-based character control has made it possible to learn policies from unstructured motion data. However, it remains challenging to train a single control policy that works with diverse and unseen motions and can be deployed to real-world physical robots. In this paper, we propose a two-stage technique that enables the control of a character with a full-body kinematic motion reference, with a focus on imitation accuracy. In the first stage, we extract a latent space encoding by training a variational autoencoder that takes short windows of motion from unstructured data as input. In the second stage, we use the embedding from the time-varying latent code to train a conditional policy, providing a mapping from kinematic input to dynamics-aware output. By keeping the two stages separate, we benefit from self-supervised methods for better latent codes and from explicit imitation rewards that avoid mode collapse. We demonstrate the efficiency and robustness of our method in simulation, with unseen user-specified motions, and on a bipedal robot, where we bring dynamic motions to the real world.
{"title":"VMP: Versatile Motion Priors for Robustly Tracking Motion on Physical Characters","authors":"Agon Serifi, Ruben Grandia, Espen Knoop, Markus Gross, Moritz Bächer","doi":"10.1111/cgf.15175","DOIUrl":"https://doi.org/10.1111/cgf.15175","url":null,"abstract":"<p>Recent progress in physics-based character control has made it possible to learn policies from unstructured motion data. However, it remains challenging to train a single control policy that works with diverse and unseen motions, and can be deployed to real-world physical robots. In this paper, we propose a two-stage technique that enables the control of a character with a full-body kinematic motion reference, with a focus on imitation accuracy. In a first stage, we extract a latent space encoding by training a variational autoencoder, taking short windows of motion from unstructured data as input. We then use the embedding from the time-varying latent code to train a conditional policy in a second stage, providing a mapping from kinematic input to dynamics-aware output. By keeping the two stages separate, we benefit from self-supervised methods to get better latent codes and explicit imitation rewards to avoid mode collapse. We demonstrate the efficiency and robustness of our method in simulation, with unseen user-specified motions, and on a bipedal robot, where we bring dynamic motions to the real world.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating high-fidelity garment animations through traditional workflows, from modeling to rendering, is both tedious and expensive. These workflows often require repetitive steps in response to updates in character motion, rendering viewpoint changes, or appearance edits. Although recent neural rendering offers an efficient solution for computationally intensive processes, it struggles to render complex garment animations with fine wrinkle details and realistic garment-body occlusions while maintaining structural consistency across frames and across densely sampled views. In this paper, we propose a novel approach to directly synthesize garment animations from body motion sequences without the need for an explicit garment proxy. Our approach infers garment dynamic features from body motion, providing a preliminary overview of the garment structure. Simultaneously, we capture detailed features from synthesized reference images of the garment's front and back, generated by a pre-trained image model. These features are then used to construct a neural radiance field that renders the garment animation video. Additionally, our technique enables garment recoloring by decomposing the garment's visual elements. We demonstrate the generalizability of our method across unseen body motions and camera views, ensuring detailed structural consistency. Furthermore, we showcase its applicability to color editing on both real and synthetic garment data. Compared to existing neural rendering techniques, our method exhibits qualitative and quantitative improvements in garment dynamics and wrinkle detail modeling. Code is available at https://github.com/wrk226/GarmentAnimationNeRF.
{"title":"Garment Animation NeRF with Color Editing","authors":"Renke Wang, Meng Zhang, Jun Li, Jian Yang","doi":"10.1111/cgf.15178","DOIUrl":"https://doi.org/10.1111/cgf.15178","url":null,"abstract":"<p>Generating high-fidelity garment animations through traditional workflows, from modeling to rendering, is both tedious and expensive. These workflows often require repetitive steps in response to updates in character motion, rendering viewpoint changes, or appearance edits. Although recent neural rendering offers an efficient solution for computationally intensive processes, it struggles with rendering complex garment animations containing fine wrinkle details and realistic garment-and-body occlusions, while maintaining structural consistency across frames and dense view rendering. In this paper, we propose a novel approach to directly synthesize garment animations from body motion sequences without the need for an explicit garment proxy. Our approach infers garment dynamic features from body motion, providing a preliminary overview of garment structure. Simultaneously, we capture detailed features from synthesized reference images of the garment's front and back, generated by a pre-trained image model. These features are then used to construct a neural radiance field that renders the garment animation video. Additionally, our technique enables garment recoloring by decomposing its visual elements. We demonstrate the generalizability of our method across unseen body motions and camera views, ensuring detailed structural consistency. Furthermore, we showcase its applicability to color editing on both real and synthetic garment data. Compared to existing neural rendering techniques, our method exhibits qualitative and quantitative improvements in garment dynamics and wrinkle detail modeling. Code is available at https://github.com/wrk226/GarmentAnimationNeRF.</p>","PeriodicalId":10687,"journal":{"name":"Computer Graphics Forum","volume":"43 8","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142707473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}