Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion

Ayaka Ideno, Takuhiro Kaneko, Tatsuya Harada
{"title":"Frame-Level Event Representation Learning for Semantic-Level Generation and Editing of Avatar Motion","authors":"Ayaka Ideno, Takuhiro Kaneko, Tatsuya Harada","doi":"10.1145/3577190.3614175","DOIUrl":null,"url":null,"abstract":"Understanding an avatar’s motion and controlling its content is important for content creation and has been actively studied in computer vision and graphics. An avatar’s motion consists of frames representing poses each time, and a subsequence of frames can be grouped into a segment based on semantic meaning. To enable semantic-level control of motion, it is important to understand the semantic division of the avatar’s motion. We define a semantic division of avatar’s motion as an “event”, which switches only when the frame in the motion cannot be predicted from the previous frames and information of the last event, and tackled editing motion and inferring motion from text based on events. However, it is challenging because we need to obtain the event information, and control the content of motion based on the obtained event information. To overcome this challenge, we propose obtaining frame-level event representation from the pair of motion and text and using it to edit events in motion and predict motion from the text. Specifically, we learn a frame-level event representation by reconstructing the avatar’s motion from the corresponding frame-level event representation sequence while inferring the sequence from the text. By doing so, we can predict motion from the text. Also, since the event at each motion frame is represented with the corresponding event representation, we can edit events in motion by editing the corresponding event representation sequence. We evaluated our method on the HumanML3D dataset and demonstrated that our model can generate motion from the text while editing motion flexibly (e.g., allowing the change of the event duration, modification of the event characteristics, and the addition of new events).","PeriodicalId":93171,"journal":{"name":"Companion Publication of the 2020 International Conference on Multimodal Interaction","volume":"105 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Companion Publication of the 2020 International Conference on Multimodal Interaction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577190.3614175","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Understanding an avatar’s motion and controlling its content are important for content creation and have been actively studied in computer vision and graphics. An avatar’s motion consists of frames, each representing a pose at a point in time, and a subsequence of frames can be grouped into a segment based on its semantic meaning. To enable semantic-level control of motion, it is important to understand the semantic division of the avatar’s motion. We define such a semantic division as an “event”, which switches only when a frame in the motion cannot be predicted from the previous frames and the information of the last event, and we tackle editing motion and inferring motion from text on the basis of events. This is challenging because we must both obtain the event information and control the content of the motion based on it. To overcome this challenge, we propose obtaining a frame-level event representation from a motion–text pair and using it to edit events in the motion and to predict motion from text. Specifically, we learn the frame-level event representation by reconstructing the avatar’s motion from the corresponding frame-level event representation sequence while inferring that sequence from the text. This lets us predict motion from text. Moreover, since the event at each motion frame is represented by its event representation, we can edit events in the motion by editing the corresponding event representation sequence. We evaluated our method on the HumanML3D dataset and demonstrated that our model can generate motion from text while editing motion flexibly (e.g., changing an event’s duration, modifying an event’s characteristics, and adding new events).
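To make the idea concrete, below is a minimal, illustrative sketch (in PyTorch) of the pipeline the abstract describes: one module infers a frame-level event-representation sequence from a text embedding, another reconstructs motion from that sequence, and an edit is performed directly on the sequence (here, stretching one event's duration). The module names, network choices (GRUs), and dimensions (e.g., the 263-dimensional HumanML3D pose vector) are assumptions for illustration only, not the authors' actual architecture.

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
import torch.nn as nn


class TextToEventSeq(nn.Module):
    """Maps a text embedding to a frame-level event-representation sequence."""

    def __init__(self, text_dim=512, event_dim=64, hidden_dim=256):
        super().__init__()
        self.init_proj = nn.Linear(text_dim, hidden_dim)
        self.rnn = nn.GRU(event_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, event_dim)

    def forward(self, text_emb, num_frames):
        # Autoregressively roll out one event representation per motion frame.
        b = text_emb.size(0)
        h = self.init_proj(text_emb).unsqueeze(0)                # (1, B, H)
        prev = text_emb.new_zeros(b, 1, self.out.out_features)   # (B, 1, E)
        events = []
        for _ in range(num_frames):
            out, h = self.rnn(prev, h)
            prev = self.out(out)
            events.append(prev)
        return torch.cat(events, dim=1)                          # (B, T, E)


class EventSeqToMotion(nn.Module):
    """Reconstructs per-frame poses from the event-representation sequence."""

    def __init__(self, event_dim=64, pose_dim=263, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(event_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, pose_dim)

    def forward(self, event_seq):
        out, _ = self.rnn(event_seq)                             # (B, T, H)
        return self.out(out)                                     # (B, T, pose_dim)


if __name__ == "__main__":
    text_emb = torch.randn(1, 512)        # stand-in for a text encoder's output
    to_events = TextToEventSeq()
    to_motion = EventSeqToMotion()

    # Text -> frame-level event representations -> motion.
    events = to_events(text_emb, num_frames=120)                 # (1, 120, 64)
    motion = to_motion(events)                                   # (1, 120, 263)

    # Semantic-level editing acts on the event sequence, not on raw poses:
    # double the duration of frames 30..60 by repeating their event
    # representations, then decode the edited sequence.
    stretched = torch.repeat_interleave(events[:, 30:60], 2, dim=1)
    edited = torch.cat([events[:, :30], stretched, events[:, 60:]], dim=1)
    edited_motion = to_motion(edited)                            # (1, 150, 263)
```

Because edits operate on the event-representation sequence rather than on raw poses, changing an event's duration or characteristics, or inserting a new event, amounts to modifying the corresponding entries of the sequence before decoding.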