Generating Diverse Gestures from Speech Using Memory Networks as Dynamic Dictionaries

Zeyu Zhao, Nan Gao, Zhi Zeng, Shuwu Zhang
Venue: 2022 International Conference on Culture-Oriented Science and Technology (CoST)
Published: 2022-08-01
DOI: 10.1109/CoST57098.2022.00042 (https://doi.org/10.1109/CoST57098.2022.00042)
Citations: 1

Abstract

People naturally accompany their speech with body motion and gestures. Generating gestures for digital humans or virtual avatars from speech audio or text remains challenging because the mapping is nondeterministic. We observe that existing neural methods often produce gestures with too little movement, which appear slow or dull. We therefore propose a novel generative model coupled with memory networks that act as dynamic dictionaries, so that the generated gestures are more diverse. In the proposed model, a dictionary network dynamically stores previously seen pose features corresponding to text features for the generator to look up, while a pose generation network takes audio and pose features as input and outputs the resulting gesture sequences. Seed poses are used during generation to ensure continuity between two speech segments. We also propose a new objective evaluation metric for the diversity of generated gestures and demonstrate that the proposed model generates gestures with improved diversity.
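The abstract does not say how the dictionary network is built, so the following is only a minimal sketch of the general idea of a memory used as a dynamic dictionary: text features act as keys, pose features as values, and a lookup returns an attention-weighted mix of the stored values. The class name `DynamicDictionary`, the fixed capacity, the oldest-first eviction, and the dot-product softmax read are all assumptions made for illustration, not the authors' architecture.

```python
import math
from collections import deque


class DynamicDictionary:
    """Toy key-value memory: text-feature keys map to pose-feature values.

    Hypothetical illustration only; not the model described in the paper.
    """

    def __init__(self, capacity=4):
        # Assumed fixed capacity; the oldest entry is dropped when full.
        self.slots = deque(maxlen=capacity)

    def write(self, text_feature, pose_feature):
        """Store a (text feature, pose feature) pair that has already appeared."""
        self.slots.append((list(text_feature), list(pose_feature)))

    def read(self, query):
        """Return a softmax-weighted average of the stored pose features."""
        if not self.slots:
            raise ValueError("memory is empty")
        # Dot-product similarity between the query and every stored key.
        scores = [sum(q * k for q, k in zip(query, key)) for key, _ in self.slots]
        # Numerically stable softmax over the similarity scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(self.slots[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, self.slots))
            for i in range(dim)
        ]


memory = DynamicDictionary(capacity=2)
memory.write([1.0, 0.0], [0.2, 0.8])
memory.write([0.0, 1.0], [0.9, 0.1])
# A query close to the first key retrieves mostly the first pose feature.
retrieved = memory.read([10.0, 0.0])
```

In a full system a generator would presumably consume `retrieved` together with audio features; here the memory is filled by hand only to show the write and read steps.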