Highly Fluent Sign Language Synthesis Based on Variable Motion Frame Interpolation

Ni Zeng, Yiqiang Chen, Yang Gu, Dongdong Liu, Yunbing Xing
2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 1772-1777, published 2020-10-11.
DOI: 10.1109/SMC42975.2020.9283193
Citations: 2

Abstract

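Sign Language Synthesis (SLS) is a domain-specific problem in which videos of multiple sign language words are stitched together to produce a video of a whole sentence, facilitating communication between hearing-impaired people and the hearing population. This paper presents a Variable Motion Frame Interpolation (VMFI) method for highly fluent SLS from scattered, word-level videos. Existing SLS approaches mainly rely on mechanical virtual human technology, which lacks flexibility and looks unnatural. In addition, representative frame interpolation methods usually assume that a moving object travels at constant speed. This assumption is ill-suited to predicting the complex hand motion between frames of scattered sign language videos. To address these issues, VMFI models acceleration to predict more accurate interpolated frames with an end-to-end convolutional neural network. The VMFI framework consists of a variable optical flow estimation network and a high-quality frame synthesis network, which approximate and fuse the intermediate optical flow to generate the interpolated frames used for synthesis. Experimental results on a realistic Chinese sign language dataset collected by the authors show that VMFI outperforms two other representative methods in PSNR (Peak Signal-to-Noise Ratio), SSIM (Structural Similarity) and MA (Motion Activity), and achieves a higher MOS (Mean Opinion Score).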
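The abstract contrasts constant-speed interpolation with an acceleration-aware model but does not give VMFI's equations. The sketch below is therefore an assumption rather than the authors' method. It uses the constant-acceleration (quadratic) motion model known from earlier quadratic video interpolation work, which estimates the flow from frame 0 to an intermediate time t using the flows to both neighbouring frames (1 and -1). The function names `linear_flow` and `quadratic_flow` are hypothetical. Flow fields are NumPy arrays of shape (H, W, 2) holding per-pixel (dx, dy) displacements.

```python
import numpy as np


def linear_flow(f_0_to_1: np.ndarray, t: float) -> np.ndarray:
    """Constant-velocity assumption: displacement grows linearly with t."""
    return t * f_0_to_1


def quadratic_flow(f_0_to_1: np.ndarray, f_0_to_m1: np.ndarray, t: float) -> np.ndarray:
    """Constant-acceleration assumption (hypothetical helper, not VMFI itself).

    Model each pixel as x(t) = x(0) + v*t + a*t**2 / 2, so that
        f_0_to_1  = x(1)  - x(0) =  v + a/2
        f_0_to_m1 = x(-1) - x(0) = -v + a/2
    Solving gives v = (f_0_to_1 - f_0_to_m1) / 2 and a = f_0_to_1 + f_0_to_m1.
    """
    velocity = (f_0_to_1 - f_0_to_m1) / 2.0
    acceleration = f_0_to_1 + f_0_to_m1
    return velocity * t + 0.5 * acceleration * t ** 2


if __name__ == "__main__":
    # One pixel following x(t) = 2t + t^2: x(-1) = -1, x(0) = 0, x(1) = 3.
    f01 = np.array([[[3.0, 0.0]]])    # flow 0 -> 1 as a 1x1 field of (dx, dy)
    f0m1 = np.array([[[-1.0, 0.0]]])  # flow 0 -> -1
    print(linear_flow(f01, 0.5)[0, 0, 0])           # 1.5 (overshoots)
    print(quadratic_flow(f01, f0m1, 0.5)[0, 0, 0])  # 1.25 (true x(0.5))
```

For a hand that speeds up between two signs, the constant-velocity estimate overshoots the true midpoint position. The quadratic estimate recovers it exactly. A full interpolator would still need to warp and blend the input frames with these flows; in VMFI, that step is performed by the learned frame synthesis network.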