A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading

Ya Zhao, Rui Xu, Mingli Song
Proceedings of the ACM Multimedia Asia · Publication date: 2019-08-14 · DOI: 10.1145/3338533.3366579
Citations: 33

Abstract

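Lip reading aims to decode text from the movements of a speaker's mouth. In recent years, lip reading methods have made great progress for English, at both the word level and the sentence level. Unlike English, however, Chinese Mandarin is a tonal language that relies on pitch to distinguish lexical or grammatical meaning, which significantly increases the ambiguity of the lip reading task. In this paper, we propose a Cascade Sequence-to-Sequence Model for Chinese Mandarin (CSSMCM) lip reading, which explicitly models tones when predicting sentences. Tones are predicted from visual information and syntactic structure, and the predicted tones are then used, together with the visual information and syntactic structure, to predict the sentence. To evaluate CSSMCM, we collected and released a dataset called CMLR (Chinese Mandarin Lip Reading), consisting of over 100,000 natural sentences from the China Network Television website. When trained on the CMLR dataset, CSSMCM outperforms state-of-the-art lip reading frameworks, which confirms the effectiveness of explicitly modeling tones for Chinese Mandarin lip reading.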
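To illustrate why an explicit tone stage can reduce ambiguity, here is a toy sketch in Python. It assumes, as the abstract suggests, that lip movements carry little pitch information, so a visual front end alone can at best recover a toneless syllable; a separate tone prediction then narrows the set of candidate characters. The lexicon, the `candidates_without_tone` and `cascade_decode` helpers, and the fixed tone scores are all hypothetical illustrations, not the CSSMCM architecture, which uses learned sequence-to-sequence models.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (toneless syllable, tone number) -> candidate characters.
# Real Mandarin has many more homophones per (syllable, tone); this only shows the idea.
LEXICON: Dict[Tuple[str, int], List[str]] = {
    ("ma", 1): ["妈"],  # mā
    ("ma", 2): ["麻"],  # má
    ("ma", 3): ["马"],  # mǎ
    ("ma", 4): ["骂"],  # mà
}


def candidates_without_tone(syllable: str) -> List[str]:
    """Visual-only view (assumption): lip shape does not reveal tone, so every tone remains possible."""
    return [ch for (syl, _tone), chars in LEXICON.items() if syl == syllable for ch in chars]


def cascade_decode(syllable: str, tone_probs: Dict[int, float]) -> List[str]:
    """Two-stage sketch: commit to the most probable tone, then look up matching characters."""
    best_tone = max(tone_probs, key=tone_probs.get)
    return LEXICON.get((syllable, best_tone), [])


if __name__ == "__main__":
    print(candidates_without_tone("ma"))  # ['妈', '麻', '马', '骂']
    # In a real system these hypothetical tone scores would come from a model
    # conditioned on visual features and sentence context; here they are fixed.
    print(cascade_decode("ma", {1: 0.1, 2: 0.05, 3: 0.8, 4: 0.05}))  # ['马']
```

Committing to a single argmax tone per syllable keeps the sketch short; a trained cascade would more plausibly score whole sentences and keep several tone hypotheses, for example with beam search.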