Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition

Guoyun Lv, D. Jiang, R. Zhao, Yunshu Hou
{"title":"Multi-stream Asynchrony Modeling for Audio-Visual Speech Recognition","authors":"Guoyun Lv, D. Jiang, R. Zhao, Yunshu Hou","doi":"10.1109/ISM.2007.21","DOIUrl":null,"url":null,"abstract":"In this paper, two multi-stream asynchrony Dynamic Bayesian Network models (MS-ADBN model and MM-ADBN model) are proposed for audio-visual speech recognition (AVSR). The proposed models, with different topology structures, loose the asynchrony of audio and visual streams to word level. For MS-ADBN model, both in audio stream and in visual stream, each word is composed of its corresponding phones, and each phone is associated with observation vector. MM- ADBN model is an augmentation of MS-ADBN model, a level of hidden nodes--state level, is added between the phone level and the observation node level, to describe the dynamic process of phones. Essentially, MS-ADBN model is a word model, while MM-ADBN model is a phone model. Speech recognition experiments are done on a digit audio-visual (A-V) database, as well as on a continuous A-V database. The results demonstrate that the asynchrony description between audio and visual stream is important for AVSR system, and MM-ADBN model has the best performance for the task of continuous A-V speech recognition.","PeriodicalId":129680,"journal":{"name":"Ninth IEEE International Symposium on Multimedia (ISM 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ninth IEEE International Symposium on Multimedia (ISM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISM.2007.21","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11

Abstract

In this paper, two multi-stream asynchrony Dynamic Bayesian Network models (MS-ADBN model and MM-ADBN model) are proposed for audio-visual speech recognition (AVSR). The proposed models, with different topology structures, loose the asynchrony of audio and visual streams to word level. For MS-ADBN model, both in audio stream and in visual stream, each word is composed of its corresponding phones, and each phone is associated with observation vector. MM- ADBN model is an augmentation of MS-ADBN model, a level of hidden nodes--state level, is added between the phone level and the observation node level, to describe the dynamic process of phones. Essentially, MS-ADBN model is a word model, while MM-ADBN model is a phone model. Speech recognition experiments are done on a digit audio-visual (A-V) database, as well as on a continuous A-V database. The results demonstrate that the asynchrony description between audio and visual stream is important for AVSR system, and MM-ADBN model has the best performance for the task of continuous A-V speech recognition.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
视听语音识别的多流异步建模
本文提出了用于视听语音识别的两种多流异步动态贝叶斯网络模型(MS-ADBN模型和MM-ADBN模型)。所提出的模型具有不同的拓扑结构,将音视频流的异步性降低到字级。对于MS-ADBN模型,无论是在音频流还是在视觉流中,每个单词都由其对应的电话组成,每个电话与观测向量相关联。MM- ADBN模型是对MS-ADBN模型的增强,在手机级和观测节点级之间增加了一个隐藏节点级——状态级,用以描述手机的动态过程。MS-ADBN模型本质上是一个词模型,MM-ADBN模型本质上是一个电话模型。语音识别实验分别在数字视听数据库和连续视听数据库上进行。结果表明,音频和视频流之间的异步描述对AVSR系统至关重要,MM-ADBN模型对于连续的A-V语音识别任务具有最佳的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
A New Image Compression Scheme Based on Locally Adaptive Coding The Design of a Multi-party VoIP Conferencing System over the Internet Analysis of a New Ubiquitous Multimodal Multimedia Computing System Summarization of Wearable Videos Based on User Activity Analysis Local Binary Patterns for Human Detection on Hexagonal Structure
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1