Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition

Jing Huang, Karthik Venkat Ramanan
{"title":"Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition","authors":"Jing Huang, Karthik Venkat Ramanan","doi":"10.1109/ASRU.2009.5373454","DOIUrl":null,"url":null,"abstract":"HMM-based audio-visual speech recognition (AVSR) systems have shown success in continuous speech recognition by combining visual and audio information, especially in noisy environments. In this paper we study how to improve decision trees used to create context classes in HMM-based AVSR systems. Traditionally, visual models have been trained with the same context classes as the audio only models. In this paper we investigate the use of separate decision trees to model the context classes for the audio and visual streams independently. Additionally we investigate the use of viseme classes in the decision tree building for the visual stream. On experiments with a 37-speaker 1.5 hours test set (about 12000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain on AVSR performance by using separate decision trees for the audio and visual streams when using viseme classes in decision tree building for the visual stream.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373454","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

HMM-based audio-visual speech recognition (AVSR) systems have shown success in continuous speech recognition by combining visual and audio information, especially in noisy environments. In this paper we study how to improve decision trees used to create context classes in HMM-based AVSR systems. Traditionally, visual models have been trained with the same context classes as the audio only models. In this paper we investigate the use of separate decision trees to model the context classes for the audio and visual streams independently. Additionally we investigate the use of viseme classes in the decision tree building for the visual stream. On experiments with a 37-speaker 1.5 hours test set (about 12000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain on AVSR performance by using separate decision trees for the audio and visual streams when using viseme classes in decision tree building for the visual stream.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于多流hmm的视听连续语音识别改进决策树
基于hmm的视听语音识别(AVSR)系统通过结合视觉和音频信息,在连续语音识别中取得了成功,特别是在嘈杂环境中。本文研究了如何改进基于hmm的AVSR系统中用于创建上下文类的决策树。传统上,视觉模型与音频模型使用相同的上下文类进行训练。在本文中,我们研究了使用独立的决策树对音频和视觉流的上下文类进行独立建模。此外,我们还研究了viseme类在视觉流决策树构建中的使用。在37个扬声器1.5小时的连续数字噪声测试集(约12000个单词)的实验中,当在视觉流的决策树构建中使用viseme类时,我们通过对音频和视觉流使用单独的决策树,获得了AVSR性能的3%绝对增益(20%相对增益)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Detection of OOV words by combining acoustic confidence measures with linguistic features Automatic translation from parallel speech: Simultaneous interpretation as MT training data Local and global models for spontaneous speech segment detection and characterization Automatic punctuation generation for speech Response timing generation and response type selection for a spontaneous spoken dialog system
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1