Improved decision trees for multi-stream HMM-based audio-visual continuous speech recognition

2009 IEEE Workshop on Automatic Speech Recognition & Understanding. Pub Date: 2009-12-01. DOI: 10.1109/ASRU.2009.5373454

Jing Huang, Karthik Venkat Ramanan

Citations: 3

Abstract

HMM-based audio-visual speech recognition (AVSR) systems have shown success in continuous speech recognition by combining visual and audio information, especially in noisy environments. In this paper we study how to improve the decision trees used to create context classes in HMM-based AVSR systems. Traditionally, visual models have been trained with the same context classes as the audio-only models. We investigate the use of separate decision trees to model the context classes for the audio and visual streams independently. Additionally, we investigate the use of viseme classes in decision tree building for the visual stream. In experiments on a 37-speaker, 1.5-hour test set (about 12000 words) of continuous digits in noise, we obtain about a 3% absolute (20% relative) gain in AVSR performance by using separate decision trees for the audio and visual streams when viseme classes are used in decision tree building for the visual stream.
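
The abstract quotes the gain as both an absolute and a relative figure. The short sketch below shows how the two relate, assuming the gain is a reduction in word error rate (the abstract says only "AVSR performance", so this reading is an assumption). The function name `implied_error_rates` is hypothetical, and the derived rates are back-of-the-envelope values implied by the rounded figures, not results reported in the paper.

```python
from typing import Tuple


def implied_error_rates(absolute_gain: float, relative_gain: float) -> Tuple[float, float]:
    """Return (baseline, improved) error rates in percent implied by the two gains.

    Assumes relative_gain = absolute_gain / baseline, i.e. both figures
    describe the same reduction in an error rate such as WER.
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    baseline = absolute_gain / relative_gain  # e.g. 3 points / 0.20
    improved = baseline - absolute_gain       # error rate after the improvement
    return baseline, improved


# Figures quoted in the abstract: about 3% absolute, about 20% relative.
baseline_wer, improved_wer = implied_error_rates(absolute_gain=3.0, relative_gain=0.20)
print(f"implied baseline: {baseline_wer:.1f}%, implied improved: {improved_wer:.1f}%")
```

Under that assumption, the quoted figures are consistent with an error rate falling from roughly 15% to roughly 12%. Because both gains are given as approximate, these values are indicative only.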
