Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

IEEE Transactions on Audio Speech and Language Processing Pub Date : 2013-06-01 DOI:10.1109/TASL.2013.2248716

B. Lecouteux, G. Linarès, Y. Estève, G. Gravier

Volume 21, pages 1251-1260. https://doi.org/10.1109/TASL.2013.2248716

Citations: 17

Abstract

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of their outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that are evaluated and combined with one another by the primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a relative word error rate reduction of 14.5% over the best single system, compared with 6.7% for a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm. Applying DDA to both A* and beam-search-based decoders yields similar performance.
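The idea of letting a secondary system's output steer the primary search can be illustrated with a toy example. The sketch below is not the paper's formulation: the per-slot lattice, the fixed `bonus` reward, and the function name `driven_beam_search` are all hypothetical, chosen only to show how an auxiliary transcript can bias hypothesis scores during the search itself, rather than being merged with the output afterwards as in a vote-based scheme.

```python
import math


def driven_beam_search(lattice, auxiliary, beam_width=2, bonus=1.0):
    """Toy beam search biased by an auxiliary transcript (illustrative only).

    lattice:   list of slots; each slot maps a candidate word to the
               primary system's log-score for that word.
    auxiliary: one word per slot from a secondary system (None if missing).
    bonus:     hypothetical log-score reward added when a candidate word
               matches the auxiliary word in the same slot.
    """
    beams = [([], 0.0)]  # (word sequence, accumulated score)
    for slot, aux_word in zip(lattice, auxiliary):
        expanded = []
        for words, score in beams:
            for word, log_score in slot.items():
                # The auxiliary output acts as an extra observation source.
                driven = log_score + (bonus if word == aux_word else 0.0)
                expanded.append((words + [word], score + driven))
        # Keep only the best-scoring partial hypotheses.
        expanded.sort(key=lambda item: item[1], reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


# Made-up scores: the primary system slightly prefers "cap" over "cat".
lattice = [
    {"the": math.log(0.6), "a": math.log(0.4)},
    {"cat": math.log(0.45), "cap": math.log(0.55)},
    {"sat": math.log(0.7), "sad": math.log(0.3)},
]
auxiliary = ["the", "cat", "sat"]

undriven_words, _ = driven_beam_search(lattice, auxiliary, bonus=0.0)
driven_words, _ = driven_beam_search(lattice, auxiliary, bonus=1.0)
print(" ".join(undriven_words))
print(" ".join(driven_words))
```

With these made-up scores, the undriven search keeps the primary system's preferred "cap", while the driven search switches to "cat" because the auxiliary transcript supports it. In a real decoder the slots would not be pre-aligned and the reward would depend on how reliable the secondary system is; both simplifications here are assumptions made for brevity.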


Source journal

IEEE Transactions on Audio Speech and Language Processing (Engineering and Technology - Engineering: Electrical and Electronic)

Self-citation rate: 0.00%

Review time: 24.0 months

Journal scope: The IEEE Transactions on Audio, Speech and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language. In particular, audio processing also covers auditory modeling, acoustic modeling and source separation. Speech processing also covers speech production and perception, adaptation, lexical modeling and speaker recognition. Language processing also covers spoken language understanding, translation, summarization, mining, general language modeling, as well as spoken dialog systems.