Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2022-05-23. DOI: 10.1109/ICASSP43922.2022.9747745

A. Ogawa, Naohiro Tawara, Marc Delcroix, S. Araki


Citations: 1

Abstract

We investigate the effectiveness of using a large ensemble of advanced neural language models (NLMs) for lattice rescoring of automatic speech recognition (ASR) hypotheses. Previous studies have reported the effectiveness of combining a small number of NLMs. In contrast, we combine up to eight NLMs: forward and backward long short-term memory (LSTM) LMs and Transformer LMs, each trained with two different random initialization seeds. We combine these NLMs through iterative lattice generation. Because the NLMs complement one another, combining them one at a time, one per rescoring iteration, gradually refines the language scores attached to the lattice arcs and, consequently, gradually reduces the errors in the ASR hypotheses. We also investigate the effectiveness of carrying over contextual information (previous rescoring results) across the sequence of lattices of a long recording, such as a lecture. In experiments on a lecture speech corpus, combining the eight NLMs and using context carry-over yielded a 24.4% relative word error rate reduction from the ASR 1-best baseline. For further comparison, we performed simultaneous (i.e., non-iterative) NLM combination and 100-best rescoring using the large ensemble of NLMs, which confirmed the advantage of lattice rescoring with iterative NLM combination.
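The iterative combination described in the abstract can be illustrated with a toy sketch. The abstract does not give the combination rule, so everything here is an assumption made for illustration: the `Arc` structure, the running average of log-scores, the use of the current best path into a node as the scoring history, and the two hand-written scorers standing in for trained NLMs. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

History = Tuple[str, ...]
Scorer = Callable[[History, str], float]  # (history, word) -> log-score


@dataclass
class Arc:
    start: int
    end: int
    word: str
    am_score: float  # acoustic log-score, left unchanged
    lm_score: float  # language log-score, refined at each iteration


def forward_best(arcs: List[Arc], start: int,
                 lm_weight: float = 1.0) -> Dict[int, Tuple[float, History]]:
    """Best (score, words) into every node; assumes arc.start < arc.end."""
    best: Dict[int, Tuple[float, History]] = {start: (0.0, ())}
    for arc in sorted(arcs, key=lambda a: a.start):
        if arc.start not in best:
            continue
        score, words = best[arc.start]
        cand = score + arc.am_score + lm_weight * arc.lm_score
        if arc.end not in best or cand > best[arc.end][0]:
            best[arc.end] = (cand, words + (arc.word,))
    return best


def rescore_iteratively(arcs: List[Arc], scorers: List[Scorer],
                        start: int, lm_weight: float = 1.0) -> None:
    """Fold in one scorer per iteration (assumed rule: running average)."""
    for k, scorer in enumerate(scorers, start=1):
        # Histories come from the lattice as refined by earlier iterations.
        best = forward_best(arcs, start, lm_weight)
        for arc in arcs:
            if arc.start not in best:
                continue
            new_score = scorer(best[arc.start][1], arc.word)
            # Running mean over the initial score and k model scores.
            arc.lm_score += (new_score - arc.lm_score) / (k + 1)


# Hypothetical stand-ins for two complementary NLMs.
def scorer_a(history: History, word: str) -> float:
    return {"speech": -0.5, "beach": -3.0}.get(word, -1.0)


def scorer_b(history: History, word: str) -> float:
    if word == "recognition":
        return -0.5 if history[-1:] == ("speech",) else -2.0
    return {"speech": -1.0, "beach": -2.5}.get(word, -1.0)


lattice = [
    Arc(0, 1, "beach", am_score=-1.0, lm_score=-2.0),
    Arc(0, 1, "speech", am_score=-1.2, lm_score=-2.0),
    Arc(1, 2, "recognition", am_score=-1.0, lm_score=-1.0),
]
baseline_words = forward_best(lattice, start=0)[2][1]
rescore_iteratively(lattice, [scorer_a, scorer_b], start=0)
final_words = forward_best(lattice, start=0)[2][1]
print(baseline_words, "->", final_words)
```

In this toy lattice the 1-best path starts as "beach recognition" and becomes "speech recognition" after both scorers have been folded in. The second scorer sees a history that already reflects the first scorer's refinement, which is the intuition behind combining models one per iteration rather than all at once.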


