Semantically Meaningful Metrics for Norwegian ASR Systems

Interspeech | Pub Date: 2022-09-18 | DOI: 10.21437/interspeech.2022-817

J. Rugayan, T. Svendsen, G. Salvi

Interspeech, pp. 2283-2287

Citations: 1

Abstract

Evaluation metrics are important for quantifying the performance of Automatic Speech Recognition (ASR) systems. However, the widely used word error rate (WER) captures errors at the word level only and weighs each error equally, which makes it insufficient for discerning ASR system performance on downstream tasks such as Natural Language Understanding (NLU) or information retrieval. In this paper, we explore a more robust and discriminative evaluation metric for Norwegian ASR systems that uses semantic information modeled by a transformer-based language model. We propose Aligned Semantic Distance (ASD), which employs dynamic programming to quantify the similarity between the reference and hypothesis text. First, embedding vectors are generated using the NorBERT model. Then, the minimum global distance of the optimal alignment between these vectors is obtained and normalized by the length of the reference embedding sequence. In addition, we present results using Semantic Distance (SemDist) and compare them with ASD. Results show that for the same WER, ASD and SemDist values can vary significantly, which exemplifies that not all recognition errors can be considered equally important. We investigate the resulting data and present examples that demonstrate the nuances of both metrics in evaluating various transcription errors.
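The abstract describes ASD only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes cosine distance as the local cost between token embeddings and a standard dynamic-time-warping recursion (diagonal, vertical and horizontal steps). The accumulated cost of the optimal path is divided by the number of reference embeddings, as the abstract states. The SemDist-style function assumes mean pooling of token embeddings followed by cosine distance. Toy 2-D vectors stand in for NorBERT embeddings (real NorBERT output is subword-level and much higher-dimensional), and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def aligned_semantic_distance(ref: List[Vector], hyp: List[Vector]) -> float:
    """Hypothetical ASD sketch: DTW over embeddings, normalized by len(ref).

    Assumption: local cost = cosine distance; each step (match, insertion,
    deletion) adds the local cost of the cell it enters.
    """
    if not ref or not hyp:
        raise ValueError("reference and hypothesis must be non-empty")
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # acc[i][j] = minimum accumulated cost of aligning ref[:i] with hyp[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(ref[i - 1], hyp[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Minimum global distance of the optimal alignment, normalized by the
    # length of the reference embedding sequence
    return acc[n][m] / n


def mean_pool(vectors: List[Vector]) -> List[float]:
    """Average token embeddings into one sentence embedding (assumed pooling)."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def sem_dist(ref: List[Vector], hyp: List[Vector]) -> float:
    """SemDist-style score: cosine distance between pooled sentence embeddings."""
    return cosine_distance(mean_pool(ref), mean_pool(hyp))


if __name__ == "__main__":
    # Two-word reference; both hypotheses substitute the second word,
    # so they have the same WER (1 error / 2 words).
    ref = [[1.0, 0.0], [0.0, 1.0]]
    hyp_close = [[1.0, 0.0], [1.0, 1.0]]  # substitution that stays semantically near
    hyp_far = [[1.0, 0.0], [1.0, 0.0]]    # substitution that is semantically distant
    print(round(aligned_semantic_distance(ref, hyp_close), 3))  # → 0.146
    print(round(aligned_semantic_distance(ref, hyp_far), 3))    # → 0.5
    print(sem_dist(ref, hyp_close) < sem_dist(ref, hyp_far))    # → True
```

In this toy setup both hypotheses contain one substitution, yet the semantically closer substitution yields a lower ASD and SemDist. This mirrors, on made-up data, the abstract's point that equal WER can hide unequal semantic damage.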


