{"title":"Semantically Meaningful Metrics for Norwegian ASR Systems","authors":"J. Rugayan, T. Svendsen, G. Salvi","doi":"10.21437/interspeech.2022-817","DOIUrl":null,"url":null,"abstract":"Evaluation metrics are important for quanitfying the perfor- mance of Automatic Speech Recognition (ASR) systems. How-ever, the widely used word error rate (WER) captures errors at the word-level only and weighs each error equally, which makes it insufficient to discern ASR system performance for down- stream tasks such as Natural Language Understanding (NLU) or information retrieval. We explore in this paper a more ro- bust and discriminative evaluation metric for Norwegian ASR systems through the use of semantic information modeled by a transformer-based language model. We propose Aligned Semantic Distance (ASD) which employs dynamic programming to quantify the similarity between the reference and hypothesis text. First, embedding vectors are generated using the Nor- BERT model. Afterwards, the minimum global distance of the optimal alignment between these vectors is obtained and nor- malized by the sequence length of the reference embedding vec-tor. In addition, we present results using Semantic Distance (SemDist), and compare them with ASD. Results show that for the same WER, ASD and SemDist values can vary significantly, thus, exemplifying that not all recognition errors can be consid-ered equally important. We investigate the resulting data, and present examples which demonstrate the nuances of both metrics in evaluating various transcription errors.","PeriodicalId":73500,"journal":{"name":"Interspeech","volume":"1 1","pages":"2283-2287"},"PeriodicalIF":0.0000,"publicationDate":"2022-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Interspeech","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/interspeech.2022-817","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Evaluation metrics are important for quantifying the performance of Automatic Speech Recognition (ASR) systems. However, the widely used word error rate (WER) captures errors at the word level only and weighs each error equally, which makes it insufficient to discern ASR system performance for downstream tasks such as Natural Language Understanding (NLU) or information retrieval. We explore in this paper a more robust and discriminative evaluation metric for Norwegian ASR systems through the use of semantic information modeled by a transformer-based language model. We propose Aligned Semantic Distance (ASD), which employs dynamic programming to quantify the similarity between the reference and hypothesis text. First, embedding vectors are generated using the NorBERT model. Afterwards, the minimum global distance of the optimal alignment between these vectors is obtained and normalized by the sequence length of the reference embedding vector. In addition, we present results using Semantic Distance (SemDist), and compare them with ASD. Results show that for the same WER, ASD and SemDist values can vary significantly, thus exemplifying that not all recognition errors can be considered equally important. We investigate the resulting data, and present examples which demonstrate the nuances of both metrics in evaluating various transcription errors.
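The abstract describes ASD as a dynamic-programming alignment over token embedding vectors, normalized by the reference length, alongside a sentence-level SemDist comparison. The sketch below illustrates one way such metrics could be computed; it is not the authors' implementation. Assumptions not stated in the abstract: cosine distance as the local cost, a DTW-style recursion for the optimal alignment, mean pooling for the SemDist sentence embeddings, and random arrays standing in for actual NorBERT embeddings.

```python
# Illustrative sketch of ASD and SemDist under the assumptions noted above.
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def aligned_semantic_distance(ref_emb, hyp_emb):
    """Dynamic-programming alignment of token embeddings.

    ref_emb: (n, d) array of reference token embeddings (e.g., from NorBERT)
    hyp_emb: (m, d) array of hypothesis token embeddings
    Returns the minimum global alignment cost normalized by the reference
    length n, as described in the abstract.
    """
    n, m = len(ref_emb), len(hyp_emb)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(ref_emb[i - 1], hyp_emb[j - 1])
            # Take the cheapest of the three predecessor cells
            # (diagonal match, or advancing only one of the sequences).
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m] / n

def semdist(ref_sent_emb, hyp_sent_emb):
    """Sentence-level semantic distance between pooled embeddings."""
    return cosine_distance(ref_sent_emb, hyp_sent_emb)

# Toy usage with random stand-ins for NorBERT embeddings.
rng = np.random.default_rng(0)
ref, hyp = rng.normal(size=(7, 768)), rng.normal(size=(6, 768))
print(aligned_semantic_distance(ref, hyp))
print(semdist(ref.mean(axis=0), hyp.mean(axis=0)))
```

In practice the token and sentence embeddings would come from a Norwegian BERT model rather than random arrays; the specific layer, pooling strategy, and distance function used in the paper may differ from this sketch.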