Voice search language model adaptation using contextual information
Justin Scheiner, Ian Williams, Petar S. Aleksic
2016 IEEE Spoken Language Technology Workshop (SLT), 2016-12-01
DOI: 10.1109/SLT.2016.7846273 (https://doi.org/10.1109/SLT.2016.7846273)
Citations: 15
Abstract
It has been shown that automatic speech recognition (ASR) system quality can be improved by augmenting n-gram language models with contextual information [1][2]. In the voice search domain, a large number of useful contextual signals are available for a given query, such as speaker location, speaker identity, and time of the query. Each of these signals comes with relevant contextual information (e.g., location-specific entities, favorite queries, recent popular queries) that is not included in the language model's training data. We show that these contextual signals can be used to improve ASR system quality. This is achieved by adjusting n-gram language model probabilities on the fly, based on the contextual information relevant to the current voice search request. We analyze three example sources of context: location, previously typed queries, and previously spoken queries. We present a set of approaches we have used to improve ASR quality with these sources of context. Our main objective is to take advantage of all available sources of contextual information automatically and in real time. In addition, we investigate the challenges of applying our approach to a number of languages (unsegmented languages, languages with diacritics) and present the solutions we used.
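The abstract does not spell out the rule used to adjust the n-gram probabilities, so the following is only a minimal sketch of the general idea, assuming a log-domain boost for context words followed by renormalization. The function name `bias_distribution`, the `boost` value, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
import math


def bias_distribution(probs, context_words, boost=2.0):
    """Return a copy of `probs` with context words boosted, then renormalized.

    probs: dict mapping next word -> P(word | history) from the base model.
    context_words: set of words tied to the current request (hypothetical).
    boost: log-domain bonus applied to each context word (assumed value).
    """
    scaled = {
        w: (p * math.exp(boost)) if w in context_words else p
        for w, p in probs.items()
    }
    total = sum(scaled.values())
    # Renormalize so the adapted values still form a probability distribution.
    return {w: p / total for w, p in scaled.items()}


# Toy base distribution P(word | "near"); the numbers are made up.
base = {"me": 0.6, "boston": 0.3, "worcester": 0.1}

# Assume the request's location context contributes the entity "worcester".
adapted = bias_distribution(base, {"worcester"})
```

Under this assumed scheme, the context word gains probability mass while the relative ordering of the remaining words is unchanged, and the adjustment can be computed per request without retraining the base model.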