Lexicon adaptation for subword speech recognition
Timo Mertens, Daniel Schneider, A. Næss, T. Svendsen
2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009-12-01. DOI: 10.1109/ASRU.2009.5373296 (https://doi.org/10.1109/ASRU.2009.5373296)
In this paper we present two approaches for adapting a syllable-based recognition lexicon in an automatic speech recognition (ASR) setting. The motivation is to evaluate whether adaptation techniques commonly used at the word level can also be employed at the subword level. The first method predicts syllable variations, taking into account sub-syllabic phone cluster variations, and subsequently adapts the syllable lexicon. The second approach adds syllable bigrams to the lexicon to cope with the acoustic confusability of subword units and syllable-inherent phone attachment ambiguities. We evaluate the methods on two German data sets, one consisting of planned and the other of spontaneous speech. Although the first method did not improve the syllable error rate (SER), the predicted confusions correlate with those observed in the test data. Bigram adaptation improved the SER by 1.3% and 0.8% absolute on the planned and spontaneous data sets, respectively.
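The abstract does not say how syllable bigrams are selected or how their lexicon entries are built, so the following Python sketch is only an illustration under stated assumptions: bigrams are chosen by a simple frequency threshold, each bigram entry concatenates the phone sequences of its two syllables, and SER is computed as the Levenshtein distance between syllable sequences divided by the reference length. The function names, the `min_count` parameter, the `_` joiner, and the toy phone symbols are all hypothetical and not taken from the paper.

```python
from collections import Counter


def add_syllable_bigrams(lexicon, corpus, min_count=2):
    """Return a copy of `lexicon` extended with frequent syllable bigrams.

    lexicon: dict mapping a syllable to its list of phones.
    corpus:  list of utterances, each a list of syllables.
    The frequency threshold is an assumed selection criterion.
    """
    counts = Counter()
    for utterance in corpus:
        # Count every pair of adjacent syllables in the utterance.
        counts.update(zip(utterance, utterance[1:]))

    adapted = dict(lexicon)
    for (left, right), count in counts.items():
        if count >= min_count and left in lexicon and right in lexicon:
            # Assumed entry format: joined name, concatenated phones.
            adapted[left + "_" + right] = lexicon[left] + lexicon[right]
    return adapted


def syllable_error_rate(reference, hypothesis):
    """Edit distance between syllable sequences over reference length."""
    if not reference:
        raise ValueError("reference must contain at least one syllable")
    previous = list(range(len(hypothesis) + 1))
    for i, ref_syllable in enumerate(reference, start=1):
        current = [i]
        for j, hyp_syllable in enumerate(hypothesis, start=1):
            substitution = 0 if ref_syllable == hyp_syllable else 1
            current.append(min(
                previous[j] + 1,                 # deletion
                current[j - 1] + 1,              # insertion
                previous[j - 1] + substitution,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(reference)


# Toy data; the phone symbols are illustrative only.
lexicon = {
    "gu": ["g", "u:"],
    "ten": ["t", "@", "n"],
    "tag": ["t", "a:", "k"],
}
corpus = [["gu", "ten", "tag"], ["gu", "ten", "tag"], ["tag", "gu"]]

adapted = add_syllable_bigrams(lexicon, corpus, min_count=2)
ser = syllable_error_rate(["gu", "ten", "tag"], ["gu", "tag"])
```

With this toy data, `adapted` gains the entries `gu_ten` and `ten_tag`, while the pair seen only once is skipped, and `ser` is one deletion over three reference syllables. An "absolute" improvement such as the reported 1.3% is a difference in percentage points between two SER values of this kind, not a relative reduction.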