{"title":"Large-margin feature adaptation for automatic speech recognition","authors":"Chih-Chieh Cheng, Fei Sha, L. Saul","doi":"10.1109/ASRU.2009.5373320","DOIUrl":null,"url":null,"abstract":"We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.","PeriodicalId":292194,"journal":{"name":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE Workshop on Automatic Speech Recognition & Understanding","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2009.5373320","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 4
Abstract
We consider how to optimize the acoustic features used by hidden Markov models (HMMs) for automatic speech recognition (ASR). We investigate a mistake-driven algorithm that discriminatively reweights the acoustic features in order to separate the log-likelihoods of correct and incorrect transcriptions by a large margin. The algorithm simultaneously optimizes the HMM parameters in the back end by adapting them to the reweighted features computed by the front end. Using an online approach, we incrementally update feature weights and model parameters after the decoding of each training utterance. To mitigate the strongly biased gradients from individual training utterances, we train several different recognizers in parallel while tying the feature transformations in their front ends. We show that this parameter-tying across different recognizers leads to more stable updates and generally fewer recognition errors.
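To make the abstract's training loop concrete, here is a minimal, hypothetical sketch of a mistake-driven, large-margin update with a feature transform tied across several recognizers. All names, shapes, and the toy linear scoring model are illustrative assumptions, not the authors' implementation: real HMM back ends score frame sequences via decoding, and the reweighting here is reduced to an elementwise scale on a pooled feature vector.

```python
import numpy as np

class Recognizer:
    """Toy stand-in for an HMM back end that scores one utterance.

    A real back end would return the log-likelihood of a transcription
    under the HMM; here a linear score over pooled features suffices to
    illustrate the margin-based update. (Assumed, not from the paper.)
    """
    def __init__(self, dim, rng):
        self.theta = rng.normal(size=dim)  # back-end parameters (toy)

    def score(self, feats):
        return float(self.theta @ feats)

def margin_update(recognizers, w, x_correct, x_incorrect,
                  lr=0.01, margin=1.0):
    """One online, mistake-driven update after decoding an utterance.

    `w` is the shared front-end feature reweighting, tied across all
    recognizers; `x_correct` / `x_incorrect` are pooled features for the
    correct and (decoded) incorrect transcriptions. Each recognizer whose
    score margin is violated contributes a hinge-loss subgradient.
    """
    grad_w = np.zeros_like(w)
    for rec in recognizers:
        s_pos = rec.score(w * x_correct)
        s_neg = rec.score(w * x_incorrect)
        if s_pos - s_neg < margin:  # hinge active: a "mistake"
            # Subgradient of the hinge w.r.t. the shared weights w ...
            grad_w += rec.theta * (x_incorrect - x_correct)
            # ... and w.r.t. this recognizer's own back-end parameters.
            rec.theta -= lr * w * (x_incorrect - x_correct)
    # Tied update: averaging per-recognizer gradients into one shared
    # transform is what damps the strongly biased single-utterance
    # gradients the abstract mentions.
    w -= lr * grad_w / len(recognizers)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 8
    recs = [Recognizer(dim, rng) for _ in range(3)]
    w = np.ones(dim)
    for _ in range(100):  # stream of (correct, incorrect) training pairs
        x_pos, x_neg = rng.normal(size=dim), rng.normal(size=dim)
        w = margin_update(recs, w, x_pos, x_neg)
```

The design point the sketch tries to capture is the parameter tying: each recognizer keeps its own back-end parameters, but all of them share one front-end transform `w`, so its update pools evidence from several models rather than following any single recognizer's noisy gradient.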