{"title":"IPA:改进的手机模型与循环神经网络","authors":"T. Robinson, M. Hochberg, S. Renals","doi":"10.1109/ICASSP.1994.389361","DOIUrl":null,"url":null,"abstract":"This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate and all three provide an improvement in the WSJ word recognition rate.<<ETX>>","PeriodicalId":290798,"journal":{"name":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"46","resultStr":"{\"title\":\"IPA: improved phone modelling with recurrent neural networks\",\"authors\":\"T. Robinson, M. Hochberg, S. Renals\",\"doi\":\"10.1109/ICASSP.1994.389361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate and all three provide an improvement in the WSJ word recognition rate.<<ETX>>\",\"PeriodicalId\":290798,\"journal\":{\"name\":\"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1994-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"46\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.1994.389361\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.1994.389361","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
IPA: improved phone modelling with recurrent neural networks
This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maximum likelihood phone or word string is then extracted using Markov models. The paper describes three improvements: connectionist model merging; explicit presentation of acoustic context; and improved duration modelling. The first is shown to provide a significant improvement in the TIMIT phone recognition rate and all three provide an improvement in the WSJ word recognition rate.<>