Nalin Srun, Sotheara Leang, Ye Kyaw Thu, Sethserey Sam
Published in: 2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 2022-11-05
DOI: 10.1109/iSAI-NLP56921.2022.9960286 (https://doi.org/10.1109/iSAI-NLP56921.2022.9960286)
Convolutional Time Delay Neural Network for Khmer Automatic Speech Recognition
Convolutional Neural Networks (CNNs) have been shown to capture spatial aspects of the speech signal and to eliminate spectral variations across speakers in Automatic Speech Recognition. In this study, we investigate combining a Convolutional Neural Network with a Time Delay Neural Network (TDNN) as the acoustic model for large vocabulary continuous speech recognition in Khmer. The idea is to use Convolutional Neural Networks to extract local features of the speech signal, while Time Delay Neural Networks capture long temporal correlations between acoustic events. Experimental results show that the proposed network outperforms the Time Delay Neural Network, with an average relative improvement of 14% across test sets.
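To make the "long temporal correlations" point concrete, the following is a minimal sketch of how stacked time-delay layers widen the temporal context seen by one output frame. It is not the architecture evaluated in the paper: the function names `tdnn_layer` and `context_width`, the scalar features, the offsets, and the weights are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. The offsets and weights below are hypothetical
# and are NOT the configuration reported in the paper.

def tdnn_layer(frames, offsets, weights, bias=0.0):
    """One time-delay layer on a sequence of scalar features.

    Each output at time t is
        ReLU(bias + sum_k weights[k] * frames[t + offsets[k]]).
    Only time steps where every offset falls inside the sequence are emitted.
    """
    start = max(0, -min(offsets))
    stop = len(frames) - max(0, max(offsets))
    outputs = []
    for t in range(start, stop):
        acc = bias
        for off, w in zip(offsets, weights):
            acc += w * frames[t + off]
        outputs.append(max(0.0, acc))  # ReLU non-linearity
    return outputs


def context_width(layer_offsets):
    """Number of input frames that influence one output of the stacked layers."""
    left = sum(max(0, -min(offs)) for offs in layer_offsets)
    right = sum(max(0, max(offs)) for offs in layer_offsets)
    return left + right + 1


if __name__ == "__main__":
    layers = [(-1, 0, 1), (-3, 0, 3)]          # hypothetical splicing offsets
    frames = [float(i) for i in range(9)]      # nine dummy input frames
    hidden = tdnn_layer(frames, layers[0], (1.0, 1.0, 1.0))
    top = tdnn_layer(hidden, layers[1], (1.0, 1.0, 1.0))
    print(context_width(layers), len(hidden), len(top))
```

With these assumed offsets, two layers already cover nine input frames, so nine frames yield exactly one top-level output. In a combined model of the kind the abstract describes, convolutional layers applied to the input features would sit in front of such a stack and supply the local features it operates on.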