Amoolya G, Arnold Sachith A Hans, V. R. Lakkavalli, Senthil Kumar Swami Durai
2022 International Conference on Advanced Computing Technologies and Applications (ICACTA), published 2022-03-04. DOI: 10.1109/ICACTA54488.2022.9753319 (https://doi.org/10.1109/ICACTA54488.2022.9753319)
Automatic Speech Recognition for Tulu Language Using Gmm-Hmm and DNN-HMM Techniques
This work develops a first automatic speech recognition (ASR) system for the Tulu language. A seven-hour database of read speech was recorded from native Tulu speakers in natural conditions. The Kaldi toolkit is used to build GMM-HMM and DNN-HMM based ASR systems. Systems are built with different speech units, and a detailed set of experiments is carried out on the collected dataset. Because of the limited amount of data, monophone GMM-HMM models were observed to give a lower (better) word error rate (WER) than triphone models. More data is required for the triphone systems to reach better performance.
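Since the comparison above rests on WER, a minimal sketch of the metric may help. It uses the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the placeholder transcripts are illustrative assumptions; the abstract does not describe the authors' scoring setup, so this is not their script.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; transcripts are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder word sequences, not data from the paper.
    print(word_error_rate("a b c d", "a x c"))
```

Lower values are better, and WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is an error rate rather than an accuracy.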